
G

G4 DNA

▶G-Quartet DNA

GABA

Definition
GABA (gamma aminobutyric acid) acts as an inhibitory neurotransmitter in the central nervous system. GABA-ergic neurotransmission via (inhibitory) interneurons is involved in most brain functions.
▶Addiction, Molecular Biology

Gag

Definition
Gag is a retrovirus gene that encodes the retroviral internal structural proteins ▶MA, CA, and ▶NC, and some others.
▶Retroviruses

Gain-of-Function Mutations

Gain-of-function mutations are mutations that result in new (protein) functions in heterozygous individuals.
▶Huntington’s Disease
▶Loss of Function Mutation
▶Mendelian Forms of Human Hypertension and Mechanisms of Disease
▶Mouse Genomics
▶Tumor Suppressor Genes

Gain-of-Function Screens

Definition
Gain-of-function screens are mutational screens based on the over- or misexpression of genes, aimed at detecting genes that may show no loss-of-function effect. Ectopic expression of an endogenous transcription unit adjacent to the 5′ end of the randomly integrated P element is dependent on the expression of the GAL4 gene (Drosophila melanogaster), and thus may be spatially and temporally controlled.
▶Drosophila as a Model Organism for Functional Genomics

GAL4–Expression System

Definition
The GAL4–expression system is a method for directed gene expression that can be used to misexpress genes in specific cell types or tissues at different times of development. This system relies on the generation of transgenic lines that carry “activator” or “effector” constructs. Activator lines express the yeast transcription factor Gal4 under the control of a desired promoter, whereas effector lines contain DNA binding motifs for Gal4 (UAS) linked to the gene of interest.
▶Drosophila as a Model Organism for Functional Genomics

Gametes

▶Germ Cells

Gamma Rays

Definition
Gamma rays are an energetic form of electromagnetic radiation produced by radioactivity or other nuclear or subatomic processes such as electron-positron annihilation. Gamma rays are often defined to begin at an energy of 10 keV, although electromagnetic radiation from around 10 keV to several hundred keV is also referred to as hard X-rays. Gamma rays are a form of ionizing radiation; they are more penetrating than either alpha or beta radiation (neither of which is electromagnetic radiation), but less ionizing.
▶Molecular Imaging

Gammaretroviruses

Definition
Gammaretroviruses refers to the genus of simple retroviruses that includes murine and feline leukemia viruses. Many species contain oncogenes and cause leukemias and sarcomas.
▶Retroviruses

Gammaretroviruses IN (Integrase)

Definition
Gammaretroviruses IN (integrase) is the retrovirus virion enzyme, and is a product of the pol gene.
▶Retroviruses

Gamma-Secretase (Complex)

The γ-secretase (complex) is a multimeric complex that is composed of at least four different transmembrane proteins (presenilin 1 or presenilin 2 (▶PSEN1/PSEN2); ▶APH-1; nicastrin; PEN-2). The last step of ▶amyloid generation from amyloid precursor protein (▶APP) is performed by the γ-secretase complex. The γ-secretase is also important in other pathways, for example in the cleavage of ErbB4, intracellular domains of Notch, and similar types of proteins.
▶Alzheimer’s Disease

Gap

Definition
A space introduced into an alignment to compensate for insertions or deletions in one sequence relative to another.
▶Protein Databases

Gap Junctions

W. Howard Evans
Medical Biochemistry and Immunology, Cardiff University School of Medicine, Cardiff, Wales, UK
wmbwhe@cf.ac.uk

Definition
Gap junctions comprise minute regions of the cell’s plasma (surface) membrane containing arrays of closely packed membrane channels. These channels traversing two aligned membranes separated by a 2–3 nm intercellular gap provide a communication pathway that directly connects cell interiors (Fig. 1). Communication across gap junctions enables single cells to co-ordinate, integrate and summate their metabolic and electrical interactive activities but also results in some loss of independence. Co-operation and harmonisation of cellular activities in tissues and organs is vividly illustrated by the ordered contraction of heart muscle, made possible because the beating of individual cells is synchronised and summated to the organ level by interactions facilitated by gap junctions. Indeed, each cardiac myocyte can communicate with about a dozen or more surrounding partner cells. In the brain, gap junctions enable the synchronisation of the electrical coupling of neuronal cell networks. In non-excitable cells, small signalling molecules below 1200 daltons are exchanged across gap junctions. All animal cells communicate directly with each other across gap junctions, except striated muscle where the cells have fused, spermatozoa and non-nucleated erythrocytes and platelets (1).

Characteristics
Gap junctions are built from one of three biochemically different families of proteins.

▶Connexins
In vertebrates, gap junctions are constructed from ▶connexin protein units (Fig. 1). Over 20 different

Gap Junctions. Figure 1 A gap junction plaque showing diagrammatically the arrangement of paired connexon hemichannels in the membrane. These dock to generate a channel directly joining two cells. Connexons attach to the edges of plaques, but they may also function as hemichannels in their own right. In the top box, the oligomerisation of connexins into hexameric connexons and the various types of gap junctions formed are illustrated. In the lower box, the topography of a generalised connexin in the lipid bilayer is shown. Bold line indicates sequences with high homology.

connexins have been found in humans and rodents and various tissues synthesise different and often overlapping types of connexin (Table 1). They constitute a family of proteins displaying about 40% overall amino acid sequence identity. A widely adopted method of naming individual connexins uses the abbreviation Cx followed by the predicted molecular mass in kDa. Increasingly, a prefix may be added to indicate the species, e.g. h, human; m, mouse; zf, zebra fish, etc. When the molecular mass of the connexins is similar, a decimal point is added, e.g. mCx30.2, mCx30.3 and mCx31.1. However, although there is often little size variation between species, e.g. mCx36 and fish (skate) Cx35, there is sometimes greater disparity in the molecular sizes of functionally similar connexins, e.g. rat Cx46, bovine Cx44 and chicken Cx56, and clearly this nomenclature is not entirely satisfactory. Mouse and human connexins have been classified phylogenetically as follows: Group 1 or beta: Cx26, Cx30.3, Cx31, Cx31.1, Cx32; Group 2 or alpha: Cx33, Cx37, Cx38, Cx40, Cx42, Cx43, Cx50, Cx56; Group 3 or gamma: Cx45, Cx47; and an uncategorised group: Cx25, Cx29, Cx30.2, Cx36, Cx39, Cx40.1 and Cx58. Cx43 is by far the most widely distributed connexin (1). In heart ventricles, myocytes express Cx43 exclusively but myocytes in the atrium and Purkinje fibres also express Cx40. Endothelial cells lining the vascular wall express Cx37 in addition to Cx43, which is also expressed by the smooth muscle cells controlling the contraction of arteries. Cx36 is expressed by neurons in the retina, whereas in the brain astrocytes express mainly Cx43 and oligodendrocytes Cx32 and Cx47. Cells comprising the various layers in the skin express a range of connexins, especially Cx26, Cx30, Cx31, Cx32, Cx40, Cx43 and Cx45. In liver, hepatocytes express Cx26 and Cx32, with their relative abundance varying between species. Lens fibre cells express Cx46 and Cx50, but epithelial cells enveloping the lens capsule express Cx43. Up to 196 permutations of homologous and heterologous connexin based gap

junction interactions are possible when cells express two or more different types of connexins (Fig. 1). The relative amounts of each connexin in cells also vary during embryonic and tissue development.
Connexins span the membrane lipid bilayer four times in an alpha helical conformation (Fig. 1). The protein’s amino terminus (a highly conserved sequence of 21 amino acids), its carboxyl terminus (a highly variable number of amino acids, as described below) and a single intracellular loop face the inside of the cell (Fig. 1). Two highly conserved amino acid sequences, each forming a loop (twenty amino acids in the first and forty in the second), project in a beta sheet conformation from the cell surface into the extracellular space; these loops contain three cysteine residues that are invariable in all connexins discovered and they are linked to each other by intramolecular disulphide bonds. These extracellular loop interactions are crucial, for they provide a scaffold that aligns and extends the transmembrane ▶hemichannel domains across the intercellular gap. The amino acid sequences in these two loops also set the rules governing the compatibility of interactions between various connexins. For example, cells with gap junction channels made from Cx32 and Cx26 communicate, but cells making Cx32 do not communicate with those making Cx43. The types of connexin synthesised by cells thus dictate whether homo- or heterophilic gap junction channels are generated, with different channel pore characteristics and molecular selectivities. The third transmembrane domain and parts of the first and second transmembrane domains contribute the wall of the overall channel. The greatest differences in amino acid sequences between connexins are found in the intracellular loop projecting into the cytoplasm and consisting of 30 amino acids in the smaller connexins, 50–55 amino acids in Cx33, Cx37, Cx40, Cx43, Cx46, Cx50 and Cx57, 80 amino acids in Cx45 and 100 amino acids in Cx36. The carboxyl terminal tail, often referred to as the regulatory domain, varies in length from 16 amino acids in Cx26 to 75 in Cx32, 156 in Cx43 and 275 in Cx57. Larger connexins are post-translationally modified by phosphorylation of several serine and threonine residues on the carboxyl tail. The functional consequences of phosphorylation of connexins remain to be defined, as do the key protein kinases and phosphatases involved in regulating gap junction channels. Connexins are not glycosylated, but are ubiquitinated and acylated.
The genes encoding mouse and human connexins are located on different chromosomes (Table 1). The general organisation of connexin genes is similar, and connexin phylogenetic trees indicate that connexins are likely to have arisen by gene duplication. Most connexin genes have a first exon containing 5′ untranslated sequences and a large second exon containing the complete coding region as well as untranslated sequences. In contrast, the Cx32 gene contains two alternative first exons and their expression is tissue specific. Connexin 36 contains two exons, both with translated and untranslated sequences, with the coding region interrupted by an intron. The Cx45 gene has three exons, with most transcripts containing only

Gap Junctions. Table 1 Properties of some major mouse/human connexins

Connexin type  Chromosome (mouse)  mRNA (kb)  Cells/tissues where present          Disease involvement
Cx26           14                  2.4        Mammary gland, liver, skin, cochlea  Deafness, palmoplantar hyperkeratosis
Cx30           14                  20.23      Skin, cochlea                        Deafness, ectodermal dysplasia, hair loss
Cx31           4                   1.9, 2.3   Skin, cochlea, uterus                Deafness, erythrokeratoderma variabilis
Cx32           X                   1.6        Liver, oligodendrocytes              Peripheral neuropathy
Cx36           2                   2.9        Neurons, retina                      Visual defects
Cx37           4                   1.7        Endothelium, ovaries                 Bleeding
Cx40           3                   3.5        Heart, endothelium                   Atrial arrhythmia
Cx43           10                  3.0        Diverse                              Cardiac malformations
Cx45           11                  2.2        Heart, brain thalamus, bladder
Cx46           14                  2.8        Lens                                 Cataract
Cx50           3                   8.5        Lens                                 Cataract

exons 2 and 3. Cx40 also has three exons with the third containing the complete coding sequence. mRNA encoding connexins is subject to transcriptional control and tissue specific promoters (acting via multiple exons) account for hormonal and pharmacological regulation of expression. This is important, for example, in the contraction of uterine muscle during birth when gap junction numbers composed of Cx43 are regulated by oestrogens whereas gap junctions in the heart are not. Post-transcriptional regulation of mRNA levels is especially evident with some connexins, for example Cx32 and Cx26 in liver.
How are connexins organised into gap junctions? A hexagonal arrangement in which six connexins interact and surround a central pore of 2 nm was suggested by the regular packing of the presumed channel units in liver gap junctions stained with heavy metals as well as by atomic force microscopy. The most up to date structure is based on a three-dimensional electron crystallographic analysis of gap junctions prepared from cultured cells over-expressing recombinant Cx43. This model (2) shows the alignment of two composite hexameric connexin hemichannels at a resolution of 0.7 nm in the membrane plane and 2.1 nm in the vertical plane and it confirms independent biochemical and physical chemical studies showing the presence of 24 transmembrane alpha helices in each connexon hemichannel unit. The hemichannel unit has a dumbbell shape and is 7 nm wide at the cytoplasmic aspect, narrowing to 5 nm at the extracellular aspect. The aqueous channel narrows from 4 nm diameter at the cytoplasmic entry point to 1.5–2.5 nm depending on the calcium concentration at the extracellular region where it becomes continuous with the partner hemichannel of a neighbouring cell. The arrangement of the connexin subunits in the gap junction ensures that the intercellular channel is completely insulated from the extracellular ‘gap’.

▶Innexins and ▶Pannexins
Arthropod and vertebrate gap junctions differ in their overall thickness. Invertebrate gap junctions are constructed of a biochemically unrelated class of proteins called innexins. There is no amino acid sequence homology with connexins, but innexins adopt a similar topography in the membrane with four transmembrane domains and cytoplasmically oriented amino and carboxyl termini. Innexin transcripts expressed in Xenopus oocytes form functional gap junctions displaying typical channel gating characteristics. The Caenorhabditis elegans genome contains as many as 25 innexin genes, with single cells in this worm expressing more than one innexin transcript. Innexins are also present in Drosophila fruit flies where they feature in the development of the nervous system and intestine. Innexins have also been identified recently in Annelida. The biochemical characterisation of innexins is awaited.
Pannexins are electrical junctions in vertebrates that are related to innexins. Pannexin 1 and 2 genes are abundantly expressed in the central nervous system, especially the hippocampus, olfactory bulb and cerebellum; pannexin 1 is confined to white matter. As with innexins, expression of pannexin transcripts generates intercellular channels with similar electrical characteristics to vertebrate gap junctions constructed of connexins. Their biochemical characterisation is awaited.

Assembly and Breakdown of Gap Junctions
Connexins are rapidly degraded; the half-life in heart and other cells is around two to four hours, a figure about 10–20× faster than most membrane proteins. Connexins are co-translationally inserted directly from ribosomes into the endoplasmic reticulum, threading four times into the membrane bilayer to achieve the typical connexin topography (Fig. 1). Connexins show a proclivity to oligomerise into hexameric hemichannels that exit the endoplasmic reticulum and enter the Golgi apparatus. The hemichannels are maintained in a closed configuration to limit continuity between cytoplasmic and lumenal environments since ionic gradients allow cell-signalling responses. The hemichannels are then trafficked in membrane vesicles to the cell’s plasma membrane and attach to the periphery of pre-existing gap junction plaques, a process that occurs simultaneously with their docking and alignment with hemichannels in the adjacent cell (Fig. 1). Connexins can be tagged at the carboxyl terminus with auto-fluorescent proteins such as green fluorescent protein or short tetracysteine-containing amino acid motifs to which arsenic-containing chemicals that fluoresce at different wavelengths will bind. These approaches have elegantly demonstrated in living cells how connexin hemichannels are moved to the plasma membrane and accrete into gap junction plaques and how gap junction units are internalised (3). Connexins are transported on 0.5 μm vesicles to the plasma membrane guided by a microtubular scaffolding and are removed from the central area of gap junction plaques as larger vesicles that correspond to annular gap junctions observed in the electron microscope. Gap junction plaques are built from up to thousands of paired hemichannel units that cannot be peeled apart once formed. Therefore, plaques are internalised into partner cells as complete units and are ultimately degraded in phagosomes or lysosomes. In contrast, incorrectly folded or mutated connexins are transferred directly from the endoplasmic reticulum for degradation in proteasomes.
Gap junctions are frequently located next to other adhesive junctions and it is not surprising that their

assembly by cells is closely co-ordinated. Connexins interact via their carboxyl tails with zona occludens 1 and occludin, two proteins associated with tight ▶intercellular junctions, with tubulin, the major constituent of microtubules, and with catenins, a further class of proteins associated with adhesive junctions. Specific connexins are also detected in ▶lipid raft membrane microdomains where they associate with caveolins.

Regulatory Mechanisms
Gap Junction Channels
Measurements of electrical currents across gap junctions in paired Xenopus oocytes synthesising various recombinant connexins have shown that opening and closing (gating) of gap junction channels is determined mainly by a voltage difference between the paired cells and features amino acid sequences in the first to second transmembrane regions. In larger connexins, a second mechanism is identified involving a “ball and chain” type interaction between amino acids in the intracellular loop and the carboxyl tail. This chemical gating is mainly a function of pH, with channels closing as pH drops (4). Gap junctions constructed of different connexins vary in their gating characteristics. Calcium ions also regulate gap junctional communication. Elevation of calcium in cells generally closes gap junctions. Cell signalling responses such as an increase in cytoplasmic calcium induced by mechanical aggravation of cells or release of inositol phosphates are propagated across gap junctions to neighbouring cells, thus generating a calcium wave. The biochemical nature of the signal transmitted is not known for sure, but one suggestion is that intercellular transmission of calcium waves involves the passage of inositol trisphosphate through the gap junction channel. Calcium waves are also propagated between cells across connexin hemichannels. Here, ATP is released across the hemichannel and then binds in a paracrine fashion to purinergic receptors on neighbouring cells. In contrast, progress in establishing the biochemical nature of molecules/ions transmitted across gap junctions has been slow, probably because the direct nature of the transfer has made their interception difficult. All that can be said is that such entities are likely to be 1.2 kDa or less, with an ionic radius below 1 nm. Gap junctions should not be regarded as nonselective pores joining cells, for the molecular selectivity for a range of permeants in the context of charge and size is a function of the connexin makeup of the channel. For example, gap junction channels distinguish between the passage of cAMP and cGMP.

Hemichannels
Unopposed connexin hemichannels in non-junctional regions of the plasma membrane (Fig. 1) were previously thought of as biogenetic precursors of gap junctions. However, evidence is building up that they are regulatable entities in their own right, with functions influenced for example by calcium levels inside and outside the cell. These unpaired connexin hemichannels were first observed in the horizontal cells of catfish retina and their operation has now been studied extensively (5). Their presence in mammalian cells was demonstrated on the basis of passage of small dyes across the channels or the detection of electrical currents crossing them. Cells tolerate small numbers of open connexin hemichannels on the plasma membrane, but the presence of larger numbers is often a pathological consequence of a metabolic insult such as in ischaemia, when cells release ATP across the open hemichannels in cardiac myocytes or glutamate in astrocytes. Importantly, hemichannels provide a second mechanism for connexin-dependent intercellular propagation of calcium waves that complements the calcium signalling occurring directly across gap junctions.

Modifications in Disease
Changes in gap junctional communication and especially mutations in connexin genes have been shown to correlate with a number of human diseases. Mutations in Cx43 are extremely rare and diseases in tissues expressing this connexin mainly involve modifications in the abundance of functional gap junction channels and the remodelling of pre-existing gap junctions as seen in cardiac hypertrophy and infarction and in endothelial dysfunction in arteries. In contrast, about 200 Cx32 mutations have been detected in the X-linked form of ▶Charcot-Marie-Tooth Disease, a demyelinating syndrome that leads to degeneration of peripheral nerves. The channels constructed of Cx32 are located mainly in the paranodal loops and Schmidt-Lanterman incisures of myelinating Schwann cells and provide a direct radial diffusion pathway that is about 300-fold shorter in distance than a circumferential route. Many missense, frameshift, deletion and nonsense mutations lead to a loss of function, many caused by failure of Cx32 to oligomerise correctly into hemichannels and to be targeted in a precise manner to the plasma membrane. Sometimes, gap junctions are formed by these mutated connexins but their operation is faulty. Gap junctions constructed from Cx32 in the liver and pancreatic acinar cells (Table 1) are unaffected. Surprisingly, myelination by oligodendrocytes in the central nervous system is unaffected. Mutations in Cx26 and Cx30 in the human inner ear are associated with congenital deafness, a disorder present in 1 in 1,000 births. Over 50 Cx26 mutations are known, with one common recessive mutation resulting in a severely truncated Cx26 protein. Many other site-specific mutations result in trafficking/▶channel

assembly deficiencies. Gap junctions in the inner ear function in K+ circulation from the interstitial space and through the cochlear supporting cells, a mechanism confirmed in mice in which the Cx26 gene was inactivated. Mutations in Cx26, Cx30 and Cx31.1 are also associated with disorders of the skin.
Mutations in Cx50 and Cx46 genes are associated with lens transparency and are prevalent in humans with inherited zonular pulverulent cataract. As with Cx43 in the heart, gap junctional communication is important in cell migration during organ growth and development.
Knowledge gained by studying genetic mutations in the laboratory by functional expression in model cell systems has been extended and complemented in transgenic mice with single or double gene connexin “knockouts” and “knockins”. These approaches have added important inputs to our understanding of the physiological, pathological and developmental roles of individual connexins. The knowledge base will expand with the application of RNA interference techniques to silence connexin genes simultaneously in vivo. Further research aims will be to identify functions that can be attributed to individual connexins or specific combinations of connexins assembled into heteromeric channels. Higher definition molecular models will help explain how gap junction channels operate. The molecular basis of the specificity of transfer of signals between cells across gap junctions and the way they are interpreted will continue to be major foci for research.
▶Neuron

References
1. Evans WH, Martin PE (2002) Gap junctions: structure and function. Mol Membr Biol 19:121–136
2. Unger VM, Kumar NM, Gilula NB et al (1999) Three dimensional structure of a recombinant gap junction membrane channel. Science 283:1176–1180
3. Gaietta G, Deerinck TJ, Adams SR et al (2002) Multicolor and electron microscopy imaging of connexin trafficking. Science 296:503–507
4. Harris AL (2001) Emerging issues of connexin channels: biophysics fills the gap. Q Rev Biophys 34:325–472
5. Goodenough DA, Paul DL (2003) Beyond the gap: functions of unpaired connexon channels. Nat Rev Mol Cell Biol 4:285–294
6. White TW, Paul DL (1999) Genetic diseases and gene knockouts reveal diverse connexin function. Annu Rev Physiol 61:283–310

GAPs

▶GTPase-Activating Proteins

GAR Motif

Definition
A GAR motif is a glycine and arginine-rich region of a protein in which the arginine residues are often methylated.
▶Methylation of Proteins

Gas Chromatography (GC)

Definition
Gas chromatography is an analytical method by which gaseous compounds are separated based on their volatility (boiling point) and interaction with a static liquid phase.
▶Mass Spectrometry: Quantitation

Gastrulation

Definition
Gastrulation is a morphogenetic process in embryonic development, during which the three germ layers ectoderm, endoderm and mesoderm form. Mesoderm is formed from ectoderm by ▶epithelial-to-mesenchymal conversion. The primitive ectoderm divides into surface ectoderm and neuroectoderm.
▶Neural Development

Gatekeeper Genes

Definition
Gatekeeper genes are involved in the control of cell cycle progression, lifespan of a cell or cell death. They are often a target for mutations during cancer development. The ▶APC gene is the major gatekeeper of the colon.
▶Cell Cycle - Overview
▶DNA-Repair Mechanisms
▶Tumor Suppressor Genes

Gating

Definition
Gating refers to the conformational changes of ion channels during the opening and closing of the permeation pathway.
▶Ion Channels/Excitable Membranes

GBP/FRAT

Definition
GBP (GSK-3 binding protein) and its mammalian homologue FRAT bind to the serine-threonine kinase glycogen synthase kinase 3 (GSK-3) and inhibit its phosphorylation of non-primed GSK-3 substrates. It has tumor promoting activity in lymphocytes.
▶Wnt/Beta-Catenin Signaling Pathway

GDIs

▶Guanine Nucleotide Dissociation Inhibitors

GEF

▶Guanine Nucleotide Exchange Factor

2D-Gel Electrophoresis

▶Two-Dimensional Gel Electrophoresis

Gelsolin

Definition
Gelsolin and gelsolin-like proteins are ubiquitous F-actin fragmenting proteins. They perform three major functions: (1) they sever actin filaments into smaller fragments; (2) they cap free barbed ends, thus inhibiting elongation; and (3) they nucleate actin polymerisation by stabilising dimers and trimers. Many of these proteins are regulated by Ca2+ and the phosphoinositide PIP2.
▶Actin Cytoskeleton

Gemins

Definition
Gemins are proteins that associate with the ▶SMN (survival of motoneuron) complex in a stable and stoichiometric manner. As these proteins and SMN colocalize in nuclear structures called “gems”, they are referred to as “Gemins”.
▶Spinal Muscular Atrophy

GenBank

Definition
GenBank, the NIH genetic sequence database, is an annotated collection of all publicly available DNA sequences, located at http://www.ncbi.nlm.nih.gov. It is part of the International Nucleotide Sequence Database Collaboration, which comprises the DNA DataBank of Japan (DDBJ), the European Molecular Biology Laboratory (EMBL), and GenBank at NCBI.
▶Protein Databases

Gene

Definition
Gene refers to a segment of DNA coding for a polypeptide chain of amino acids. A gene includes untranslated regions before and after the coding region

and intervening (intronic) sequences. Nomenclature: Human genes are abbreviated using italic capital letters and numbers without spaces, dashes etc. according to the approved gene symbols listed in the Human Gene Nomenclature Database (http://www.gene.ucl.ac.uk/cgi-bin/nomenclature/searchgenes.pl). Nomenclature for murine gene symbols follows the same rules, but uses lower case letters after the first letter. Example: PSEN1 = human gene for PS1; Psen1 = murine gene for PS1.

Gene Annotation in Plants

S. Rombauts, Y. Van de Peer, Pierre Rouzé
Department of Plant Systems Biology, University of Ghent, Ghent, Belgium
pierre.rouze@psb.ugent.be

Synonyms
Just as for gene and genome annotation in any other organism, gene annotation in plants makes use of algorithmic approaches based on statistics, artificial intelligence and machine learning, recently complemented with homology based approaches depending on the availability of sequence data, whether related genomes, complete cDNAs or expressed sequence tags (ESTs).

Definition
All prokaryotic and eukaryotic organisms encode their genetic information in their genome, built up from DNA. The process of genome annotation consists of finding and decoding the information encrypted on the DNA molecules into known conceptual objects related to biological entities and functions. In general, annotation is mostly focused on finding genes, defining their structure and assigning a function to the product and the process resulting from the expression of each gene. But annotation is not restricted to genes, as non-coding RNAs, transposable elements, promoters and enhancers also make up the genome and are essential to an understanding of the organization and the various functions encoded by or embedded in genomes.

Characteristics
Gene annotation in plants is in essence not different from gene annotation in human or mouse, except that each genome, although constituted by the same DNA, has its own style that needs to be captured by the models used by the different prediction programs. Anticipating the genome-specific characteristics, e.g. codon usage, gene density, length and composition of introns and intergenic sequences, as well as conservation of signals such as splice sites (Table 1), is important for building adequate algorithms and for their proper training.

Gene Annotation in Plants. Table 1 Splice site prediction programs

Program                  Organism                                      Method
GeneSplicer (152)        Arabidopsis, human                            HMM + MDD
NETPLANTGENE (42)        Arabidopsis                                   NN
  ▶http://www.cbs.dtu.dk/services/NetPGene/
NETGENE2 (43)            Human, C. elegans, Arabidopsis                NN + HMM
  ▶http://www.cbs.dtu.dk/services/NetGene2/
SPLICEVIEW (39)          Eukaryotes                                    Score with consensus
  ▶http://l25.itba.mi.cnr.it/webgene/wwwspliceview.html
NNSPLICE0.9 (44)         Drosophila, human or other                    NN
  ▶http://www.fruitfly.org/seq_tools/splice.html
SPLICEPREDICTOR (40,153) Arabidopsis, maize                            Logitlinear models: (i) score with consensus; (ii) local composition
  ▶http://bioinformatics.iastate.edu/cgi-bin/sp.cgi
BCM-SPL                  Human, Drosophila, C. elegans, yeast, plant   Linear discriminant analysis
  ▶http://www.softberry.com/berry.phtml; http://genomic.sanger.ac.uk/gf/gf.html

HMM, hidden Markov model; MDD, maximal dependence decomposition; NN, neural networks
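
As a rough illustration of what a "score with consensus" signal sensor does, the following Python sketch builds a position weight matrix from a handful of aligned donor splice sites and slides it along a sequence. It is a minimal sketch under stated assumptions, not the method of any program in Table 1: the training sites are invented for the example, the background base frequencies are assumed uniform, and the function names (build_pwm, score, scan, best_site) are hypothetical.

```python
import math

# Hypothetical aligned donor (5') splice sites: 3 exonic + 6 intronic bases.
# Invented for illustration; a real model is trained on curated, organism-specific sites.
TRAINING_SITES = [
    "CAGGTAAGT",
    "AAGGTGAGT",
    "CAGGTAAGA",
    "GAGGTAAGC",
    "CAGGTGAGT",
    "TAGGTAAGT",
]

# Assumption: uniform background composition. Plant genomes are typically AT-rich,
# so a real model would estimate these frequencies from the genome being annotated.
BACKGROUND = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}


def build_pwm(sites, pseudocount=1.0):
    """Return one dict of log2-odds scores per alignment column."""
    pwm = []
    for pos in range(len(sites[0])):
        # The pseudocount avoids log(0) for bases never seen at this position.
        counts = {base: pseudocount for base in "ACGT"}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        pwm.append({b: math.log2((counts[b] / total) / BACKGROUND[b]) for b in "ACGT"})
    return pwm


def score(pwm, window):
    """Sum the per-position log-odds for a window as wide as the matrix."""
    return sum(column[base] for column, base in zip(pwm, window))


def scan(pwm, sequence):
    """Score every window of the sequence; returns (start, score) pairs."""
    width = len(pwm)
    return [(i, score(pwm, sequence[i:i + width]))
            for i in range(len(sequence) - width + 1)]


def best_site(pwm, sequence):
    """Return the (start, score) pair of the highest-scoring window."""
    return max(scan(pwm, sequence), key=lambda hit: hit[1])


if __name__ == "__main__":
    pwm = build_pwm(TRAINING_SITES)
    start, value = best_site(pwm, "TTTTCAGGTAAGTCCCC")
    print(f"best candidate donor site starts at {start} (log-odds {value:.2f})")
```

Because such a matrix reflects the composition of its training sites, it has to be retrained for each genome, which is the point made above about capturing genome-specific signals.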

The effort undertaken by several laboratories worldwide to sequence the Arabidopsis thaliana genome led, by the end of 2000, to the first full catalogue of genes present in a plant. This work also revealed that plants have about the same number of protein gene loci as vertebrate genomes (27,000 for Arabidopsis) but also that about 5,000 genes remained unknown, showing no homology to any known sequence in databases (1). These genes could only be predicted using ab initio prediction programs and need further experiments to prove that they truly exist, which has since been done for a number of them.
Genome annotation can be subdivided into two steps: structural annotation provides gene structures, and functional annotation provides as accurate a prediction as possible of the function of each gene. To acquire structural annotation two main approaches are combined, intrinsic and extrinsic methods. Intrinsic (or ab initio) methods are all those methods that decipher genome content based solely on statistical/lexical models built by using human-curated data sets of sequences from the organism under investigation. The algorithms used for intrinsic approaches can be subdivided into signal sensors and content sensors. Signal sensors are algorithms that focus on the retrieval and identification of functional sites such as splice and translation initiation sites, transcription and polyadenylation signals, and include methods such as position weight matrices, neural networks and support vector machines. Content sensors recognize regions along a sequence that have local characteristics differing from the surrounding sequence; they mainly include methods such as Markov models and all their variants and are broadly used to distinguish coding from non-coding regions.
The purpose of the multiplicity of methods used is to decompose the problem of gene annotation into key components and achieve the best results on each component, which can thereafter be reassembled towards a whole gene structure (2).
Extrinsic or comparative approaches on the contrary rely on the availability of sequences from other genomes and proteins, where regions that show enough similarity between genomes are believed to have the same biological meaning, as a result of common ancestry. The advantage of comparative methods to predict genes is that they allow the revealing of small and novel genes without ambiguities and, more importantly, enable the detection of non-coding features that can hardly be detected otherwise. One needs nevertheless to be careful as to the choice of organisms used for comparative methods to achieve good results. Too closely related genomes might not reveal the information hidden in the genomes, as regions larger than coding sequences will remain conserved, making the delineation of gene structures impossible. If on the other hand genomes are too distantly related, only a few genes will keep significant similarity to be correctly modeled or even be found. For plants, genome sequences of one dicot, Arabidopsis, and one monocot, rice, are presently available. Unfortunately they diverged some 200 million years ago and are therefore too divergent for comparative annotation. Other genomes that are currently being sequenced, poplar, Medicago and Lotus for dicots, maize for monocots, will soon fill this gap.
Besides experimental approaches, functional annotation can only be achieved through comparative methods where the knowledge of genes from one organism can be transposed to the genes and the genome of the organism concerned. To achieve this, homology searches and alignment programs are the main algorithms used (Table 2). The quality of the databases providing the data is of primary importance and is currently the main limitation. The homology searches are performed against protein, domain and motif databases using BLAST or FastA or by using hidden Markov profiles that show a better sensitivity and specificity. One point that needs to be stressed is the importance of consistency in genome annotation across species. The gene ontology (GO) project is an attempt to achieve this goal, through a hierarchical description of the genes in a genome according to the functions of their products at the various biological levels, from the molecules to the biological processes and cellular components with which they are associated. The classification and standardized terminology of GO was initiated in the annotation of the D. melanogaster genome and it is hoped that GO will become a community-curated entity, providing a central frame and vocabulary for annotation.
Many gene prediction programs are publicly available; many of them are referred to on the web site maintained by W. Li (▶http://linkage.rockefeller.edu/wli/gene/programs.html). Several reviews are available on this topic too, among which that of Mathé et al. (2) is more specifically oriented towards plants. The large variety of gene prediction programs has the drawback that one does not necessarily know which program to use in which situation and which performs best depending on the organism of interest. The issues of specificity, or the ability to predict only real genes, and sensitivity, or the ability to predict all the genes present in a sequence, have been addressed by Burset and Guigo (3), Rogic et al. (4) and in the specific case of plants by Pavy et al. (5). In those publications it has been made clear that programs rarely performed well enough to be able to predict all the genes and that 30–40% of the genes were likely to be wrongly predicted even by the best intrinsic methods. In addition, it is clear that the more extrinsic data (of high quality) become available, the better the annotation will be.
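The two evaluation measures named above can be made concrete with a short sketch. It assumes the convention common in the gene-finding literature, where sensitivity is TP/(TP + FN) and specificity is TP/(TP + FP), and it counts a prediction as correct only when both coordinates match exactly. The exon coordinates and the function name are invented for illustration.

```python
def evaluate_predictions(annotated, predicted):
    """Return (sensitivity, specificity) for exact-match feature predictions."""
    annotated, predicted = set(annotated), set(predicted)
    true_pos = len(annotated & predicted)   # real features that were predicted
    false_neg = len(annotated - predicted)  # real features that were missed
    false_pos = len(predicted - annotated)  # predictions with no real counterpart
    sensitivity = true_pos / (true_pos + false_neg) if annotated else 0.0
    specificity = true_pos / (true_pos + false_pos) if predicted else 0.0
    return sensitivity, specificity

# Hypothetical exon coordinates (start, end) on one sequence.
annotated_exons = [(100, 250), (400, 520), (700, 910), (1200, 1350)]
predicted_exons = [(100, 250), (400, 520), (705, 910), (1500, 1600), (1200, 1350)]
sn, sp = evaluate_predictions(annotated_exons, predicted_exons)
```

Here three of the four annotated exons are recovered (sensitivity 0.75) and three of the five predictions are real (specificity 0.6). The same counting can be applied at the nucleotide or whole-gene level, which generally gives different figures for the same program.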

Gene Annotation in Plants. Table 2 Homology-based gene prediction programs

Program | Organism | Databank or required input | Alignment | Gene reconstruction
AAT (66) (▶http://genome.cs.mtu.edu/aat.html) | Primates, rodents, other | cDNA, protein | DDS (improved BLASTX), DPS (improved BLASTN) | NAP, GAP2
ALN (62) | - | Protein | Tron code, PAM 250 | -
CEM (81) | - | Two genomic sequences | BLASTX output, WMM for sites | DP
EbEST (73) (▶http://ares.ifrc.mcw.edu/EBEST/ebest.html) | Human, other | dbEST | BLASTN, EST clustering, Smith–Waterman-based gapped alignment | 3'-UTR detection, assembly of EST-tagged exons
Est2genome (74) | - | EST or cDNA, preferably BLASTN output | Modified Smith–Waterman Needleman-Wunsch algorithm | No
GeneSeqer (67,68) (▶http://bioinformatics.iastate.edu/cgi-bin/gs.cgi) | Arabidopsis, maize, generic plant | dbEST or EST database or proteins if missing EST match | Spliced alignment, splice recognition with SplicePredictor | Yes
GeneWise (60) (▶http://www.sanger.ac.uk/Software/Wise2/genewiseform.shtml) | Human | One protein or a HMM profile | Global alignment translated ORF/protein | DP (dynamite)
GENQUEST (▶http://compbio.ornl.gov/Grail-bin/EmptyGenquestForm) | - | dbEST, SwissProt, Prosite, BLOCKS, GSDB | Smith-Waterman, Blast, Fasta | -
ICE (64) (▶http://theory.lcs.mit.edu/ice) | - | dbEST, OWL | Look-up | DP
INFO (63) | - | Nr | 25mer look-up table, protein/protein alignments scored with PAM 40, PAM 120, PAM 250, BLO62 | No
ORFgene2 (61) (▶http://l25.itba.mi.cnr.it/webgene/wwworfgene2.html) | Human, mouse, Drosophila, Aspergillus, Arabidopsis, Caenorhabditis | SwissProt | BlastP, WAM for splice sites, identity score on frequencies of dipeptides | Compatibility graph, DP
PredictGenes (▶http://cbrg.inf.ethz.ch/Server/subsection3_1_8.html) | Invertebrates, vertebrates, prokaryotes, plants | SwissProt | PAM 250 | DP
PROCRUSTES (59) (▶http://www.hto.usc.edu/software/procrustes/wwwserv.html) | Vertebrates | One homologous protein | Protein/protein alignments scored with PAM 120 | DP
Pro-Gen (83) (▶http://www.anchorgen.com/pro_gen/pro_gen.html) | - | Two genomic sequences | Alignment of translated sequences scored with PAM 120 | DP
ROSETTA (80) (▶http://crossspecies.lcs.mit.edu/) | Human, mouse | Two genomic sequences | GLASS (global alignment system), PAM 20, Genscan method for splice sites | DP
SGP-1 (82) (▶http://soft.ice.mpg.de/sgp-1) | Vertebrates, angiosperms | Two genomic sequences or a pairwise local alignment output | Local alignment | DP
SIM4 (69) | All eukaryotes | cDNA/genomic | HSP from Blast | No
SLAM (85) (▶http://baboon.math.berkeley.edu/syntenic/slam.html) | Human, mouse | Two genomic sequences | Generalized pair HMM | DP
Spidey (70) (▶http://www.ncbi.nlm.nih.gov/IEB/Research/Ostell/Spidey/index.html) | Vertebrates, Drosophila, C. elegans, plant | One genomic sequence/set of mRNAs | Two Blasts: high stringency and low stringency | -
SYNCOD (72) (▶http://l25.itba.mi.cnr.it/webgene/wwwsyncod.html) | Human, mouse, Drosophila, Arabidopsis, Aspergillus, Caenorhabditis | BLASTN output | Silent/replacement ratio, Monte Carlo simulations | No
TAP (75) (▶http://sapiens.wustl.edu/zkan/TAP/) | Human, mouse, Drosophila | dbEST | WU-BLASTN, SIM4 | Yes
Utopia (84) | All eukaryotes | Two genomic sequences | Local alignment | Yes
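Many of the programs in Table 2 build on local alignment of the Smith-Waterman type. As a rough sketch of what such an alignment computes, the function below returns the best local alignment score of two sequences by dynamic programming. The match, mismatch and gap values are arbitrary assumptions; the listed programs typically use substitution matrices such as PAM and more elaborate gap and splice-site models.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of a and b with a linear gap penalty."""
    prev = [0] * (len(b) + 1)  # previous row of the dynamic programming matrix
    best = 0
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            # A cell never drops below zero: a local alignment may start anywhere.
            curr.append(max(0, diag, prev[j] + gap, curr[j - 1] + gap))
            best = max(best, curr[j])
        prev = curr
    return best

# Hypothetical fragments sharing the short motif ACG.
score = smith_waterman_score("TTACGTTT", "GGACGGG")
```

With these parameters the shared motif ACG gives a score of 6. Only two rows of the matrix are kept, which is enough for the score but not for recovering the alignment itself; a traceback would need the full matrix.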

The above described programs are different from annotation platforms, which do not attempt to make predictions themselves, but present the results from different prediction programs graphically and thus have to be seen as complementary tools that facilitate human driven annotation.

Clinical Relevance
The use of plants for medical purposes dates back thousands of years and was part of the magic exerted by medicine men, sorcerers and druids. Even now many pharmaceutical compounds that are commonly used are or were once extracted from plants. Genetic engineering

Gene Annotation in Plants. Table 3 Ab initio gene prediction programs (possibly with homology integration)

Program | Organism | Gene elements | Gene model | Homology
DAGGER (91) | - | Site scores | Directed acyclic graphs | -
EuGène (31) (▶http://www.inra.fr/bia/T/EuGene) | Arabidopsis | Three-periodic IMM for exons, one IMM for introns, one for intergenic regions, one for UTR. NetGene2/SplicePredictor for splice sites | DP | EST/cDNA, protein
GeneId3 (89) (▶http://www1.imim.es/geneid.html) | Vertebrates, plants | Rule-based method; WAM, discriminant analysis | DP | EST
GENEFINDER (28): FGENE, FEX (▶http://genomic.sanger.ac.uk/gf/gf.html; ▶http://www.softberry.com/berry.phtml) | Human, mouse, Drosophila, Caenorhabditis elegans, yeast, dicots, monocots, Schizosaccharomyces pombe, Neurospora crassa | Linear discriminant analysis | DP | Protein
GENEFINDER (Green) | - | Log likelihood ratio score matrix on MM | DP | -
GeneGenerator (92) | Maize | Logitlinear models for splice sites, start; 3rd to 5th order MM for exons and introns | DP | -
GeneMark (29) (▶http://opal.biology.gatech.edu/GeneMark/genemark24.cgi) | Prokaryotes, eukaryotes | 5th order MM (homogeneous for introns, three-periodic for exons) | No | -
GeneMark.hmm (35) (▶http://opal.biology.gatech.edu/GeneMark/eukhmm.cgi) | Human, mouse, Drosophila, Gallus gallus, Arabidopsis, rice, maize, Chlamydomonas reinhardtii, C. elegans, Hordeum vulgare, Triticum aestivum | 5th order MM (homogeneous for introns, three-periodic for exons) | GHMM, DP | Under development
GeneModeler (57) (▶ftp://ftp.tigr.org/pub/software/gm/) | Eukaryotes | Nucleotide and dinucleotide composition, consensus for splice sites | Rule-based method | -
GeneParser (27) (▶http://beagle.colorado.edu/eesnyder/GeneParser.html) | Vertebrates | NN | DP | EST
Genie (44,96) (▶http://www.fruitfly.org/seq_tools/genie.html) | Drosophila, human, other | NN | GHMM, DP | Protein
GenLang (154) (▶http://www.cbil.upenn.edu/genlang/genlang_home.html) | Vertebrates, Drosophila, dicots | Grammar rules, WAM, hextuple frequencies | Chart parsing, DP | -
GenomeScan | Vertebrates | Genscan method, BLASTP or BLASTX | GHMM, DP | Protein
GENSCAN (30) (▶http://genes.mit.edu/GENSCAN.html) | Vertebrates, Arabidopsis, maize | WAM for acceptor; MDD for donor; 5th order MM (homogeneous for introns, three-periodic for exons) | GHMM, DP | Protein, GenomeScan (99)
GENVIEW2 (25) (▶http://l25.itba.mi.cnr.it/webgene/wwwgene.html) | Human, mouse, Diptera | Linear combination, dicodon statistic | DP | -
GlimmerM (33,34) (▶salzberg@cs.jhu.edu) | Small eukaryotes, Arabidopsis, rice | Three-periodic IMM for exons (order 0-8), IMM for introns, 2nd order MM for splice sites | DP | -
GRAIL/GAP3/GrailEXP (90,155) (▶http://compbio.ornl.gov/public/tools/) | Human, mouse, Arabidopsis, Drosophila | NN | DP | EST, cDNA
GRPL (97) | Human, Drosophila, Arabidopsis | Reference point logistic for splice sites, 5th order MM (homogeneous for introns, three-periodic for exons) | GHMM, DP | Protein
HMMgene (98) (▶http://www.cbs.dtu.dk/services/HMMgene/) | Vertebrates, C. elegans | Three-periodic 4th order MM for exons, 3rd order MM for introns | CHMM | -
MORGAN (48) (▶http://www.cs.jhu.edu/labs/compbio/morgan.html) | Vertebrates | Decision tree system | DP | -
MZEF (26) (▶http://argon.cshl.org/genefinder/) | Human, mouse, Arabidopsis, fission yeast | Quadratic discriminant analysis | No | -
SORFIND (24) | - | Matrix method for start and splice sites, hexamer usage (Fourier measure) | No | -
Twinscan (100) | Mouse, human | Genscan method; 5th order MM for UTR and intergenic, WAM for acceptor sites | GHMM | Genomic sequence
VEIL (47) (▶http://www.cs.jhu.edu/labs/compbio/veil.html) | Vertebrates | HMM | DP | -
Xpound (156) (▶http://bioweb.pasteur.fr/seqanal/interfaces/xpound-simple.html) | Human | Three-periodic 1st order MM for exons, 1st order MM for introns and intergenic | HMM | -
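Most programs in Table 3 rely on Markov models (MM) of some order as content sensors. The sketch below shows the principle with two homogeneous k-th order Markov chains, one trained on coding and one on non-coding sequence, compared by a log-odds score. The training sequences, the order of 2 and the function names are assumptions for illustration only; real programs use much larger curated training sets, higher orders and three-periodic models for exons.

```python
import math

def train_markov(sequences, order=2, pseudocount=1.0):
    """Count (context -> next base) transitions and return a log-probability function."""
    counts = {}
    for seq in sequences:
        for i in range(order, len(seq)):
            context, base = seq[i - order:i], seq[i]
            bases = counts.setdefault(context, {})
            bases[base] = bases.get(base, 0) + 1

    def log_prob(context, base):
        seen = counts.get(context, {})
        total = sum(seen.values()) + 4 * pseudocount  # 4 possible bases
        return math.log((seen.get(base, 0) + pseudocount) / total)

    return log_prob

def log_odds(seq, coding_lp, noncoding_lp, order=2):
    """Positive values favour the coding model, negative the non-coding one."""
    return sum(coding_lp(seq[i - order:i], seq[i]) - noncoding_lp(seq[i - order:i], seq[i])
               for i in range(order, len(seq)))

# Toy training data, invented for illustration (GC-rich "coding", AT-rich "non-coding").
coding_lp = train_markov(["ATGGCCGCCGGCGCCATGGCCGGC"])
noncoding_lp = train_markov(["ATATTTTAATTATATAAATTTTAT"])
gc_rich = log_odds("GCCGGCGCC", coding_lp, noncoding_lp)
at_rich = log_odds("TTATATAAT", coding_lp, noncoding_lp)
```

With this toy data the GC-rich fragment scores above zero and the AT-rich fragment below zero. In a gene finder such scores are computed along the sequence and combined with signal sensor scores in the gene model, for example by dynamic programming.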

of plants to produce biopharmaceuticals is much more recent. Identification and mining of genes involved in the metabolism of such compounds in medicinal plants and in model systems is then an important issue. Gene annotation is the basic step for it, and its high quality can speed up and better focus the research on genes of potential pharmaceutical importance, especially the ones involved in secondary metabolism.
Besides being a source of pharmaceuticals, plants are also seen as a convenient and inexpensive way to produce proteins and other molecules of medicinal interest, a practice often referred to as “pharming”. Most genes can indeed be expressed in a wide range of organisms. Therefore expression systems need to be tested for efficiency, cost and the biological activity of the products, as the demand for high production levels at low cost is important to make modern medicine available for an ever expanding world population. Modified mammalian cells are in that respect valuable as far as the biological activity is concerned but their use is limited because of expensive culturing and difficult scaling up. The advantage of microbial organisms is that, as well as the easier modification of the organisms, larger quantities can be manufactured using industrial bioreactors. Their disadvantage is that proteins do not become correctly glycosylated for usage in humans and that some proteins lack the proper folding and disulfide bridges. Plants on the other hand, have many potential advantages for the production of recombinant proteins and the engineering of pharmaceuticals. First, growing plants is more economical than industrial facilities with bioreactors. Second, starting from plant material is already documented. Third, purification is not required when the therapeutic product can be administered as food. Fourth, plants can be directed to target proteins into organs and intracellular compartments that confer better and more stable conservation (e.g. seeds). Fifth, the production levels that can be reached using modified plants approach industrial scales. Last, the risk of human health threats due to contamination is reduced to a minimum.
The first recombinant therapeutic proteins successfully produced in tobacco plants were different forms and parts of immunoglobulin (Ig), making plants virtually unlimited sources of inexpensive monoclonal antibodies. Ig produced in plants can effectively prevent infectious diseases and cancers in the mouse model or be used for in vivo tumor imaging. None are available commercially yet.
Glycosylation of proteins though remains a potential problem, as N-glycans in plants are structurally more diverse with a majority of oligosaccharides having β-(1,2)-xylose and α-(1,3)-fucose linked to the Man3GlcNAc2 core. These are not found in mammalian N-glycans, while plant engineered proteins are lacking the sialic acid that represents 10% of the mouse sugar

content. These differences appear to have no influence on the binding affinities of Ig in vitro; however, there is some concern about the immunogenicity and allergenicity of plant engineered proteins. To circumvent this problem, proteins have been made edible, suppressing the problem since plant materials are ubiquitous in the human diet, the rationale being that plant engineered proteins can trigger a mucosal immunity. A high dosage is needed to maximize the chances of the protein reaching the gut intact. Therefore high expression levels in the plant tissues must be achieved. If annotation proper does not give an answer to this issue, the features selected for making efficient ab initio prediction software can be used as a guide. This tells us what makes a gene most adapted to its genome style and in turn most expressed. The latest approach to achieving high expression levels was to express the proteins in the chloroplast. Expressing proteins in chloroplasts is an alternative for high expression with some advantages such as the absence of silencing of the engineered gene and the environmental friendliness of the process, as spreading of chloroplast-targeted genes to other plants becomes very unlikely (6, 7).

References
1. The Arabidopsis Genome Initiative (2000) Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature 408:796–815
2. Mathé C, Sagot M-F, Schiex T et al (2002) Current methods of gene prediction, their strengths and weaknesses. Nucl Acids Res 30:4103–4117
3. Burset M, Guigo R (1996) Evaluation of gene structure prediction programs. Genomics 34:353–367
4. Rogic S, Mackworth A, Ouellette F (2001) Evaluation of genefinding programs on mammalian sequences. Genome Res 11:817–832
5. Pavy N, Rombauts S, Déhais P et al (1999) Evaluation of gene prediction software using a genomic data set: application to Arabidopsis thaliana sequences. Bioinformatics 15:887–899
6. Daniell H, Streatfield SJ, Wycoff K (2001) Medical molecular farming: production of antibodies, biopharmaceuticals and edible vaccines in plants. Trends Plant Sci 6:219–226
7. Price B (2003) Conference on Plant-Made Pharmaceuticals. IDrugs 6:442–445 http://www.cpmp2003.org/

Gene Chip Technology

▶Biochip Technology
▶DNA Chip Technology
▶Gene Chip Technology
▶Microarray Technology

Gene Chip Technology and Its Application to Molecular Medicine

HEIKE ZIMDAHL, NORBERT HÜBNER
Max Delbrück Center for Molecular Medicine (MDC), Berlin, Germany
nhuebner@mdc-berlin.de

Definition
DNA ▶microarrays, or DNA chips, consist of thousands of individual DNA sequences arrayed at a high density on a single matrix, usually glass slides or quartz wafers, but sometimes on nylon substrates. Probes with known identity are used to determine complementary binding, thus allowing the analysis of ▶gene expression, DNA sequence variation or protein levels in a parallel format.

Description
The analysis of gene expression levels in a certain tissue, cell type or stage of development provides essential information for any attempt to understand biological processes and to isolate differently regulated genes associated with a disease phenotype. Gene expression profiling may allow linkage of specific genes to disease susceptibility and drug response and the identification of genes and components of complex molecular pathways.
Various methods are available for detecting and quantitating gene expression levels, e.g. Northern blot, RNase protection assay, differential display, representational difference analysis (RDA) and serial analysis of gene expression (SAGE). For the high throughput analysis of gene expression on a global level, the microarray has emerged.
Microarrays usually contain thousands of arrayed ▶cDNAs or ▶oligonucleotides.
After completion of the human genome sequencing, the estimated number of human genes is 30,000, but there are more ▶mRNA species that arise e.g. by ▶alternative splicing. Most investigators use arrays comprising a substantial subset of the entire transcriptome in their experiments. Gene expression analysis arrays are commercially available for a number of organisms, e.g. H. sapiens, M. musculus, R. norvegicus, Arabidopsis, C. elegans and Drosophila.
Expression profiling using microarrays represents the most efficient way of measuring gene expression; alterations in transcript levels in tissues or entire genomes can be simultaneously assayed (1-3). In addition to the capacity for parallel analysis of thousands of genes with a minimal amount of sample,
Gene Chip Technology and Its Application to Molecular Medicine 651

Gene Chip Technology and Its Application to Molecular Medicine. Figure 1 Schematic representation of the
experimental strategy of cDNA and oligonucleotide arrays (modified according to Schulze A, Downward J, Nat Cell
Biol 2001; 3:E190-E195 (36)). cDNA arrays: For the array preparation, inserts from cDNA clones (libraries) are
amplified and PCR products are spotted onto glass slides or nylon membranes at specific positions using arraying
robots. Target preparation: RNA from the two tissues or cell populations under comparison is used to synthesize
cDNA in the presence of either radioactively labeled nucleotides or nucleotides labeled with two different fluorescent
dyes, Cy3 and Cy5, during cDNA synthesis. Samples labeled with two different fluorescent dyes are mixed and are
hybridized to the array, whereas samples labeled radioactively (or with one fluorescent dye) are hybridized to
separate arrays. Signal intensity ratios are obtained by comparing either two different signals (Cy5/Cy3) on one array
or by comparing signals of genes represented on two arrays (array 1/array 2). High-density oligonucleotide arrays:
For the array preparation sequences of 16–20 short oligonucleotides (typically 25mers) are chosen from the mRNA
reference sequence of each gene. Light-directed, in situ oligonucleotide synthesis is used to generate high-density
probe arrays, usually containing over 300,000 individual elements. Target preparation: polyA+ RNA is prepared from
different tissues and used to generate double-stranded cDNA carrying a transcriptional start site for T7 RNA
polymerase. During in vitro transcription, biotin-labeled nucleotides are incorporated into synthesized cRNA
molecules. Each cRNA sample hybridizes separately to the array. Target binding is detected by staining with a
fluorescent dye coupled to streptavidin. Signal intensities on different arrays are used to calculate relative mRNA
abundance for genes represented on the array.
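The Cy5/Cy3 signal intensity ratios mentioned in the caption can be sketched as follows. The example assumes a simple global normalization in which the Cy3 channel is rescaled so that both channels have the same total intensity, followed by log2 transformation of the per-spot ratios. This is only one of several normalization schemes; the intensity values, the constant background terms and the function name are invented for illustration.

```python
import math

def normalized_log_ratios(cy5, cy3, background5=0.0, background3=0.0):
    """Return log2(Cy5/Cy3) per spot after background subtraction and global scaling."""
    pairs = [(r - background5, g - background3) for r, g in zip(cy5, cy3)]
    # Keep only spots with a positive signal in both channels.
    pairs = [(r, g) for r, g in pairs if r > 0 and g > 0]
    # Global normalization: scale Cy3 so that the total intensities of both channels match.
    factor = sum(r for r, _ in pairs) / sum(g for _, g in pairs)
    return [math.log2(r / (g * factor)) for r, g in pairs]

# Hypothetical spot intensities for four genes on one two-colour array.
ratios = normalized_log_ratios([400, 100, 100, 200], [100, 100, 100, 500])
```

A value of 0 means equal signal in both samples, +1 a two-fold higher Cy5 signal and -1 a two-fold higher Cy3 signal. In this toy example the first gene has a log2 ratio of 2, that is, a four-fold higher signal in the Cy5-labeled sample.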

other advantages include the high sensitivity, the economy of size (miniaturization) and the use of non-toxic chemicals.
Despite the enormous potential of the technology a number of issues attenuate the power of microarrays, e.g. the control for biological and environmental factors and fluctuations, the validation of data, the need for computational tools to analyze the vast amount of data and to enable comparisons between arrays and finally the high costs of commercial microarrays.
Conceptually different approaches to the development of microarray technology have resulted in the generation of two different array formats, oligonucleotide and cDNA (‘targets’) arrays (Fig. 1). The ‘target’

cDNAs (or oligonucleotides) are immobilized on nylon membranes or glass slides. cDNA arrays are generated by arraying PCR products of ▶cDNA libraries or clone collections usually onto glass or nylon substrates. cDNA arrays offer flexibility in the choice of arrayed elements and lower costs, particularly for the preparation of smaller, customized arrays for specific investigations of a small number of genes. In addition, arraying of unsequenced clones from cDNA libraries can be useful for gene discovery.
The advantage of the in situ synthesized, high-density oligonucleotide arrays (Affymetrix, www.affymetrix.com) is the high reproducibility of in situ synthesis on oligonucleotide chips, allowing an accurate comparison of signals generated by samples hybridized to separate arrays.

Microarray Fabrication
The concept of being able to characterize large numbers of clones by ▶hybridization analyses of high-density arrayed cDNA libraries was established more than a decade ago (1).
In general, arrays are described as macroarrays or microarrays, the difference being the size of the sample spots and size of the array. Macroarrays are usually printed on nylon membranes, contain sample spot sizes of about 250 microns or larger and can easily be imaged by existing gel and blot scanners. The sample spot sizes in microarrays are typically less than 200 microns in diameter and probes are attached onto glass-like substrates. Depending on the arrayed material, the most commonly used array platforms are cDNA arrays and oligonucleotide arrays (▶http://www.gene-chips.com/).
For the generation of a cDNA microarray (4), cDNA clones representing as many unique transcripts as possible are either selected within ▶EST data (▶http://www.ncbi.nlm.nih.gov/UniGene and ▶http://www.tigr.org/tdb/tgi/hgi) or tissue-specific cDNA libraries are constructed or ordered (www.rzpd.de). The cDNA clone inserts are PCR amplified from plasmid DNA or amplified from bacterial cultures. In high-throughput applications the amplification of clones in cultures stored in 384-well plates is more cost efficient and less labor intensive than amplification from plasmid DNA. PCR products can be purified to remove unincorporated nucleotides and primers, e.g. by filtration using silica systems. Amplified PCR products are spotted, usually in denaturing or high-salt buffer, onto glass slides or nylon membranes using robotic systems. Spots are typically 100–300 μm in size and are spaced about the same distance apart. Using this technique, arrays consisting of more than 30,000 cDNAs can be fitted onto the surface of a conventional microscope slide.
Commercially available cDNA arrays consisting of thousands of distinct sequence-verified genes are provided by companies, like Clontech, Agilent, Incyte Genomics, Invitrogen (Research Genetics) or non-profit, public, limited institutions like the German Resource Center (▶www.rzpd.de).
For oligonucleotide arrays, short oligonucleotides from 20–25 mers (Affymetrix) up to 60mers (Agilent Technologies) are usually synthesized in situ, either by photolithography onto silicon wafers (high-density oligonucleotide arrays from Affymetrix) or spotted by ink-jet technology (e.g. Agilent Technologies). Alternatively, presynthesized oligonucleotides can be spotted onto glass slides or glass-like matrices.
Oligonucleotide arrays have certain advantages over cDNA arrays, namely high reproducibility and the fact that the sequence information (usually EST sequences) alone is sufficient to generate the DNA to be arrayed. Furthermore the oligonucleotide arrays can be designed to allow both ▶SNP and alternative splicing analysis and do not require amplification and purification of cDNA fragments (5).
Since short oligonucleotides may result in less specific hybridization and reduced sensitivity, the arraying of presynthesized longer oligonucleotides (50–100 mers) has recently been developed to counteract these disadvantages. However, the high costs of commercially available, in situ-synthesized oligonucleotide arrays and their accessories (spotters, scanners, analysis software) as well as the time-consuming design of oligonucleotide sets may limit their use for academic laboratories.
Affymetrix has pioneered the oligonucleotide array technology and generated a number of different commercially available arrays for various organisms (▶www.affymetrix.com).

The Microarray Experiment
Careful experimental design of the microarray will ensure the maximal potential gain in efficiency and is particularly important if the resulting experiment is to be maximally informative, given the effort and the resources.
There are many protocols and different types of systems available; the basic procedure for a large-scale measurement of gene expression involves the preparation of total or mRNA from the biological sample(s) under investigation (e.g. ‘candidate’ organ) and the hybridization of copied ‘labeled’ RNA to the array (Fig. 1).
In most cases, the extracted mRNA is converted to cDNA (▶reverse transcription), labeled and hybridized to the DNA elements on the array surface. In some cases (e.g. hybridization of Affymetrix chips) the cDNA is labeled during in vitro transcription and ▶cRNA is hybridized. To ensure a high reproducibility, fluctuations in sample preparation and hybridization need to be reduced to a minimum. Major sources of

random fluctuations to be expected are in probe, target and array preparation, e.g. in mRNA preparation, reverse transcription, labeling, target volume, hybridization parameters, overshining effects, non-specific background, variations in pin geometry during spotting of cDNA, slide inhomogeneities and image analysis (6). Replicates of each experiment should be used in order to reduce variability and to differentiate between experimental variation and real expression differences. Suitable internal controls ensure quality control measurements for samples and array. After the hybridization process, intensity signals from the hybridized RNA samples are detected by phosphoimaging or fluorescence scanning and independent images are generated.

Analysis and Data Management
Microarray experiments generate large and complex data sets, e.g. lists of spot intensities and intensity ratios. Basically, the data obtained from microarray experiments provide information on the relative expression of genes corresponding to the mRNA sample of interest. Computational and statistical tools are required to analyze the large amount of data in order to address biological questions.
Once images of hybridized microarrays are processed, arrayed spots are identified, relative signal intensities for each spot are measured and background intensity is subtracted. Signal intensities are usually normalized to compensate for experimental variability and to ‘balance’ the signals from the two samples being compared (7). All normalization techniques assume that all or a subset of spots (e.g. genes) on the array have an average expression ratio equal to one. The normalization factor is then used to adjust the data (signal intensities) from the two samples and to ensure that the total quantity of RNA hybridized to the array is the same. Finally mean spot or transcript intensities are calculated and ratios of intensities are used to account for relative expression differences. In a simple pairwise comparison of gene expression between two samples, the results can be shown in plots of the intensities or the log of the intensity ratios. Scatter plots are widely used to make the observed differential expression visible.
A variety of software tools utilizing different mathematical algorithms to perform microarray image analysis are available; a detailed discussion of the analytical tools is beyond the scope of this article (see
cluster analysis uses a standard statistical algorithm to arrange and organize genes according to similar patterns of gene expression (10).
The data management is of particular importance for further downstream analyses. Databases are an important resource for storing and retrieving the vast amount of data generated in a microarray experiment. A number of gene expression databases have been generated and are accessible to the public (e.g. Gene Expression Omnibus at the National Center for Biotechnology Information, www.ncbi.nlm.nih.gov/geo and ArrayExpress at the European Bioinformatics Institute, EMBL-EBI www.ebi.ac.uk/arrayexpress/). Due to the experimental variation inherent in microarray experiments, the validation of the results by alternative techniques, such as quantitative real-time PCR or Northern blotting, is advisable.
The most challenging part is “making sense” of the complex data retrieved, including distinguishing whether the gene’s expression change is part of the etiology of the disease or part of the pathology of the disease. The first task is the identification of the relevant gene(s), the prediction of protein function and the identification of the genetic variation that modulates gene expression, including the number of loci involved, the effect of each single locus and the interaction between loci.

Clinical Applications
Microarrays certainly have multiple applications, many of which will develop and evolve over time. Although the first application of microarrays was in monitoring gene expression, the strategy of using arrayed biomolecules to examine a biological sample is generally applicable, e.g. for mutation screening, ▶polymorphism analysis, mapping and other applications. An increasing number of human diseases result from alterations in DNA sequence and/or altered gene expression patterns. Therefore, information about up- and down-regulation of multiple genes is important for identification of disease genes, understanding of gene function and for potential therapeutic and/or diagnostic applications. The first clinical application may be the use of microarrays for the molecular classification of cancer (see disease diagnosis).
Although gene expression analysis is a powerful approach to identify characteristics of disease states or signaling pathways, it should be noted that gene expression levels often represent complex, quantitative
reviews 8, 9). phenotypes, influenced by environmental and genetic
The first information obtained after data analysis and factors and the regulation of mRNA levels is only one
extraction of gene expression analysis is identification aspect of biological control. Protein levels are also
of those genes with significant differential expression controlled at several post-transcriptional steps and
in two samples or in a time series after a given protein activity is controlled by post-translational
treatment. To address the full potential of genome-scale modification. A complete picture may be obtained by
experiments a ▶cluster analysis is performed to studying the global level of cellular proteins by
analyze the entire repertoire of transcripts. Basically, proteomics (e.g. protein microarrays).
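The background subtraction and normalization steps described under Analysis and Data Management can be illustrated with a minimal sketch. It assumes a two-sample comparison and a single global normalization factor chosen so that the average log2 ratio over all spots is zero (that is, an average ratio of one on the log scale). The function names, the positive floor applied after background subtraction and the choice of the log scale are illustrative assumptions, not a description of any particular software package.

```python
import math

def background_correct(signal, background, floor=1.0):
    """Subtract local background from each spot signal.

    Values are clamped at a small positive floor (an assumption of this
    sketch) so that ratios and logarithms stay defined.
    """
    return [max(s - b, floor) for s, b in zip(signal, background)]

def global_normalization_factor(sample_a, sample_b):
    """Factor by which sample_b must be scaled so that the mean log2
    ratio (A/B) over all spots becomes zero."""
    log_ratios = [math.log2(a / b) for a, b in zip(sample_a, sample_b)]
    mean_log_ratio = sum(log_ratios) / len(log_ratios)
    return 2.0 ** mean_log_ratio

def normalized_log_ratios(sample_a, sample_b):
    """Log2 expression ratios after global normalization."""
    factor = global_normalization_factor(sample_a, sample_b)
    return [math.log2(a / (b * factor)) for a, b in zip(sample_a, sample_b)]

# Hypothetical spot intensities for four genes in two samples.
sample_a = background_correct([210, 410, 810, 1610], [10, 10, 10, 10])
sample_b = background_correct([110, 210, 410, 3210], [10, 10, 10, 10])
ratios = normalized_log_ratios(sample_a, sample_b)
```

Plotting the normalized log ratios against mean intensity, or one sample against the other, gives the kind of scatter plot mentioned above. Normalizing on a chosen subset of spots (for example control genes) would only change which spots enter the mean.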
Disease Diagnosis
The real promise of the examination of gene expression using microarrays is to identify genes that are consistently up- or down-regulated and play significant roles in the development and progression of disease. For example, the over-expression of certain genes is correlated with a certain type of cancer. By monitoring expression, a new generic approach for cancer classification has been established, and a comprehensive and commonly accessible catalog of gene expression profiles will make an accurate multiclass cancer classification feasible. Improvement in precise, objective and systematic tumor classification at the molecular level will advance cancer treatment (3, 11).

Pharmacogenomics
The information about gene expression and sequence variation will also have an impact on many aspects of the drug discovery process and on drug efficacy (12). The use of microarrays to find prognostic markers or to identify therapeutic targets has great potential.
Microarrays are used at specified steps in the process of drug development:
• Target discovery, to identify genes or pathways with altered expression in diseased human tissues or in animal models of disease.
• Target validation, to determine that a gene product is causative of disease symptoms or that activation of the target protein ameliorates disease symptoms. An agonist/activator or an inhibitor, which may be therapeutic, could be identified using microarrays.
• Compound optimization, to screen a series of therapeutic drug candidates to find the compounds that are most specific for the target protein and those that cause unintended effects.
• Drug metabolism, to predict whether a drug candidate will cause drug–drug interactions.
• Drug efficacy, to identify the individual mode of action or adverse effects of a given drug.
Microarrays should therefore prove useful in understanding the mechanistic basis of action of many drugs.

Toxicological research: Toxicogenomics
Toxicogenomics is concerned with the identification of potential human and environmental toxicants and with finding correlations between toxic responses to toxicants and changes in the genetic profiles of the objects exposed to such toxicants through the use of genomics resources. DNA microarrays allow the identification of highly sensitive and informative markers for toxicity, by monitoring the expression levels of thousands of genes simultaneously.

Therapeutic Consequences
It has been suggested that microarrays will be routinely used not only in the selection, assessment and quality control of the best drugs for pharmaceutical development and for disease diagnosis, but also for monitoring disease status and the outcomes of therapeutic interventions (11-15). Microarrays may be of great benefit with respect to personalized medicine, i.e. to select the most likely effective drug, to individualize dosing and to minimize the safety risk for cost effective health care management. Microarrays may also be useful to identify bacterial strains that are resistant to known antibiotics to avoid a lack of response in certain patients.
One important aspect is to find a correlation between therapeutic responses to drugs and the genetic profiles of patients to address questions such as why some drugs work better in some patients than in others and why some drugs may even be highly toxic to certain patients.
In addition to gene expression profiling, microarrays are especially suited for high-throughput genotyping to examine sequence variation, i.e. to screen SNPs (single nucleotide polymorphisms). An individual’s genotype or genetic profile may allow (1) determination of disease association and assessment of disease risk and (2) determination of dosage and type of drug. This will enable the selection of the most appropriate and efficient drug and reduce chances of an adverse drug reaction (16, 17).
To realize the potential of microarrays, it will be a challenge to develop a cooperative framework that meets the requirements of basic research and clinical medicine.
▶Microarray Data Analysis

References
1. Lennon GG, Lehrach H (1991) Hybridization analyses of arrayed cDNA libraries. Trends Genet 7:314
2. Schena M, Shalon D, Heller R et al (1996) Parallel human genome analysis: microarray-based expression monitoring of 1000 genes. Proc Natl Acad Sci USA 93:10614
3. De Risi J, Penland L, Brown PO et al (1996) Use of a cDNA microarray to analyse gene expression patterns in human cancer. Nat Genet 14:457
4. Schena M, Shalon D, Davis RW et al (1995) Quantitative monitoring of gene expression patterns with a complementary DNA microarray. Science 270:467
5. Lockhart DJ, Dong H, Byrne MC et al (1996) Expression monitoring by hybridization to high-density oligonucleotide arrays. Nat Biotechnol 14:1675
6. Lee ML, Kuo FC, Whitmore GA et al (2000) Importance of replication in microarray gene expression studies: statistical methods and evidence from repetitive cDNA hybridizations. Proc Natl Acad Sci USA 97:9834
7. Schadt EE, Li C, Ellis B et al (2001) Feature extraction and normalization algorithms for high-density oligonucleotide gene expression array data. J Cell Biochem 37:120
8. Butte A (2002) The use and analysis of microarray data. Nat Rev Drug Discov 1:951
9. Quackenbush J (2001) Computational analysis of microarray data. Nat Rev Genet 2:418
10. Eisen MB, Spellman PT, Brown PO (1998) Cluster analysis and display of genome-wide expression patterns. Proc Natl Acad Sci USA 95:14863
11. Ramaswamy S, Ross KN, Lander ES et al (2003) A molecular signature of metastasis in primary solid tumors. Nat Genet 33:49
12. Gerhold DL, Jensen RV, Gullans SR (2002) Better therapeutics through microarrays. Nat Genet 32:547
13. Golub TR, Slonim DK, Tamayo P et al (1999) Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science 286:531
14. Jain KK (2000) Applications of biochip and microarray systems in pharmacogenomics. Pharmacogenomics 1:289
15. Marton MJ, De Risi JL, Bennett HA et al (1998) Drug target validation and identification of secondary drug target effects using DNA microarrays. Nat Med 4:1293
16. Shi MM (2002) Technologies for individual genotyping: detection of genetic polymorphisms in drug targets and disease genes. Am J Pharmacogenomics 2:197
17. Guzey C, Spigset O (2002) Genotyping of drug targets: a method to predict adverse drug reactions? Drug Saf 25:553

Gene Cluster

Definition
Gene cluster refers to a set of co-regulated genes, presumably belonging to the same functional class.
▶RNAi Interference in Mammalian Cells

Gene Conversion

Definition
Gene conversion refers to a non-reciprocal, limited transfer of sequence information from one chromosome to another, or from one chromosome region to another region, of the same chromosome. The donating sequence is copied to the receiving chromosome, which is thus “converted”. The donating sequence remains unchanged.
▶Chromosomal Instability Syndromes
▶Spinal Muscular Atrophy

Gene Dosage Analysis

Definition
Gene dosage analysis is aimed at quantifying the copy number of a gene in an individual.
▶Spinal Muscular Atrophy

Gene Duplications

SEBASTIAN M. SHIMELD
Department of Zoology, University of Oxford, Oxford, UK
sebastian.shimeld@zoo.ox.ac.uk

Synonyms
segment duplication, chromosome duplication, ▶trisomy and polyploidy. Sister genes separated by duplication are referred to as paralogues.

Definition
The term gene duplication describes a situation where a single ancestral gene has been copied such that two (or more) copies of that gene now exist in a single genome. Gene duplications are saltatory events in genome evolution in that the change occurs in one generation, with the parents having one copy while the offspring have two copies. Therefore, as with other genetic changes, gene duplications typically first exist in one individual. Whether they spread to become fixed in a population or are purged will primarily depend upon the effect they have on the organism’s ▶fitness. At a practical level, gene duplications that have become fixed within a taxon are recognised retrospectively by the presence of homologous genes within one genome. These are best known in the context of gene families, groups of genes of related sequence inferred to have originated by repeated gene duplication from a single ancestral gene.

Characteristics
The Origin of Duplicated Genes
The prevalence of gene families in eukaryotic and prokaryotic genomes suggests gene duplication has played a prominent role in evolution. How do gene duplications occur? Probably the most common event is tandem duplication, as evidenced by the prevalence of homologous genes situated in tandem in most genomes. Tandem duplication probably occurs most frequently due to mis-recombination at ▶meiosis,
although other factors might also be important. Repeated tandem duplication without purging of duplicated genes results in ▶gene clusters, that is a group of related genes situated next to each other in the genome. Such gene clusters can be evolutionarily ancient (for instance the ▶Hox gene cluster), but are more commonly of relatively recent origin, for example the cluster of zinc finger genes on chromosome 19q12.
Genes can also duplicate in other ways. Single genes can be duplicated elsewhere in the genome by several potential mechanisms, including via piggybacking on adjacent active retrotransposons or insertion of a reverse transcribed mRNA into the genome by a viral reverse transcriptase. Several mechanisms allow multiple genes to be duplicated simultaneously. Sections of DNA containing multiple genes can duplicate either in tandem or elsewhere in the genome (for example by transposition). Evidence suggests that recently duplicated segments form an estimated 5% of the human genome, suggesting this is a frequent and ongoing process (1). These are also known as segmental or block duplications. Duplication of an entire chromosome with all its incumbent genes can also occur. Although this is strongly selected against in humans and typically unviable (an exception is trisomy of chromosome 21), some other lineages appear to be able to tolerate such duplications more easily and aberrant chromosome numbers are especially common in flowering plants. Finally, there is evidence that whole genome duplication has occurred in several lineages, including in yeast, plants and salmonid fish (2). There is also evidence that genome duplication occurred early in vertebrate evolution and therefore that all vertebrates including humans are ancestrally polyploid. Duplication of an entire genome includes duplication of all its incumbent genes.

The Fate of Duplicated Genes
When examining the genomes of living organisms, we infer the presence of previous gene duplications by the existence of genes with homologous sequences. By their nature, these represent gene duplications that have become fixed within a population such that all members of a species (or higher taxonomic division) possess both copies. The route from the initial gene duplication to ▶fixation or loss is affected by several factors. Some gene duplications confer an obvious ▶selective advantage and spread rapidly through a population, for example the tandem amplification of esterase genes in mosquitoes can confer a degree of resistance to specific pesticides. Conversely, fixation of a duplicated gene within a population is by no means certain. Many gene products are required at a specific dosage and duplication disrupts this balance. This is likely to reduce the fitness of the individual carrying the gene duplication, leading to it being rapidly purged from the population.
For many genes, however, it is likely that duplication has little or no effect on fitness and is effectively neutral. A common way to view the fate of duplicated genes in this context was as a race between the accumulation of mutations that eventually silence one gene (turning it into a ▶pseudogene which would accumulate more mutations over evolutionary time and eventually become unrecognisable) and the acquisition of divergent functions by the two duplicates and hence the necessity for an organism to maintain both. Empirical data does not support this model, however, suggesting a high rate of retention of duplicate genes (3).
More recently, a modified form of this model has been proposed. Many genes, particularly in multicellular organisms, are multifunctional, either at the biochemical level or in terms of regulated expression in different organs or cell types. This creates the possibility of duplicated genes diverging by subfunctionalisation, which implies that duplicate genes diverge by maintaining separate functions that were all initially maintained by the single gene ancestor. An organism must maintain both copies in its genome or lose one of the functions, providing a selective reason for gene duplicates to be retained in a population. Genes whose expression in multiple tissues is regulated by separate ▶enhancers may be particularly prone to subfunctionalisation, as mutations in different enhancers of two duplicate genes would necessitate the retention of both, even if the encoded amino acid sequences were identical. This provides a potential explanation for the apparently high retention rate of duplicate genes, at least in multicellular organisms (4).
A corollary of duplicate gene divergence is redundancy and compensation. Mutational analysis of families of homologous genes often reveals a degree of redundancy, such that the phenotype of a ▶double mutant is more severe than the sum of the phenotypes of each single mutant. This implies the genes are partially redundant and that one gene can in part compensate for the lack of the other, due to co-expression in the same tissue and a similar biochemical function.

Evolutionary Implications of Gene Duplication
Gene duplication has the potential to provide a lineage with ‘new’ genetic material, in that one copy of the gene is, in principle, free to evolve new functions, while the other maintains existing functions. This concept has led to the suggestion that gene duplication and entire genome duplication may play an important role in evolutionary innovation. However, direct experimental support for this suggestion is lacking and some evidence suggests that, contrary to the above, both duplicate copies experience purifying ▶selection following duplication (5). Nevertheless, it is commonplace to find gene family members that have undoubtedly evolved by duplication playing different roles in
the development, biochemistry or physiology of an organism. This implies that some degree of evolutionary innovation frequently follows gene duplication. A second possible evolutionary consequence of gene duplication is the disruption of successful interbreeding between populations in which different genes have duplicated. This suggests gene duplication might be a powerful force behind the evolution of reproductively isolated populations, and therefore of new species (6).

Clinical Relevance
The phenotypic effect of chromosomal aberrations involving duplication of segments of DNA or the possession of extra chromosomes is due to the imbalance of gene products deriving from the duplicated genes. ▶Down syndrome (trisomy of chromosome 21) is probably the best-known case involving an entire chromosome, as, unlike most trisomies in humans, embryos carrying an extra chromosome 21 are viable. There are numerous other syndromes involving the duplication of specific chromosomal regions (7). Incomplete gene duplications may also result in the fusion of two genes at the boundary between the original and duplicated DNA. This can result in novel gene products with deleterious properties.

References
1. Eichler EE (2001) Recent duplication, domain accretion and the dynamic mutation of the human genome. Trends Genet 17:661–669
2. Skrabanek L, Wolfe KH (1998) Eukaryotic genome duplication – where’s the evidence? Curr Opin Genet Dev 8:694–700
3. Hughes MK, Hughes AL (1993) Evolution of duplicate genes in a tetraploid animal, Xenopus laevis. Mol Biol Evol 10:1360–1369
4. Lynch M, Force A (2000) The probability of duplicate gene preservation by subfunctionalisation. Genetics 154:459–473
5. Wagner A (2002) Selection and gene duplication: a view from the genome. Genome Biol 3:1012.1–1012.3
6. Lynch M, Conery JS (2000) The evolutionary fate and consequences of duplicate genes. Science 290:1151–1155
7. Inoue K, Lupski JR (2002) Molecular mechanisms for genomic disorders. Annu Rev Genom Hum Genet 3:199–242

Gene Expression

Definition
Gene expression describes the process by which a gene’s coded information is converted into the structures present and operating in the cell. Expressed genes include those that are transcribed into mRNA and then translated into protein, and those that are transcribed into RNA but not translated into protein (e.g. transfer and ribosomal RNAs).
▶Microarrays in Colorectal Cancer

Gene Expression Data Analysis: Classification

Definition
For classification, algorithms such as weighted-voting, k-nearest-neighbor classifiers, support vector machines and artificial neural networks can be applied to the set of genes selected using supervised analysis of gene expression to build models capable of predicting the class of a particular sample. To test the robustness of classification, these methods are often coupled with a leave-one-out cross-validation analysis, in which one of the samples from the original ‘training’ set is withheld and a class prediction is made on the withheld sample. For complete validation, gene lists should be tested on a second ‘test’ set of samples that were not used to derive the discriminatory gene list.
▶DNA Chips
▶Gene Chip Technology and Its Application in Molecular Medicine
▶Microarrays in Colorectal Cancer

Gene Expression Data Analysis: Supervised Analysis

Definition
Using this method, one searches for genes whose expression patterns correlate with an external parameter. The most commonly used ‘supervising’ parameters are clinical features such as survival, presence of metastases and response to therapy. Many statistical metrics have been used successfully in ‘supervised’ analyses, including the standard t-test, permutation-based tests, and signal-to-noise ratios.
▶Microarrays in Colorectal Cancer
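Supervised gene selection and leave-one-out cross-validation, as described in the two entries above, can be sketched as follows. This is a minimal illustration under stated assumptions. The signal-to-noise ratio is taken here as the difference of the class means divided by the sum of the class standard deviations, which is one common definition; the entries themselves do not specify a formula. The classifier is a 1-nearest-neighbor rule, the function names are hypothetical, and the two-gene data set is invented for illustration.

```python
import statistics

def signal_to_noise(values, labels):
    """Signal-to-noise score of one gene for a two-class problem.

    Assumed definition: (mean of class 1 - mean of class 0) divided by
    (sd of class 1 + sd of class 0). Each class needs at least two samples.
    """
    class0 = [v for v, y in zip(values, labels) if y == 0]
    class1 = [v for v, y in zip(values, labels) if y == 1]
    spread = statistics.stdev(class0) + statistics.stdev(class1)
    if spread == 0:
        return 0.0
    return (statistics.mean(class1) - statistics.mean(class0)) / spread

def rank_genes(matrix, labels):
    """Order genes by absolute score, most discriminating first.

    matrix maps a gene name to its expression values, one per sample.
    """
    scores = {gene: signal_to_noise(vals, labels) for gene, vals in matrix.items()}
    return sorted(scores, key=lambda gene: abs(scores[gene]), reverse=True)

def nearest_neighbor_label(train_profiles, train_labels, query):
    """Label of the training sample closest to the query (squared Euclidean distance)."""
    def distance(i):
        return sum((a - b) ** 2 for a, b in zip(train_profiles[i], query))
    return train_labels[min(range(len(train_profiles)), key=distance)]

def leave_one_out_accuracy(matrix, labels, n_genes):
    """Withhold each sample in turn, select genes on the rest, predict the withheld one."""
    n_samples = len(labels)
    correct = 0
    for held in range(n_samples):
        keep = [i for i in range(n_samples) if i != held]
        train_matrix = {g: [vals[i] for i in keep] for g, vals in matrix.items()}
        train_labels = [labels[i] for i in keep]
        # Genes are re-selected inside every fold so that the withheld
        # sample never influences the gene list.
        genes = rank_genes(train_matrix, train_labels)[:n_genes]
        train_profiles = [[matrix[g][i] for g in genes] for i in keep]
        query = [matrix[g][held] for g in genes]
        if nearest_neighbor_label(train_profiles, train_labels, query) == labels[held]:
            correct += 1
    return correct / n_samples

# Invented example: six samples, two classes, two genes.
example_labels = [0, 0, 0, 1, 1, 1]
example_matrix = {
    "geneA": [1.0, 1.2, 0.9, 5.0, 5.3, 4.8],  # separates the classes
    "geneB": [3.0, 2.0, 4.0, 3.1, 2.2, 3.9],  # uninformative
}
top_gene = rank_genes(example_matrix, example_labels)[0]
accuracy = leave_one_out_accuracy(example_matrix, example_labels, n_genes=1)
```

Cross-validation of this kind only estimates robustness within the training set. As the Classification entry notes, the gene list should still be tested on an independent set of samples.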
Gene Expression Data Analysis: Unsupervised Analysis

Definition
No external feature is used to guide the analysis process of gene expression. The data are used to search for patterns without any a priori expectation concerning the number or type of groups that are present. The most common ‘unsupervised’ analysis method is hierarchical cluster analysis, based on similarity metrics.
▶Microarrays in Colorectal Cancer

Gene Expression Data Matrix

Definition
Gene expression data matrix refers to a table where each row represents a gene, each column represents a particular sample, or a particular experimental condition, and each position contains a number or a set of numbers characterising the expression level of the particular gene under the particular experimental condition.
▶Microarray Data Analysis

Gene Expression Profile

▶Expression profile

Gene Gun

Definition
A gene gun is a tool for in vivo transformation of cells or organisms (e.g. gene delivery; DNA vaccination). The gun is loaded with DNA- or RNA-coated gold particles that are injected into cells or tissues using a helium pressure pulse.
▶DNA-based Vaccination

Gene Mapping

▶Genetic Epidemiology

Gene Ontology

Definition
Gene Ontology is a project to produce a controlled vocabulary of terms relating to molecular function, biological process, or cellular components, developed by the Gene Ontology Consortium. Such controlled vocabulary allows consistent use of terminology when describing the roles of genes and proteins in cells.
▶Protein Databases

Gene Silencing

Definition
Gene silencing refers to repression of genes by the formation of a specialized chromatin structure (heterochromatin). Silenced genomic regions carry specific histone modification patterns (hypoacetylation, H3 K9 methylation in some organisms) and are bound by heterochromatic proteins.
▶Chromatin Acetylation

Gene Silencing by Double-Stranded RNA

▶RNA Interference in Mammalian Cells

Gene Targeting

Definition
Gene-targeting describes the production of a modified allele of a gene by the process of homologous recombination between a modified exogenous DNA molecule and its endogenous counterpart (target). In transgenic technology, it is used to introduce a targeted mutation into the genome of mouse embryonic stem cells (▶ES-cells). When injected into blastocysts, mutant ES-cells contribute to all tissues of the embryo, including germ cells. Once the mutant gene has entered the germ line, mouse strains heterozygous and homozygous for the mutated gene can be bred and analysed.
▶Jun/Fos
▶Morpholino Oligonucleotides, Functional Genomics by Gene ‘Knock-down’
▶Mouse Genomics
▶Transgenic and Knockout Animals

Gene Therapy

▶Clinical Gene Transfer

Gene Trapping

Definition
Gene trapping is a strategy by which the insertion of a targeting vector into a gene leads to reporter gene activation. The inserted vector sequence acts as a tag that facilitates rapid cloning of the trapped gene.
▶Large-Scale ENU Mutagenesis in Mice
▶Medaka as a Model Organism for Functional Genomics
▶Mutagenesis Approaches in the Zebrafish

Gene-Based Therapies

Definition
Gene-based therapies involve the transfer of genetic material into a host with the hope of ameliorating or curing a disease.
▶Limb Girdle Muscular Dystrophies

Gene-Environment Interaction

Definition
Gene-environment interaction describes an interplay between genetic variation and environmental triggers (e.g. smoke exposure, infections), which influences susceptibility to a certain disease.
▶Atopy Genetics

Gene-Gene Interaction

Definition
Gene-gene interaction describes an interplay between variations in two different genes that influences susceptibility.
▶Atopy Genetics

General Transcription Factors

Definition
General transcription factors are a set of common transcription factors that, together with RNA polymerase, are necessary and sufficient to direct accurate transcription initiation from a core promoter in vitro.
▶Core Promoters

Genetic Algorithm

Definition
Genetic algorithm is a statistical/mathematical method that is aimed at finding the optimal solution to a question. The process is modelled on natural selection: (1) selection of the strongest individuals/solutions, (2) production of new organisms/new solutions, by mixing the previously selected elements, and (3) mutations, which are accidental changes in the organisms/solutions. The process is reiterated until no further improvement is obtained. The last solution is taken as the final one.
▶EST Mining for Expression Analysis
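The three steps named in the Genetic Algorithm entry (selection, mixing of selected solutions, mutation) and the stopping rule can be sketched with a toy problem. Everything specific here is an assumption made for illustration: solutions are bit strings, fitness is simply the number of 1 bits, the fitter half of each generation is kept as parents, and the loop stops after a fixed number of generations without improvement. The best solution seen is returned, which is one way to read "the last solution is taken as the final one".

```python
import random

def fitness(bits):
    """Toy objective (hypothetical): count of 1 bits, to be maximized."""
    return sum(bits)

def evolve(n_bits=20, pop_size=30, mutation_rate=0.02, patience=25, seed=0):
    """Run a minimal genetic algorithm and return the best bit string found."""
    rng = random.Random(seed)  # fixed seed keeps the run reproducible
    population = [[rng.randint(0, 1) for _ in range(n_bits)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)
    stalled = 0
    while stalled < patience:
        # (1) Selection: keep the fitter half of the population as parents.
        parents = sorted(population, key=fitness, reverse=True)[: pop_size // 2]
        children = []
        while len(children) < pop_size:
            # (2) Recombination: splice two parents at a random cut point.
            mother, father = rng.sample(parents, 2)
            cut = rng.randrange(1, n_bits)
            child = mother[:cut] + father[cut:]
            # (3) Mutation: flip each bit with a small probability.
            child = [1 - b if rng.random() < mutation_rate else b for b in child]
            children.append(child)
        population = children
        champion = max(population, key=fitness)
        if fitness(champion) > fitness(best):
            best, stalled = champion, 0
        else:
            stalled += 1  # stop once no improvement is seen for `patience` rounds
    return best

solution = evolve()
```

For a real problem only the `fitness` function and the encoding of a solution would change. The selection scheme, mutation rate and stopping rule are tuning choices, not fixed parts of the method.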
Genetic Background

ROBERT GERLAI
Psychology Department, University of Toronto at Mississauga, Mississauga, Canada
robert_gerlai@yahoo.com

Synonyms
Host genotype; host genome

Definition
Genetic background represents all the genes in the ▶genome. The effects of “background” genes are often considered with regard to their ability to influence or modify the effects of ▶mutations artificially generated by the experimenter in model organisms such as the mouse. Mutations may be induced or introduced using targeted mutagenesis (▶Targeted Gene Disruption), for example by ▶homologous recombination in ▶embryonic stem cells in the mouse, or by using random mutagenesis with chemical mutagens including ethyl nitroso urea (▶ENU). These mutations interact with the genetic background and will express their effects at the phenotypical level in a manner modulated and modified by the genetic background, a phenomenon called ▶epistasis. The present essay will focus on the genetic background from this perspective. It will explain the complications associated with epistasis and with genetic linkage using examples of ▶knock out (or ▶null mutant) mice developed for the analysis of molecular mechanisms of mammalian brain function and behavior. The issues raised in this essay, however, are general and will be valid for several other genetic manipulation approaches and for all fields of biology.

Characteristics
▶Gene targeting allows one to create null mutations in mice and to analyze how the mutant organism responds to the lack of the product of a ▶single gene. This has facilitated the molecular dissection of such complex traits as mammalian brain function and behavior. However, numerous problems have been pointed out (2, 4, 5, 6) with regard to the interpretation of the phenotypical changes observed in null mutant mice. Briefly, the most controversial issues stem from the fact that scientists overlooked the influence of the genetic background. These issues can be divided into two main categories, both of which have general importance in genomics. The first is a cluster of problems associated with compensatory mechanisms. This problem is rather difficult as there is no general solution to avoid it. The second problem is associated with genetic linkage (the so called flanking region problem). Practical solutions to this problem exist and will be presented. While the examples will be drawn mostly from the field of behavioral neuroscience, the points illustrated are valid for any biological trait.

The Complicating Effect of Compensatory Mechanisms
With gene targeting one can knock out a gene in vivo and create a mutant organism that lacks the gene product. The promise of gene targeting has been to reveal the in vivo function of the gene of interest. However, the functional relevance of gene targeting has been questioned [reviewed in (3)] because the mutation may lead to an avalanche of compensatory processes (e.g. up- or down-regulation of other genes) and resulting secondary phenotypical changes. Compensation may be due to ▶genetic redundancy. Genetic redundancy in this context means that putative “helper” genes take over the function of the targeted one, e.g. become up-regulated and compensate for the absence of the product of the targeted gene. Although labor intensive, proper analysis of compensatory changes may allow one to reveal how biochemical pathways interact. For example, Chen et al. (1) showed that although a null mutation in protein kinase C γ subtype (PKCγ) in mice resulted in an apparently normal long-term depression (LTD) in the cerebellum of the mutants, LTD could be blocked by a PKC inhibitor only in control mice but not in the null mutants, suggesting that LTD was mediated, at least partly, by non-PKC dependent processes in the null mutant mice.
“Compensatory” processes, however, can also induce phenotypical changes. For example, assume gene α serves hypothetical function ‘A’. Also assume that targeted gene α is compensated for by gene β in the genetic background; gene β becomes up-regulated in response to the absence of α gene product. The excess of gene β product is able to compensate for the lack of gene α product and no change is observed in function ‘A’ at the phenotypical level. However, over-expression of gene β may have some pleiotropic effects, i.e. may affect functions other than ‘A’, similarly to the way over-expression of genes alters the phenotype of transgenic mice. These functional alterations, when observed by the investigator, will be assigned to gene α. Although they are indeed due to the introduced mutation of gene α, they need not reveal the function of this gene per se because they are related to this gene only indirectly. This example is not merely hypothetical. Empirical evidence for similar situations can be found in the literature (2).
Teasing out the direct and indirect effects of the mutation is not trivial and dissection of the molecular mechanisms underlying complex phenotypical traits will require meticulous studies in which numerous factors need to be controlled. Lathe (6) discusses
several confounding factors. Instead of reiterating his arguments, the reader’s attention is now drawn to what is proposed to be the main issue, the necessity of a “systemic approach” view in gene targeting (5) (also see ▶Phenomics).

The Need for a Systemic View
The principal problem in genomics is a systemic one that concerns biological organization and the functional units of this organization. From a geneticist’s viewpoint the units of biological organization are the genes and their function is to encode particular proteins. However, when it comes to the question of phenotypical effects, genes may not be the units and the definition of their function may be complicated. Clusters of genes in the genetic background defined by higher organizational level phenomena, including developmental, physiological or even behavioral, may represent the functionally relevant unit. Disruption of a single gene may alter a biochemical cascade within the functional gene cluster. Expression levels of the genes belonging to a functional cluster may change in concert. Investigation of such changes may reveal the biological organization of the organism. The boundaries of putative gene clusters may not be sharp. Some genes may belong more, others less, to a specific functional gene group. This also implies that the gene group organization may not be orthogonal, i.e. some genes may belong to more than one functional group. Functional groups may be hierarchically organized. A smaller number of genes may define subgroups that may make up groups that in turn may be organized into super-groups, etc. Disrupting single genes will perturb the organism and will force it to respond in a way inherent in its biological organization. The phenotypical changes one observes are the reflection of this organization. Instead of looking for the function of single genes, it is proposed that investigators should take a systemic organizational view into consideration, an approach conceptually similar to metabolic control analysis employed in biochemistry.
In summary, the effects of genetic manipulation must be investigated at all practically feasible levels of biological organization, including gene expression patterns, protein-protein interactions and a broad spectrum of phenotypical traits that may be affected by the direct and indirect effects of the mutation.

Polymorphism in the Genetic Background May Make the Results of Gene Targeting Studies Difficult to Interpret
Assume that knock out of gene α leads to differential expression of alleles b vs. B of gene β, and a regulatory change of gene β leads to different phenotypical effects depending on which allele (b or B) is present in the α null mutant mouse. Consequently, ▶polymorphism in the genetic background will not allow one to conclude with certainty whether a particular phenotypical change observed in the null mutant was due to the null mutation or to the genetic background. This issue is especially problematic if the genetic background of the null mutant animals is different from that of their wild type control counterparts, which is a typical problem in a large number of gene targeting studies.

Null Mutant Mice of Gene Targeting Studies Are Often the F2 Offspring of Two Mouse Strains
The genetics of the breeding strategy usually employed to generate null mutant mice is explained in Fig. 1. The figure also depicts why the hybrid genetic background leads to complications.

The Flanking Allele Problem
The F2 population is a segregating population in which mice have recombinant genotypes derived from the two parental mouse strains (Fig. 1). The difficulties arising from this are threefold. First, the recombination pattern, i.e. which locus contains strain 129 and which B6 alleles and whether in a homozygous or heterozygous form, may be different between littermates. Thus not even wild type littermates of their mutant counterparts represent an appropriate control, since their alleles could be different from those of the mutants not only at the locus of the gene of interest but also at other loci. This may lead to false positive results. Second, due to the genetic variation resulting from the hybrid segregating background, detecting significant effects of the mutated gene may be difficult, which leads to false negative results. These two problems can be alleviated by measuring larger numbers of animals, i.e. by increasing the power of statistical comparisons and decreasing the possibility of sampling error. Increasing the sample size, however, will not solve the third problem, which is associated with genetic linkage. If the targeted mutagenesis is conducted in ES cells from strain 129, the chromosome with the targeted locus will carry alleles of genes of 129-type. The probability of genetic recombination between two loci generally increases with the distance between them (linkage). Thus, the 129-type alleles of the genes whose loci are close to the locus of the mutated gene will remain together with the mutated allele of the gene of interest (Fig. 1). In other words, any time the mutation is detected in a mouse, e.g. by ▶Southern blotting or PCR, that particular animal will also carry the linked 129-type genes with high probability. Conversely, a non-mutant control animal will not carry these 129-type alleles and will have B6 alleles with high probability if the 129-ES cell chimera was crossed to B6. In effect, the mutation can be seen as a ▶marker for the 129-type genes linked to the locus of the targeted gene. Consequently, any phenotypical differences
observed between mutant and control littermates of the hybrid genetic origin may be due either to the introduced null mutation or to the background genes linked to the targeted locus. Thus, one may find false positive results [for experimental examples see (3)].
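
The linkage argument can be made concrete with a short calculation. The sketch below is illustrative only and rests on assumptions that are not stated in the entry: recombination is modelled with Haldane's map function (no crossover interference), the chromosome is treated as long relative to the distances considered, and each backcross generation is counted as one meiosis. The function names are hypothetical. Under these assumptions, the probability that a locus on the mutation-bearing chromosome still carries the 129-type allele falls from 1 towards 0.5 with increasing distance, and the expected length of the 129-type segment flanking the targeted locus after 12 backcross meioses comes out close to the figure of about 16 cM discussed under Solutions to the Flanking Allele Problem.

```python
import math


def recombination_fraction(distance_cm: float) -> float:
    """Haldane map function: recombination fraction for a map distance in cM."""
    return 0.5 * (1.0 - math.exp(-2.0 * distance_cm / 100.0))


def p_129_allele_on_mutant_chromosome(distance_cm: float) -> float:
    """Probability that a locus at this distance from the targeted gene still
    carries the 129-type allele on a mutation-bearing gamete of an F1 mouse."""
    return 1.0 - recombination_fraction(distance_cm)


def expected_flanking_segment_cm(meioses: int) -> float:
    """Expected total length (both sides) of the donor segment around a
    selected locus after the given number of backcross meioses.

    Each side is exponentially distributed with mean 100/meioses cM,
    assuming a chromosome much longer than the segment."""
    return 200.0 / meioses


if __name__ == "__main__":
    for d in (1, 10, 50, 200):
        p = p_129_allele_on_mutant_chromosome(d)
        print(f"{d:>4} cM from the targeted locus: P(129 allele) = {p:.3f}")
    print(f"12 backcross meioses: about {expected_flanking_segment_cm(12):.1f} cM")
```

The 50:50 limit at large distances corresponds to independent assortment, and the slow decline of the flanking segment with the number of generations is why backcrossing alone cannot remove closely linked 129-type alleles.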

Genetic Background. Figure 1 The genetic background of mice generated by gene targeting. Most gene targeting is carried out in cultured embryonic stem (ES) cells derived from one of the substrains of mouse strain ‘129’. The 129-type ES cells carrying the targeted mutation are introduced into a blastocyst and the surviving chimeric embryos develop to term, are raised to adulthood and are mated to “wild type”, i.e. non-mutated, mice. ES cells originating from mouse strain 129 carry one chromosome (white) with the disrupted allele (double lines) of the targeted gene. If these ES cells populate the germ-line in the chimeric mice, the mutation will be transmitted when the chimera is mated. A cross between a germline transmitting chimera and a C57BL/6 (B6) mouse (black chromosomes) (a) will produce an F1 population (b) in which 50% of the animals will have one copy of the mutant allele (heterozygous mutants) and 50% of them will have no mutant allele (wild type animals) at the targeted locus. Using Southern blotting or PCR (polymerase chain reaction) one can detect the presence of the mutant allele and identify the heterozygous mutant animals. If these animals are mated with each other, according to Mendel’s law, homozygous mutant (two mutant alleles), heterozygous mutant (one mutant and one wild type allele) and wild type (two wild type alleles) animals will be obtained. It is also important to remember, however, how genes at loci other than the targeted one will be inherited. Crossover events during the meiotic process of gametogenesis will “shuffle” the alleles of these background genes and will create recombinant chromosomes (c), which will characterize the genotype of the sperm and the egg of the F1 mice. The genotype of an F2 individual, therefore, will be represented by a pair of such recombinant chromosomes. For example, a homozygous null mutant mouse may have chromosomes a and b, a and c, or b and c; a heterozygous mouse may have one of the recombinant chromosomes with the null mutation (a, b, or c) and another without the null mutation (d, e, or f); whereas a wild type control mouse may have chromosomes d and e, d and f, or e and f. Panel C shows that the null mutant allele of the targeted gene will be surrounded by 129-type genes; however, the wild type allele of the gene will be surrounded by B6 type genes. This linkage disequilibrium is simply due to the fact that the null mutant allele came from a strain 129 genetic background. In an animal produced from mating such F2 mice (F3 or the following generations), the null mutant allele could be surrounded by B6 genes only if, during the meiotic processes of gametogenesis, crossovers occurred precisely flanking both sides of the targeted gene, events whose combined probability is infinitesimally small. (d) shows the probabilistic distribution of 129 (white) vs. B6 (black) alleles in an F2 segregating population. The depicted chromosomes thus represent the genotype “average” of the F2 population. Note that in mice carrying the null mutation (chromosome ‘a’), the probability of finding 129 alleles on the mutant chromosome increases the closer a given locus is to the locus of the targeted gene. However, in mice carrying the wild type allele of the targeted gene (chromosome ‘b’), the probability of finding 129 alleles decreases the closer a given locus is to the locus of the targeted gene. Also note that as the distance increases from the locus of the targeted gene, the probability of the presence of 129 vs. B6 allele approaches 50:50. (Modified from (3))

Solutions to the Flanking Allele Problem
In order to decrease the probability of contribution of variable background genes, one could backcross the mutant hybrid animals for several generations to the strain of choice, e.g. to B6, and create a ▶congenic strain that carries the mutation on the desired genetic background. However, complete elimination of 129-type genes that surround the locus of the gene of interest is not practically possible. For example, even with 12 generations of backcrossing (approximately 2 years of breeding) to B6, the length of the 129-type chromosome segment introduced to the B6 genome
would be, on average, about 16 centiMorgans (cM) representing about 1% of the genome, i.e. about 300–400 genes. But if no alternatives are available, backcrossing is recommended because it stabilizes the genetic background of the mutant line by reducing the variation in recombination patterns across generations. Other, more complicated breeding strategies have also been proposed [reviewed in (3)]. They represent better solutions than simple backcrossing but they do not completely eliminate the flanking allele problem. Additional suggestions have also been made [for review see (3)]; rescue experiments may rule out the potential effects of linked 129-type genes and generation of “knock in” mice may be a control for the null mutant gene “knock out” animals. Furthermore, inducible knock out techniques, e.g. tetracycline transactivator systems, are recommended because comparison of pre- and post-induction phenotypes represents appropriate within subject control. Finally, generating null mutant mice with a pure genetic background is known to be the perfect solution for the problems associated with the genetic background, but, to date, this has not been preferred because most ES cells are derived from 129 type mice and these animals are not good breeders.

Clinical Relevance
Numerous genetic models of human diseases have been and will be created. However, the genetic manipulation is expected to lead to complex phenotypical changes that are influenced by a large number of genes in the genetic background as well as by environmental factors. Understanding compensatory mechanisms and systemic responses to the induced mutation and eliminating or addressing the confounding effects of background genes are important steps forward that will facilitate the modeling and ultimately the understanding of human diseases.

References
1. Chen C, Kano M, Abeliovich A et al (1995) Impaired motor coordination correlates with persistent multiple climbing fiber innervation in PKC gamma mutant mice. Cell 83:1233–1242
2. Crusio WE (1996) Gene-targeting studies: new methods, old problems. Trends Neurosci 19:186–187
3. Gerlai R (2001) Gene targeting: Technical confounds and potential solutions for the behavioral neuroscientist. Behav Brain Res 125:13–21
4. Gerlai R (1996) Gene targeting studies of mammalian behavior: Is it the mutation or the background genotype? Trends Neurosci 19:177–181
5. Gerlai R (1996) Gene targeting in Neuroscience: The systemic approach. Trends Neurosci 19:188–189
6. Lathe R (1996) Mice, gene targeting and behaviour: more than just genetic background. Trends Neurosci 19:183–186

Genetic Code

JEAN LEHMANN
Theoretical Biophysics, Department of Physics, Royal Institute of Technology, Stockholm, Sweden
jean@theophys.kth.se

Definition
The genetic code is the set of correspondences between the codons of mRNAs and the amino acids of the proteins produced on the ribosomes through the ▶translation of these mRNAs. This set also includes “stop” codons without an amino-acid assignment, which specify the termination of translation.
Codons are the units of information in mRNA. They are made up of three consecutive ▶nucleotides, each of which can be one of the four bases U, C, G or A. Sixty-four different codons result from all the combinations. Twenty amino acids are coded through the phenomenon of translation. A twenty-first, selenocysteine, is coded in some organisms in particular situations (1, 2) and a still very rare twenty-second (pyrrolysine) has recently been discovered (2).
The set of correspondences between the codons and the amino acids is highly conserved in all organisms. The most common one is called the canonical genetic code, previously also called the universal genetic code before deviations were found in different ▶taxa. It is usually depicted in a table with three entries (Fig. 1).
Since the sequences of mRNAs can be different from those of the genes for proteins from which they are transcribed, the coding rules are generally only valid for the mRNAs prior to translation. These differences arise from modifications on mRNAs through ▶maturation processes occurring before translation (observed however almost only in Eukaryotes).

Characteristics
Main Molecular Processes Involved in Decoding
The coding rules are the outcome of extremely sophisticated molecular processes. Basically, the translation system converts genetic information contained in genes into functional proteins. The information is always processed in the form of an mRNA. In Eukaryotes, the latter can be the outcome of a complex maturation process from its primary transcript(s), which can involve several genes. In Eubacteria and Archaebacteria, it is generally translated during or immediately after its ▶transcription from a gene without any rearrangement. It may also come from the genome of an invasive entity (virus).
A start codon in the upstream region of the mRNA signals the place where translation begins. This codon is almost always AUG, but is also sometimes GUG (very rarely
UUG or CUG). Any of these codons can bind the initiator tRNA carrying a modified methionine, N-formylmethionine. The start codon also positions the sequence into the correct ▶reading frame. The process generally ends at a stop codon (UAA, UAG or UGA in the canonical genetic code), which binds a release factor RF1 (UAA, UAG) or RF2 (UAA, UGA).

Genetic Code. Figure 1 The canonical genetic code table. The codonic positions are arranged from left to right of the table, corresponding to the reading direction of the mRNAs (5′–3′). Each of the sixty-four possible codons codes for a particular amino acid or the stop function, which binds a release factor RF1 or RF2 at the end of translation. The amino acids are abbreviated to their three-letter notation.

The path leading a free amino acid to incorporation into a coded protein involves three main steps (Fig. 2):
1. ▶Activation of the amino acid by the corresponding aminoacyl-tRNA synthetase (aaRS) through hydrolysis of a molecule of ATP.
2. The aaRS subsequently binds the amino acid to a cognate tRNA. Prior to translation, a complex made up of an ▶elongation factor Tu (EF-Tu) and GTP binds the amino acid at the 3′ end of the tRNA, giving it a much higher affinity for the ribosome.
3. During translation, occurring on the ribosomes, the successive codons of a particular mRNA are tested by the incoming tRNAs through anticodon-codon associations. A proofreading mechanism is preceded by hydrolysis of the GTP of the complex at the 3′ end of the tRNA. Anticodon-codon complementarity is the decisive factor leading to the formation of the peptide bond between the last amino acid of the nascent protein and the amino acid carried by the incoming tRNA.

Accuracy of the Coding Rules
The reliability of the coding rules thus depends on the accuracy of two major processes of molecular recognition:
a) Binding of both an amino acid and a tRNA by an aaRS (steps 1–2).
b) Binding of the anticodon of a tRNA by a codon of an mRNA through base pairing (step 3).
The first process (a) constitutes a textbook case of RNA-protein recognition (3) and has been referred to as the second genetic code; as soon as a particular aaRS has bound an amino acid to a cognate tRNA, the code is virtually already established since the anticodon-codon associations occurring in (b) follow relatively strict base-pairing rules.
Each of the (generally) twenty different aaRSs thus recognizes both a tRNA and an amino acid. The recognition elements on the tRNAs are mainly situated in the acceptor stem and anticodon domains, which constitute the two major spatial regions of that molecule. Both positive and negative elements are used by the aaRSs to identify their associated tRNAs, and differentiation between two tRNAs can sometimes depend on the identity of only a few bases (3). Since more than one tRNA molecule is often necessary for the translation of the whole set of codons belonging to a particular amino acid, in several cases an aaRS must be capable of recognizing different tRNAs, which implies that the anticodon is sometimes not used as a discriminatory element (e.g. for ser-, leu- and ala-accepting tRNAs).
The aaRSs are among the most studied enzymes, and their structural analyses have given important clues on the evolution of the genetic system (4).
The second process (b) involves an initial selection and a proofreading mechanism that are used by the ribosomes to discriminate between correct and incorrect anticodon-codon associations (5). These coupled mechanisms, separated by GTP hydrolysis, ensure a very low error frequency which in E. coli has been estimated as being between 6 × 10⁻⁴ and 5 × 10⁻³.
The initial selection step occurs when an incoming tRNA charged with its cognate amino acid (combined with EF-Tu and GTP) binds the ribosome and tests the

Genetic Code. Figure 2 The main molecular processes involved in the assemblage of free amino acids into proteins
according to the rules of the genetic code. Three sets of molecules participate in the translation of the mRNAs: the
encoded amino acids (20 different kinds), the aminoacyl-tRNA synthetases (aaRSs) (at least one for each encoded
amino acid, not all shown here) and the tRNAs (at least 22 but often more than 30 different kinds, not all shown here).
The example shows the events leading to the translation of a CUU codon of an mRNA. First, a leuRS binds a leucine
and subsequently catalyses its activation through the hydrolysis of a molecule of ATP, resulting in an
aminoacyl-adenylate (leu~AMP) and PPi [reaction (1)]. As soon as the leuRS recognizes a cognate tRNA (in this case, a tRNA
with an anticodon GAG), the aminoacylation reaction occurs and the tRNA is thus charged with the amino acid [reaction
(2)]. The AMP molecule is then released. After the [EF-Tu + GTP] complex has bound the amino acid at the 3′ end of
the tRNA (not shown here), the latter can participate in translation, occurring on a ribosome. In accordance with the
wobble rules (Table 1), it can successfully bind the CUU codon, thereby enabling the formation of the peptide bond
between the incoming leucine and the nascent protein [reaction (3)]. The previous tRNA (with the anticodon AAG) then
leaves the ribosome. Note that the codons read from 5′ to 3′ (thus from right to left in the figure) and the anticodons from
3′ to 5′. The arrow above the mRNA shows the starting place of translation, while the cutting of the strand into codons is
indicated by small dots. For clarity, details on structures and events are omitted at the level of the ribosome.

codon at the point of being translated. At this stage, most of the non-cognate tRNAs are rejected. However, in addition to cognate tRNAs, some near-cognate tRNAs (whose anticodons can form partial base-pairs) may also cross this first stage, associated with the hydrolysis of the GTP. This hydrolysis results in a ▶conformational change of EF-Tu, which subsequently leaves the 3′ end of the tRNA. The amino acid is then free to move to the ▶peptidyl-transferase center of the ribosome where peptide bond formation can take place. Before that event however, almost all remaining near-cognate tRNAs are rejected (proofreading) (5).

Caveat
The standard rules of the genetic code can be altered during translation events that have been referred to as “recoding” (1). These alterations are specific to individual mRNAs, and involve three possible kinds of event, ▶programmed frame shifting, ▶translational bypassing and ▶redefinition of codons. Albeit not common, it is thought that any of these events may occur in all living organisms (1). They serve notably as control mechanisms of gene expression.
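
The standard decoding rules described in this section (before any recoding) can be illustrated with a minimal sketch. It is not a model of the ribosome: it only encodes the canonical code table, reads an mRNA from its first AUG to the first stop codon, and lists the codons that an anticodon can read under the standard wobble matchings (A with U; C with G; G with C or U; U with A or G). The one-letter amino acid symbols, the UCAG ordering of the table, the example mRNA and all function names are choices made for this illustration. The anticodon GAG is the one used for the CUU codon in Fig. 2.

```python
from itertools import product

BASES = "UCAG"
# Canonical genetic code in one-letter notation ("*" = stop), listed in the
# order UUU, UUC, UUA, UUG, UCU, ... (third position varies fastest).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}
# Standard matchings of the anticodon wobble base with the codon third base.
STANDARD_WOBBLE = {"A": "U", "C": "G", "G": "CU", "U": "AG"}


def read_frame(mrna: str, offset: int) -> str:
    """Translate codon by codon from the given offset until a stop codon
    or the end of the sequence."""
    protein = []
    for i in range(offset, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


def translate(mrna: str) -> str:
    """Translate from the first AUG, which sets the reading frame."""
    start = mrna.find("AUG")
    return "" if start < 0 else read_frame(mrna, start)


def codons_read_by(anticodon: str) -> list:
    """Codons (5'->3') readable by an anticodon written 5'->3'.
    The 5' base of the anticodon is the wobble position."""
    wobble, middle, last = anticodon
    first, second = COMPLEMENT[last], COMPLEMENT[middle]
    return sorted(first + second + third for third in STANDARD_WOBBLE[wobble])


if __name__ == "__main__":
    mrna = "GGAUGCUUUUCUAAGG"          # hypothetical example sequence
    print(translate(mrna))             # frame set by the AUG
    print(read_frame(mrna, 3))         # same sequence read one base later
    print(codons_read_by("GAG"))       # codons read by the tRNA of Fig. 2
```

Reading the same sequence one nucleotide later yields an unrelated peptide, which is why the start codon's role in fixing the reading frame, and programmed frame shifting as a recoding event, matter for the outcome of translation.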

Order in the Genetic Code Table


Degeneracy
Since most of the amino acids are coded by more than
one codon, the genetic code is degenerate (Fig. 3). The
degeneracy is connected with the base in third position
of the codons, which often has only a small influence
on the nature of the encoded amino acid. A particular
codon generally belongs to one of two major
categories, the four-fold and the two-fold degenerate
families. In two cases, the latter family is further split
into individual codons, each with a specific assign-
ment. The presence of a degeneracy symmetry over the
entire table, appearing when the succession [U, C, G,
A] (or its inverse) is used to write the table, shows that
the splitting into four-fold and two-fold degenerate
families is conditioned by the nature of the bases in the
first and second positions of the codons (Fig. 3).
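
The statement that the first two codon positions determine the type of degeneracy can be checked directly against the canonical code. The sketch below is an illustration under stated assumptions: it classifies each of the sixteen (first base, second base) boxes as four-fold degenerate when all four third-position variants encode the same amino acid, and compares the result with the rule given with Fig. 3, written here as "second base C, or first base G/C (strong) together with second base U or G". The one-letter notation, the UCAG ordering and the function names are choices made for this example.

```python
from itertools import product

BASES = "UCAG"
# Canonical genetic code, one-letter notation, "*" = stop; codons ordered
# UUU, UUC, UUA, UUG, UCU, ... (third position varies fastest).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def is_four_fold(first: str, second: str) -> bool:
    """True if the third base has no influence on the encoded amino acid."""
    return len({CODE[first + second + third] for third in BASES}) == 1


def rule_predicts_four_fold(first: str, second: str) -> bool:
    """Rule of Fig. 3 with S = G or C (strong) and W = A or U (weak):
    second base C gives a four-fold family for any first base; second base
    U or G gives a four-fold family only when the first base is strong."""
    if second == "C":
        return True
    return first in "GC" and second in "UG"


if __name__ == "__main__":
    for first, second in product(BASES, repeat=2):
        kind = "four-fold" if is_four_fold(first, second) else "two-fold or split"
        agrees = is_four_fold(first, second) == rule_predicts_four_fold(first, second)
        print(f"{first}{second}N: {kind:18s} rule agrees: {agrees}")
```

In this classification the AUN and UGN boxes fall under "two-fold or split", in line with the two exceptions (AUR and UGR) noted for the canonical code.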
The origin of the degeneracy stems from the existence of wobble rules in the base-pairing of the third position of the anticodons that can be specific to the type of family (four-fold or two-fold degenerate), and which are controlled by the ribosome (Table 1). Thus, in the case of mitochondrial and chloroplastic systems where wobble rules are the most extensive, a tRNA with U in the third position of the anticodon can translate any of the four codons of the corresponding four-fold degenerate family, while a tRNA with U (G) in the third position of the anticodon can translate any of the two codons with G or A (U or C) in the third position of the codon, within the corresponding two-fold degenerate family. As a result, only 22 different tRNAs are necessary in some systems (in which also the AGR family is reassigned to stop; see next subsection) in order to read all the codons for amino acids. The wobble rules are, however, generally less extensive and in most systems more than 30 kinds of tRNAs are used for the translation of all the codons. Moreover, the tRNAs often have modified bases in the third position of the anticodon (e.g. I, Q), each with particular matching properties (Table 1).
An important consequence of this wobble phenomenon, enabling a limited number of different tRNAs to cope with the whole set of codons, is an improvement in the efficiency of translation.
Moreover, some amino acids are present in more than one codon family. Thus, leu, ser and arg are coded by a total of six codons in two codon families in the canonical genetic code.

Genetic Code. Table 1 Wobble rules in the genetic code

3rd base anticodon (5′) | 3rd base codon (3′): standard matching(s) | Limitation known so far | Possible extension of matchings | Limitation known so far
Standard bases
A | U | - | C,G,U(A) | ACN and CGN families in Mycoplasma spp., yeast mt., nematode mt.
C | G | - | - | -
G | C,U | - | - | -
U | A,G | - | A,C,G,U | All 4-fold degenerate families in mt. and ch.
Modified bases
I | A,C,U | 4-fold degenerate families (except GGN) | - | -
Cm | A(G) | UUR family in E. coli | - | -
f⁵C | A,G | AUR family in nematode, bovine, squid mt. and Drosophila mt. | - | -
L | A | AUA in eubacteria and plant mt. | - | -
Q | C,U | 2-fold degenerate families in eubacteria and eucaryotes | - | -
m⁷G | A,C,G,U | AGN family in echinoderm mt. and squid mt. | - | -
xo⁵U | A,G,U | UCN, GUN, GCN and ACN families in eubacteria | - | -
xm⁵Um, Um, xm⁵U | A,G | 2-fold degenerate families in mt., eubacteria and eucaryotes | - | -
xm⁵s²U | A(G) | 2-fold degenerate families in eubacteria and eucaryotes | - | -
G in anticodon (UψG) | A,C,U | AAU, AAC, AAA in echinoderm mt. | - | -

Main source: ref (6). N = U, C, G or A; R = A or G; Y = C or U. Mt. = mitochondria; ch. = chloroplasts

Genetic Code. Figure 3 Degeneracy in the canonical genetic code table. Two families of degeneracy are mainly present in the table: the four-fold degenerate families (gray rectangles) and the two-fold degenerate families (squares). In two cases (AUR and UGR), the latter type of degeneracy is further split into individual codons. The succession [U, C, G, A] used to write the table highlights a degeneracy symmetry over the entire table (dashed line), reflecting the fact that simple rules based on the nature of the bases in the first and second positions of the codon account for the distribution of the two main families in the table:

Four-fold degenerate families      Two-fold degenerate families
1st pos.   2nd pos.                1st pos.   2nd pos.
(W)        (Y,S)                   (W)        (Y,W), (R,S), (R,W)
(S)        (Y,W), (Y,S), (R,S)     (S)        (R,W)

where the following categories are used to differentiate the bases: S (strong matching) = G or C; W (weak matching) = A or U; R (purine) = A or G; Y (pyrimidine) = C or U

Reassignments
Although the general organization of the genetic code is conserved over all living organisms, many variants have been discovered in different taxa. In addition, the organelles (mitochondria and chloroplasts) have their own translation systems with specific variants of the canonical genetic code (Fig. 4).
An examination of all the variants discovered so far indicates that some codons are more prone to reassignment than others (6). The stop codons constitute a typical example; the two-fold degenerate family UAR is frequently reassigned to gln in the nuclear system, while UGA almost always codes for trp in mitochondria. This particular versatility may be explained by the rarity of these codons, which implies that their reassignments might not cause catastrophic disturbance. Another interesting change concerns the AUA codon, which often codes for met in the mitochondria. As a result, the degeneracy symmetry pointed out in Fig. 3 is entirely valid for the mitochondrial systems that also have UGA coding for trp. These changes imply that the wobble rules can be altered within the codon families concerned.
Furthermore, it has been observed that all codons that have been reassigned in the nuclear systems have also been subject to reassignment in the mitochondria, while no codon has been reassigned in the nuclear systems only (6).
Some codon families seem to be excluded from any reassignment procedure. This is especially the case for all codons with G in first position, corresponding to the amino acids val, ala, gly, asp and glu, as compared with the codons with U, C or A in this same position, for which at least four changes have been reported (Fig. 4). Moreover, all these amino acids are only present in this part of the table in the canonical genetic code. This particularity shows that the genetic code has a core.
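
Two claims made in this part of the entry can be reproduced with a small calculation: that the degeneracy symmetry becomes complete once AUA codes for met and UGA for trp, and that 22 tRNAs suffice when wobble rules are most extensive and AGR is reassigned to stop. The sketch below is only illustrative. The variant code it uses (UGA to trp, AUA to met, AGA and AGG to stop) is an assumption chosen to match the reassignments named in the text and corresponds to what is commonly listed as the vertebrate mitochondrial code; the counting rule (one tRNA per four-fold family, one per pyrimidine-ending or purine-ending pair) is a simplification of the extended wobble rules, and the function names are hypothetical.

```python
from itertools import product

BASES = "UCAG"
# Canonical genetic code, one-letter notation, "*" = stop; codons ordered
# UUU, UUC, UUA, UUG, UCU, ... (third position varies fastest).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CANONICAL = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Assumed mitochondrial-style variant: UGA -> trp, AUA -> met, AGR -> stop.
MITO_VARIANT = {**CANONICAL, "UGA": "W", "AUA": "M", "AGA": "*", "AGG": "*"}

HALVES = ("UC", "AG")  # pyrimidine-ending (Y) and purine-ending (R) pairs


def has_full_degeneracy_symmetry(code: dict) -> bool:
    """True if every (first, second) box is either four-fold degenerate or
    splits into a uniform Y pair and a uniform R pair."""
    for first, second in product(BASES, repeat=2):
        for half in HALVES:
            if len({code[first + second + third] for third in half}) != 1:
                return False
    return True


def min_trna_count(code: dict) -> int:
    """tRNAs needed if one tRNA can read a whole four-fold family and one
    tRNA can read each Y or R pair; stop codons need no tRNA."""
    count = 0
    for first, second in product(BASES, repeat=2):
        box = {code[first + second + third] for third in BASES}
        if len(box) == 1:
            count += 0 if "*" in box else 1
            continue
        for half in HALVES:
            amino_acids = {code[first + second + third] for third in half}
            amino_acids.discard("*")
            count += len(amino_acids)
    return count


if __name__ == "__main__":
    print("canonical code symmetric:", has_full_degeneracy_symmetry(CANONICAL))
    print("variant code symmetric:  ", has_full_degeneracy_symmetry(MITO_VARIANT))
    print("tRNAs needed for variant:", min_trna_count(MITO_VARIANT))
```

Under these assumptions the variant code has eight four-fold families and fourteen amino-acid-coding pairs, which gives the 22 tRNAs mentioned in the Degeneracy subsection.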

The Core of the Genetic Code
As revealed by statistical analyses of coding sequences, not all codons of the genetic code occur with equal probability. Some codons occur very rarely while others are frequent. Many factors can affect their distribution, but a general trend connected with the organization of the genetic code, which is independent of the organism or the genes under consideration, can be highlighted.
The most general pattern occurring in coding sequences is GNN, meaning that the codons with G in first position are overrepresented in comparison with those with U, C or A at this same position (N stands for any nucleotide).
These codons code for the amino acids val, ala, gly, asp and glu, which display the simplest side chains of all the amino acids of the genetic code. This can explain their relative abundance in the proteins of which they constitute the scaffold.
One can, furthermore, connect this abundance to the observed stability of their assignment (see previous section). Changing the meaning of any of the corresponding codons may cause malfunctions in some proteins, resulting in disturbances in biochemical processes (it has, however, been experimentally shown that an organism such as E. coli can survive treatment of this kind).
Interestingly enough, this set of amino acids generally constitutes more than 97% of the total quantity of all amino acids found in the so-called prebiotic synthesis experiments (Fig. 4). It thus seems likely that the primitive code was limited to the use of these amino acids, which were only assigned to the codons with G in first position for physico-chemical reasons. Later, more codons and more amino acids were added, eventually constituting the genetic code as we know it in present-day organisms.

A Hydrophobicity Parameter in the Table
Correlation studies on physico-chemical properties of the constituents of the genetic code have revealed that the ▶hydrophobicity of the base in the second position of the anticodon is positively correlated with that of the encoded amino acid. It has also been established that the succession [A, G, C, U] ranks the bases from the most hydrophobic (A) to the most hydrophilic (U). Thus, Fig. 4 naturally arranges the amino acids into the genetic code table with regard to hydrophobicity; the most hydrophobic amino acids are mainly on the left of the table while the hydrophilic ones are on the right. This order in the genetic code table is noteworthy since the hydrophobicity of the amino acids has a major influence on the folding of the proteins of which they are the constituents. Thus, an analysis of coding sequences based on this correlation can provide information on the structure of the corresponding encoded proteins (7).

Genetic Code. Figure 4 Reassignments and parameters of order in the canonical genetic code table. The numbers refer to reassignments that have been reported so far [main source: ref (6)]: 1 thr, 2 ser, 3 met and unassigned, 4 stop, 5 trp and cys, 6 unassigned, 7 ser, gly, stop and unassigned, 8 ser, gly, stop and unassigned (7 and 8 changes are independent), 9 leu, ala and gln, 10 tyr, gln and glu, 11 asn. Unassigned means that the codon(s) disappeared in the entire genome. Changes that have been reported in mitochondria only are indicated by an asterisk (*). The gray area within the table points out all codon families with G in first position, for which no reassignment has been reported so far. The corresponding amino acids generally constitute more than 97% of the amount of all amino acids produced in experiments thought to reproduce the conditions of the early Earth (the results however differ slightly depending on the hypothesized conditions). The hydrophobicity parameter is shown on the top of the table. It is connected with the 2nd anticodonic position (overdrawn above the codonic one), the succession [A, G, C, U] ranking the bases in decreasing order on the hydrophobicity scale. The hydrophobicity correlation between the base at this position and the amino acids results in most of the hydrophobic amino acids being on the left of the table, while most of the hydrophilic ones are on the right.

Clinical Relevance
Since the aaRSs are responsible for the reliability of the coding rules, genetic diseases affecting these proteins can create important disorders in the organism.
Autoantibodies to five aaRSs have been described, and each is associated with a syndrome of inflammatory myopathy with interstitial lung disease and arthritis (8). At the level of the ribosome, antibiotics such as paromomycin and streptomycin can significantly affect the fidelity of translation. Paromomycin stabilizes the tRNAs irrespective of whether the codon-anticodon pair is cognate or near-cognate and hence increases amino acid misincorporation (5).
▶Nucleotide Biosynthesis
▶Translational Control in Eukaryotes

References
1. Baranov PV, Gesteland RF, Atkins JF (2002) Recoding: translational bifurcation in gene expression. Gene 286:187–201
2. Ibba M, Söll D (2002) Genetic code: introducing pyrrolysine. Curr Biol 12:R464–R466
3. Beuning PJ, Musier-Forsyth K (1999) Transfer RNA recognition by aminoacyl-tRNA synthetases. Biopolymers (Nucleic Acids Sciences) 52:1–28
4. Ribas de Pouplana L, Schimmel P (2001) Aminoacyl-tRNA synthetases: potential markers of genetic code development. Trends Biochem Sci 26:591–596
5. Rodnina MV, Wintermeyer W (2001) Ribosome fidelity: tRNA discrimination, proofreading and induced fit. Trends Biochem Sci 26:124–130
6. Knight RD, Freeland SJ, Landweber LF (2001) Rewriting the keyboard: evolvability of the genetic code. Nat Rev Genet 2:49–58
7. Chiusano ML, Alvarez-Valin F, Di Giulio M et al (2000) Second codon positions of genes and the secondary structure of proteins. Relationships and implications for the origin of the genetic code. Gene 261:63–69

alternatives for dealing with the risk of occurrence; (4) choose the course of action which seems to them appropriate in view of their risk their family goals and their ethical and religious standards to act in accordance with that decision; and (5) to make the best possible adjustment to the disorder in an affected family member and/or the risk of recurrence of that disorder.”
▶Fragile X Syndrome
▶Peutz-Jeghers Syndrome

Genetic Distance

Definition
Genetic distance describes the linear distance between genes on a genetic map, measured in Morgan (=100 centi-Morgan); a genetic distance of 1 centi-Morgan between two loci entails that they are expected to be involved, on average, in one recombination per 100 meioses.
▶Genetic Epidemiology

Genetic Epidemiology

MICHAEL KRAWCZAK
Institute for Medical Informatics and Statistics,
8. Hirakata M, Suwa A, Nagai S et al (1999) Anti-KS: Christian-Albrechts-University Kiel, Kiel, Germany
identification of autoantibodies to asparaginly-transfer
RNA synthetase associated with interstitial lung disease. krawczak@medinfo.uni-kiel.de
J Immun 162:2315–2320

Synonyms
Linkage analysis; positional cloning; gene mapping

Genetic Counselling Definition


Genetic epidemiology is the scientific discipline
concerned with the causes and distribution of human
Definition diseases in groups of relatives and with the inherited
The following definition of genetic counselling was predisposition to disease in populations. In many ways,
adopted by the American Society of Human Genetics: genetic epidemiology is an interdisciplinary science
“Genetic counselling is a communication process that has more aspects of population and evolutionary
which deals with the human problems associated with genetics to it than of classical epidemiology. Only
the occurrence or the risk of an occurrence of a genetic recently has the discipline changed its focus to target
disorder in the family. This process involves an attempt diseases that are characterized by the small-to-moder-
by one or more appropriately trained persons to help ate relative risks that are usually the objective of
the individual or family to (1) comprehend the medical conventional epidemiological research. So far how-
facts including the diagnosis probable course of the ever, genetic epidemiology has been most successful in
disorder and the available management; (2) appreciate elucidating the genetic basis of so-called “monogenic”
the way heredity contributes to the disorder and the risk disorders, including cystic fibrosis, Huntington disease
of recurrence in specified relatives; (3) understand the and Duchenne muscular dystrophy (Table 1). The

Genetic Epidemiology. Table 1 A list of human disease genes characterized through genetic epidemiological
research employing a positional cloning approach

Year Gene Disease OMIM


1986 CYBB chronic granulomatous disease 306400
1986 DMD Duchenne muscular dystrophy 310200
1986 RB1 retinoblastoma 180200
1989 CFTR cystic fibrosis 219700
1990 CHM choroideremia 303100
1990 NF1 neurofibromatosis type 1 162200
1990 SRY sex reversal 306100
1990 WT1 Wilms tumour 194070
1991 APC familial adenomatous polyposis 175100
1991 FMR1 fragile X syndrome 309550
1991 KAL1 Kallmann syndrome 308700
1991 PAX6 aniridia 106210
1992 DMPK myotonic dystrophy 160900
1992 NDP Norrie disease 310600
1992 OCRL Lowe syndrome 309000
1993 ABCD1 adrenoleukodystrophy 300100
1993 ATP7A Menkes syndrome 309400
1993 ATP7B Wilson disease 277900
1993 BTK X-linked agammaglobulinemia 300300
1993 FMR2 fragile X syndrome 309548
1993 GK hyperglycerolemia 307030
1993 HD Huntington disease 143100
1993 NF2 neurofibromatosis type 2 101000
1993 PAFAH1B1 Miller-Dieker syndrome 247200
1993 SCA1 spinocerebellar ataxia type 1 164400
1993 TSC2 tuberous sclerosis 191092
1993 VHL von Hippel Lindau syndrome 193300
1994 BRCA1 hereditary breast and ovarian cancer 113705
1994 DAX1 X-linked adrenal hypoplasia congenita 300200
1994 DRPLA dentatorubral-pallidoluysian atrophy 125370
1994 EMD Emery-Dreifuss muscular dystrophy 310300
1994 FGD1 Aarskog-Scott syndrome 305400
1994 FGFR3 achondroplasia 100800
1994 MJD Machado-Joseph disease 109150
1994 PKD1 polycystic kidney disease type 1 173900
1994 SLC26A2 diastrophic dysplasia 222600
1994 WAS Wiskott-Aldrich syndrome 301000
1994 XK McLeod syndrome 314850
1995 ARSE X-linked recessive chondrodysplasia punctata 302950
1995 ATM ataxia telangiectasia 208900
1995 BLM Bloom syndrome 210900
1995 BRCA2 hereditary breast cancer 114480
1995 CAPN3 limb-girdle muscular dystrophy type 2A 253600
1995 CLN3 Batten disease 204200
1995 EXT1 multiple exostoses 133700
1995 OA1 ocular albinism 300500
1995 PHEX hypophosphatemia rickets 307800
1995 POU3F4 X-linked mixed deafness 304400
1995 PSEN1 early onset familial Alzheimer disease 104311
1995 PSEN2 early onset familial Alzheimer disease 600759
1995 SGCB limb-girdle muscular dystrophy type 2E 604286
1995 SGCG limb-girdle muscular dystrophy type 2C 253700
1995 SMN1 spinal muscular atrophy type 1 253300
1996 CHS1 Chediak-Higashi syndrome 214500
1996 CSTB progressive myoclonus epilepsy 254800
1996 ED1 ectodermal dysplasia type 1 305100
1996 EXT2 multiple exostoses type 2 133701
1996 FANCA Fanconi anaemia 227650
1996 FRDA Friedreich ataxia 229300
1996 GPC3 Simpson-Golabi-Behmel syndrome type 1 312870
1996 HFE haemochromatosis 235200
1996 HPS Hermansky-Pudlak syndrome 203300
1996 KCNQ1 long QT syndrome type 1 192500
1996 MTM1 X-linked myotubular myopathy type 1 310400
1996 PITX2 Rieger syndrome 180500
1996 PKD2 polycystic kidney disease type 2 173910
1996 PTCH Gorlin syndrome 109400
1996 RECQL2 Werner syndrome 277700
1996 RPGR X-linked retinitis pigmentosa 300389
1996 SGCD limb-girdle muscular dystrophy type 2F 601287
1996 TAZ Barth syndrome 302060
1996 TCF1 maturity-onset diabetes of the young type 3 600496
1996 TCOF1 Treacher-Collins syndrome 154500
1997 ABCA4 Stargardt disease type 1 248200
1997 AIRE autoimmune polyglandular syndrome type 1 240300
1997 DIAPH1 non-syndromic autosomal dominant deafness type 1 124900
1997 DYT1 early-onset torsion dystonia 128100
1997 JAG1 Alagille syndrome 118450
1997 MEFV familial Mediterranean fever 249100
1997 MEN1 multiple endocrine neoplasia type 1 131100
1997 MID1 Opitz syndrome 300000
1997 MYOC juvenile primary open angle glaucoma 137750
1997 NPC1 Niemann-Pick disease 257220
1997 RS1 retinoschisis 312700
1997 SCA7 spinocerebellar ataxia type 7 164500
1997 SLC26A4 Pendred syndrome 274600
1997 TBX5 Holt-Oram syndrome 142900
1997 TSC1 tuberous sclerosis 191100
1997 UBE3A Angelman syndrome 105830
1997 ZIC3 situs inversus 306955
1998 CACNA1F X-linked congenital night blindness type 2 300071
1998 CLN5 neuronal ceroid lipofuscinosis type 5 256731
1998 CTNS cystinosis 219800
1998 DCX X-linked lissencephaly 300067
1998 DFNA5 non-syndromic autosomal dominant deafness type 5 600994
1998 DYSF limb-girdle muscular dystrophy type 2B 253601
1998 EPM2A progressive myoclonus epilepsy (Lafora) 254780
1998 FCMD Fukuyama type congenital muscular dystrophy 253800
1998 KCNQ2 benign familial neonatal convulsions type 1 121200
1998 MYO15A non-syndromic sensorineural recessive deafness type 3 600316
1998 NBS1 Nijmegen breakage syndrome 251260
1998 NPHS1 nephrotic syndrome type 1 256300
1998 PABPN1 oculopharyngeal muscular dystrophy 164300
1998 PARK2 juvenile Parkinson disease 600116
1998 RP2 X-linked retinitis pigmentosa 312600
1998 SH2D1A X-linked lymphoproliferative disease 308240
1998 STK11 Peutz-Jeghers syndrome 175200
1998 USH2A Usher syndrome type 2A 276901
1998 VMD2 Best disease 153700
1998 WFS1 Wolfram syndrome 222300
1999 ABCA1 Tangier disease 205400
1999 ATP2A2 Darier disease 124200
1999 CCM1 cerebral cavernous malformations type 1 116860
1999 CLN8 progressive epilepsy with mental retardation 600143
1999 EFEMP1 malattia leventinese 126600
1999 LMNA Emery-Dreifuss muscular dystrophy 181350
1999 PRG4 camptodactyly-arthropathy-coxa vara-pericarditis syndrome 208250
1999 SLC17A5 Salla disease 604369
1999 SLC19A2 thiamine-responsive megaloblastic anaemia 249270
1999 SLC25A13 adult-onset citrullinemia type 2 603471
1999 SPG4 autosomal dominant hereditary spastic paraplegia type 4 182601
1999 WISP3 progressive pseudorheumatoid dysplasia 208230
2000 AIPL1 Leber congenital amaurosis type 4 604393
2000 ATP2C1 Hailey-Hailey disease 169600
2000 FGF23 autosomal dominant rickets 193100
2000 MCOLN1 mucolipidosis type 4 252650
2000 MKKS McKusick-Kaufman syndrome 236700
2000 MTMR2 Charcot-Marie-Tooth disease type 4B 601382
2000 MYH9 May-Hegglin anomaly 155100
2000 NPHS2 idiopathic steroid-resistant nephrosis 600995
2000 PRKAR1A Carney complex 160980
2000 PVRL1 cleft lip/palate-ectodermal dysplasia syndrome 225000
2000 RELN autosomal recessive lissencephaly 257320
2000 SPINK5 Netherton syndrome 256500
2000 USH1C Usher syndrome 1C 276904
2001 BBS2 Bardet-Biedl syndrome 209900
2001 BBS4 Bardet-Biedl syndrome 209900
2001 BSND Bartter syndrome 602522
2001 CDH23 Usher syndrome 1D 601067
2001 ELAC2 familial prostate cancer 176807
2001 FANCD2 Fanconi anaemia 227646
2001 PRPC8 retinitis pigmentosa 13 600059
2001 SOST sclerosteosis 269500
2001 USH3A Usher syndrome 3 276902
2002 BBS1 Bardet-Biedl syndrome 209900
2002 GDAP1 Charcot-Marie-Tooth disease type 4A 214400
2002 RNASEL hereditary prostate cancer type 1 601518
2002 SMARCAL1 Schimke immuno-osseous dysplasia 242900
2002 TMC1 non-syndromic autosomal dominant deafness type 36 606705
2002 TRPM6 hypomagnesemia with secondary hypocalcemia 602014

OMIM: Entry number in “Mendelian Inheritance in Man Online” at ▶http://www.ncbi.nlm.nih.gov/omim/

approach taken to detect the genetic variants responsible for these diseases is generally known as “positional cloning”. Instead of relying upon prior knowledge about the disease-causing biochemical defect(s), positional cloning utilizes the segregation pattern of genetic markers (e.g. SNPs, microsatellites, RFLPs) in affected families to localize the genes involved in a given phenotypic trait (“linkage analysis”). The more often the trait and a particular marker allele are co-inherited by family members, the stronger the evidence that a gene in the vicinity of the marker influences the trait, i.e. that marker and disease gene are linked.

Characteristics
Gene Mapping and Meiotic ▶Recombination
Formally, linkage analysis involves the assessment of the recombination fraction θ between two genetic loci such as, in the context of genetic epidemiology, a marker and an unknown disease gene. Parameter θ equals the rate at which children receive from a given parent either the grand-maternal allele at one locus and the grand-paternal allele at the other or vice versa. Assuming Mendelian inheritance, θ = ½ for a pair of genes located on different chromosomes. For two loci residing on the same chromosome, meiotic recombination is only possible via “▶crossing-over”, signifying the breakage and re-union of homologous, non-sister chromatids during the metaphase of meiotic division I (Fig. 1). Indeed, it can be shown mathematically that θ equals exactly half the probability of at least one crossing-over occurring between the two loci in question, provided that some critical assumptions about the randomness of crossing-overs are correct.
In any case, one corollary of the above is that θ represents an increasing function of the physical distance between two loci and therefore provides a key parameter for gene mapping. Unfortunately, θ is not an additive measure of distance since it can never exceed ½. In order to facilitate gene mapping, θ therefore has to be transformed into a linear “▶genetic distance” d(θ) by some ▶mapping function (the most widely used being d(θ) = −½ ln(1 − 2θ), proposed by British geneticist J.B.S. Haldane in 1919). Genetic distance is measured in units of Morgan (M) in order to honour T.H. Morgan, the American Nobel prize-winning biologist who first discovered the role of chromosomes in heredity. One centi-Morgan (cM) roughly corresponds to one expected recombination per 100 meioses.

Parametric Linkage Analysis
Linkage analysis has two major objectives, namely (i) to clarify with some statistical confidence whether θ = ½ or θ < ½, and (ii) to estimate θ in the latter case. Both goals are easily achieved in laboratory animals where controlled breeding can be performed such that, after a few generations, recombinants and non-recombinants can simply be counted. In humans, as well as in animals with longer generation times, however, linkage analysis has to fall back upon family data. How such data can be used to draw statistical inferences about linkage depends upon their complexity, i.e. on how much prior knowledge is available about the genetic, environmental and stochastic nature of the phenotypes of interest.
For the simple genotype-phenotype relationships encountered with most monogenic disorders, family data can be analysed by explicitly modelling the co-inheritance of the disease and marker in a family, based upon the underlying genotype frequencies and ▶penetrances (“parametric linkage analysis”). In such cases, the likelihood L of a given recombination fraction θ0 between disease gene and marker is a computable function of the phenotypic data D observed in the family. This leads to the definition of
z(θ0) = log10 {L(θ = θ0 | D) / L(θ = ½ | D)}.
Quantity z, termed the “▶lod score” and introduced by N.E. Morton in 1955, is used as a sequential statistic to

Genetic Epidemiology. Figure 1 Process of crossing-over during germ cell development. The two nearly
duplicated chromosomes align during the late metaphase of meiotic division I, where an overlap and breakage
of their constituent non-sister chromatids may occur (red: maternal chromosome, blue: paternal chromosome).
Re-annealing and re-synthesis by the cellular repair mechanisms leaves two chromatids with genetic material
flanking the site of crossing-over that is not of the same parental origin (i.e. the resulting chromosomes would
represent recombinants with respect to the two loci shown).
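
The conversion of a recombination fraction into an additive map distance, as discussed in the text on gene mapping, can be illustrated numerically. The following is a minimal sketch of Haldane's mapping function, d(θ) = −½ ln(1 − 2θ), and of its inverse, θ = ½(1 − e^(−2d)). The function names are hypothetical, distances are expressed in Morgans, and the model assumes that crossing-overs occur independently of one another (no interference).

```python
import math

def haldane_distance(theta: float) -> float:
    """Map distance d in Morgans for a recombination fraction theta (0 <= theta < 0.5)."""
    if not 0.0 <= theta < 0.5:
        raise ValueError("theta must lie in [0, 0.5)")
    return -0.5 * math.log(1.0 - 2.0 * theta)

def haldane_theta(d: float) -> float:
    """Inverse mapping: recombination fraction for a map distance d in Morgans."""
    if d < 0.0:
        raise ValueError("d must be non-negative")
    return 0.5 * (1.0 - math.exp(-2.0 * d))

# Recombination fractions are not additive, but Haldane distances are.
# For loci in the order A-B-C with independent crossing-overs:
# theta_AC = theta_AB + theta_BC - 2 * theta_AB * theta_BC.
theta_ab, theta_bc = 0.10, 0.20          # illustrative values only
theta_ac = theta_ab + theta_bc - 2.0 * theta_ab * theta_bc
d_sum = haldane_distance(theta_ab) + haldane_distance(theta_bc)
```

In this sketch, `haldane_distance(theta_ac)` agrees with `d_sum`, whereas `theta_ab + theta_bc` overstates `theta_ac`; this is the sense in which d, but not θ, is an additive measure.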

test whether θ < ½, and to quantify the evidence in favour of or against θ0. When z(θ0) > 3, linkage between disease gene and marker locus is regarded as being proven and θ is estimated by that recombination fraction that yields the highest lod score. This procedure is exemplified in Fig. 2 for a large family affected by an autosomal dominant disorder. The results of a linkage analysis are usually presented in the form of lod score tables or graphs (Fig. 2b) where, ceteris paribus, studies of independent (i.e. unrelated) families can be aggregated by summation of the family-wise lod scores. If a disease gene is to be integrated into a pre-existing map of linked markers, then this is most efficiently performed by parametric multi-locus linkage analysis, which has been shown to be up to twice as accurate as the pair-wise approach, measured in terms of the variance of the ensuing recombination fraction estimates.

Non-Parametric Linkage Analysis of Complex Diseases
Increasingly, the major challenge to genetic epidemiology is being posed by so-called “complex” diseases, which comprise conditions such as diabetes, heart disease, cancer and psychiatric illness. When compared to monogenic disorders, this category of disease is characterized by
• a substantially higher population frequency,
• the involvement of multiple genes, most probably interacting with one another,
• relatively minor effects exerted by individual variants and
• an important modifying role of environmental factors.
Since no prior information is usually available as to how genetic variation at a given locus modifies the risk for a complex disease (i.e. the genetic model of the disease is unknown), gene mapping for complex diseases has to adopt robust, albeit less powerful, “non-parametric” or “model-free” linkage analysis such as, for example, the study of pairs of relatives. The idea underlying this approach, which is both simple and intuitively appealing, goes back to a 1935 paper by British medical geneticist L.S. Penrose. The number of sib-pairs, out of a total of n independent pairs, who share k parental alleles of an autosomal locus identical by descent (“ibd”) follows a multinomial distribution with parameters zk, k = 0, 1, or 2 (Fig. 3). Under the null hypothesis of no etiological connection between marker and disease, the inheritance of a marker can be assumed to follow Mendelian rules and to be independent of the disease status of the siblings. This implies that, irrespective of whether the sibs are concordant or discordant, z0 = ¼, z1 = ½ and z2 = ¼. Any test for a deviation from these proportions represents a test for linkage between the marker and a putative disease gene. In its original form, the affected sib-pair test required that the ibd status at each marker be determined unequivocally for all sib-pairs.
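
A test for deviation from the null proportions ¼ : ½ : ¼ can be sketched as a simple goodness-of-fit calculation. The code below is one possible illustration, not necessarily the form of the test used in practice: it assumes that the ibd status of every sib-pair is known without error, the function name and the counts are hypothetical, and a generic χ² statistic with 2 degrees of freedom is used.

```python
import math

def sib_pair_ibd_test(n0: int, n1: int, n2: int) -> tuple:
    """Chi-square goodness-of-fit test of observed ibd counts against 1/4 : 1/2 : 1/4.

    n0, n1, n2 are the numbers of sib-pairs sharing 0, 1 or 2 alleles ibd.
    Returns (chi-square statistic, p-value).
    """
    n = n0 + n1 + n2
    if n == 0:
        raise ValueError("at least one sib-pair is required")
    expected = (0.25 * n, 0.5 * n, 0.25 * n)
    observed = (n0, n1, n2)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # For 2 degrees of freedom the chi-square survival function is exp(-x / 2).
    p_value = math.exp(-chi2 / 2.0)
    return chi2, p_value

# Hypothetical data: 100 affected sib-pairs with excess sharing of two alleles.
chi2, p = sib_pair_ibd_test(n0=15, n1=50, n2=35)
```

With these made-up counts the expected values are 25, 50 and 25, so the statistic is 8 and the p-value is exp(−4), i.e. roughly 0.018.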

Genetic Epidemiology. Figure 2 Linkage analysis of a family affected by an autosomal dominant disease.
(a) shows the pedigree with patients marked by black symbols. Genotypes observed for a biallelic marker locus
with alleles “A” and “B” are displayed alongside each individual. (b) is a graphical display of the lod scores as
calculated from the family data, assuming full penetrance and lack of de novo mutation. The lod score at θ0=0.00
equals minus infinity owing to a recombination that has occurred during the paternal meiosis leading to an
affected girl in the most recent generation (marked by *).
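
The shape of a lod score curve such as the one in panel (b) can be reproduced for the simplest possible situation. The sketch below assumes phase-known meioses in which recombinants can be counted directly, so that the likelihood is θ^k (1 − θ)^(n−k); the counts are hypothetical and are not taken from the pedigree in the figure, and the function name is an assumption. Real pedigree analysis requires likelihood calculations over unknown phases and genotypes, which this example does not attempt.

```python
import math

def lod_score(theta: float, recombinants: int, meioses: int) -> float:
    """lod score z(theta) = log10 of L(theta) / L(1/2) for phase-known meioses."""
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie in [0, 0.5]")
    k, n = recombinants, meioses
    if theta == 0.0:
        # Any observed recombinant is impossible at theta = 0.
        return -math.inf if k > 0 else n * math.log10(2.0)
    return (k * math.log10(theta)
            + (n - k) * math.log10(1.0 - theta)
            + n * math.log10(2.0))

# Hypothetical data: 1 recombinant among 10 informative meioses.
grid = [i / 100 for i in range(0, 51)]
table = {theta: lod_score(theta, 1, 10) for theta in grid}
theta_hat = max(table, key=table.get)   # recombination fraction with the highest lod score
```

For these counts the curve starts at minus infinity for θ = 0, peaks at θ = 0.10 with a lod score of about 1.6, and returns to zero at θ = ½. Because the maximum stays below 3, data of this size alone would not be regarded as proof of linkage; lod scores from further independent families could be added to the table entry by entry.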

However, a number of derivatives of the test have since been developed which incorporate posterior distributions on k (as inferred from other relatives), utilize relatives other than sibs or are based upon identity-by-state (“ibs”) rather than ibd allele sharing. In general, these methods are less powerful than the original test, but can extract mapping information that would otherwise not be used.

Quantitative Phenotypes
Non-parametric methods have also been devised for the genetic analysis of quantitative phenotypes which, in many instances, may be more powerful than the consideration of dichotomized disease outcomes (“affected” vs “non-affected”) derived from them. The idea underlying the concept of quantitative trait mapping is nevertheless the same as for qualitative characters in

Genetic Epidemiology. Figure 3 Level k of autosomal identical-by-descent (ibd) allele sharing between two sibs.
For each value of k (i.e. k=0,1, or 2), shared alleles are marked in red for the second sib.
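
The null proportions ¼, ½ and ¼ for k = 0, 1 and 2 follow from Mendelian segregation alone, and a short enumeration makes this explicit. The sketch below assumes two sibs, an autosomal locus and equally likely transmission of either allele from each parent; the variable names are illustrative only.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Each sib receives one of two paternal alleles and one of two maternal alleles,
# coded here as 0 or 1 for each parent: (paternal, maternal).
transmissions = list(product((0, 1), repeat=2))

counts = Counter()
for sib1, sib2 in product(transmissions, repeat=2):
    # Number of parental alleles shared identical by descent by the two sibs.
    shared = (sib1[0] == sib2[0]) + (sib1[1] == sib2[1])
    counts[shared] += 1

total = sum(counts.values())   # 16 equally likely combinations
ibd_distribution = {k: Fraction(counts[k], total) for k in (0, 1, 2)}
```

Of the 16 equally likely combinations, 4 give k = 0, 8 give k = 1 and 4 give k = 2.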

that the observed level of marker ibd sharing between relatives is compared to their phenotypic similarity. If a marker is linked to a gene that influences the trait, then sibs with similar phenotypes, for example, will tend to share more than ½ of their marker alleles ibd whereas dissimilar sibs will not. A formal test for this effect was proposed by American statisticians J.K. Haseman and R.C. Elston in 1972. For each sib-pair, the squared difference Y of their phenotypic values is calculated, and the number π of their ibd marker alleles determined (or estimated). A linear regression analysis of Y on π then reveals (i) whether the two variables are correlated and (ii) whether a significant relationship, if found, is a biologically plausible indication of linkage. The Haseman-Elston approach has since been expanded and refined, for example by considering the mean-corrected product of the sib-specific phenotypes instead of Y, so as to increase informativity about linkage. An alternative method aims at decomposing the variance of the phenotype into components that are due to genes which are either linked to the marker, or not (“variance component analysis”). For a marker linked to a gene of strong effect, the phenotypic covariance among relatives should be positively related to their degree of marker ibd sharing, and this relationship translates into larger estimates of the corresponding variance components.

▶Linkage Disequilibrium
Intervals of 1 Mb are generally regarded as the limit of mapping resolution that can be achieved using (family-based) linkage analysis, and more precise mapping of disease genes can only be expected from population-based association studies, exploiting linkage disequilibrium (LD). The most general definition of LD is the condition that the alleles of linked loci do not occur statistically independently on the chromosomes observed in a population. In the simplest case of two biallelic loci, haplotype frequencies can be arranged in a two-by-two table with cell probabilities a, b, c and d (Table 2), and LD is meaningfully quantified by the cross product D = ad − bc. When a new allele first arises in a population by either mutation or migration, it occurs as a single copy that resides on a certain haplotype background, together with certain alleles of other loci. Only in later generations will the allele become more frequent through either selection or genetic drift or both. In any case, chromosomes carrying the new allele will recombine with chromosomes carrying other haplotypes so that the strong original LD will erode with time. This loss of LD will be slower for closely linked loci and, under some simplifying assumptions about mutation rates, migration and mating patterns, D can indeed be shown to decrease by a factor of 1 − θ in each generation. Therefore, strong LD is an indication of close linkage and assessment of LD between a marker and putative disease gene can be regarded as linkage analysis in a super-pedigree tying all analysed individuals together.

Genetic Epidemiology. Table 2 Haplotype frequencies of two biallelic loci

Marker 2     Marker 1                  Total
             allele 1A    allele 1B
allele 2A    a            b            a+b
allele 2B    c            d            c+d
Total        a+c          b+d          1

Linkage disequilibrium (LD) is usually measured by D = ad − bc. However, since D depends upon the marginal sums (i.e. the allele frequencies), LD is often quantified by D' = D/Dmax instead. Here, Dmax denotes the maximum absolute value of D that is possible for the same allele frequencies. If D > 0, then Dmax = min{(a+c)(c+d), (a+b)(b+d)}; if D < 0, then Dmax = min{(a+b)(a+c), (b+d)(c+d)}.

In principle, any kind of genetic marker can be employed in disease association studies provided that (i) the marker mutation rate is low and (ii) the density of markers chosen for analysis is high enough to ensure

Genetic Epidemiology. Figure 4 Transmission disequilibrium test (TDT) of disease-association for biallelic marker
genes. The sampling units of the TDT are nuclear families comprising both parents and an affected child (“trios”).
Only transmissions from heterozygous parents to their children are evaluated (cells x and y in the table shown). The
TDT statistic equals McNemar’s (x−y)²/(x+y), which follows a χ² distribution with 1 degree of freedom under the null
hypothesis of no association.
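
The linkage disequilibrium measures defined with Table 2, D = ad − bc and D' = D/Dmax, and the expected erosion of D by a factor of 1 − θ per generation can be written down in a few lines. This is a minimal sketch under the simplifying assumptions named in the text; the function names and the example frequencies are hypothetical.

```python
def ld_measures(a: float, b: float, c: float, d: float) -> tuple:
    """Return (D, D') for haplotype frequencies laid out as in Table 2."""
    if abs(a + b + c + d - 1.0) > 1e-9:
        raise ValueError("haplotype frequencies must sum to 1")
    D = a * d - b * c
    if D > 0:
        d_max = min((a + c) * (c + d), (a + b) * (b + d))
    elif D < 0:
        d_max = min((a + b) * (a + c), (b + d) * (c + d))
    else:
        return 0.0, 0.0   # no disequilibrium
    return D, D / d_max

def ld_after(D0: float, theta: float, generations: int) -> float:
    """Expected D after some generations, assuming decay by (1 - theta) per generation."""
    return D0 * (1.0 - theta) ** generations

# Hypothetical example: haplotype 1B-2A is absent, so LD is complete (D' = 1).
D, D_prime = ld_measures(a=0.5, b=0.0, c=0.25, d=0.25)
D_later = ld_after(D, theta=0.01, generations=100)
```

In this example D = 0.125 and D' = 1. With θ = 0.01, roughly 1 cM, about 37% of the initial D would be expected to remain after 100 generations, which illustrates why LD persists mainly between closely linked loci.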

sufficiently strong LD with disease gene(s). Empirical data and theoretical considerations suggest that a sensible marker density should be no lower than approximately 1 in 50,000 nucleotides. Ideally, association markers should be chosen from within genes that represent biologically plausible candidates for an involvement in the disease of interest. The chance of detecting association would then be increased further if marker alleles were themselves of functional significance by altering, for example, the protein product or a regulatory sequence.

Family-based Association Studies
The simplest form of a population-based association study is that invoking a case-control design. As in classical epidemiology, relative risks or odds ratios of particular marker genotypes or haplotypes can be estimated from the respective frequencies in patients and unrelated healthy controls. However, concerns have arisen over the potentially confounding effects of ethnic, social or geographical population stratification that would generate systematic differences between the genetic characteristics of the two samples, unrelated to disease. To solve this problem, family-based association designs have been proposed, of which the transmission disequilibrium test (TDT) is the most widely used. A TDT is basically a McNemar test for preferential transmission of particular marker alleles from heterozygous parents to their affected offspring (Fig. 4). Any deviation of the transmission to non-transmission ratio from the expected 1:1 is indicative of both linkage between marker and disease gene in the presence of LD and of LD in the presence of linkage.
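
The TDT statistic given with Fig. 4, (x − y)²/(x + y), can be evaluated as follows. This is a minimal sketch for a single biallelic marker: the function name and the counts are hypothetical, and the p-value uses the large-sample χ² approximation with 1 degree of freedom rather than an exact test.

```python
import math

def tdt(x: int, y: int) -> tuple:
    """TDT statistic and p-value from transmissions by heterozygous parents.

    x: number of times the marker allele of interest was transmitted
       to an affected child
    y: number of times it was not transmitted
    """
    if x + y == 0:
        raise ValueError("no informative (heterozygous) parents")
    chi2 = (x - y) ** 2 / (x + y)
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

# Hypothetical data: 100 heterozygous parents, 60 transmissions of the allele.
chi2_tdt, p_tdt = tdt(x=60, y=40)
```

Here the statistic equals 4.0 and the p-value is about 0.046. Transmissions from homozygous parents carry no information and do not enter the calculation.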

Since chromosomes of close relatives act as internal controls in the TDT and similar tests, it has become almost paradigmatic for the genetic epidemiology of complex disease that family-based association studies are superior to case-control designs. However, family-based designs also have disadvantages that might not always be fully outweighed by their apparent robustness. For example, gene-environment interactions cannot be analysed in family-based studies since no genuine controls are available for comparison to patients. Furthermore, parental genotypes are required for the TDT and these may be difficult to obtain for late-onset disorders. The use of other family members as surrogates to try to reconstruct parental genotypes with some certainty has been suggested in such instances. However, such methods are usually costly, inaccurate and inefficient. On the other hand, possible confounding of data by population stratification can be avoided in case-control studies through careful matching by ethnic and geographic origin. Furthermore, if a sufficiently large set of genetic markers is available that is not itself tested for association, then these markers can be used to estimate the level of population stratification and to correct the employed test statistic accordingly. Finally, although population stratification may represent a theoretical possibility, empirical evidence for its practical importance as a confounding factor in genetic epidemiology is still lacking.

Clinical Relevance
The reasons for the apparent lack of success that plagues genetic studies of complex disease are manifold and the most critical issue is probably the reduction in power caused by genetic heterogeneity. Genetic heterogeneity can occur at two levels, either within genes (“allelic heterogeneity”) or between genes (“locus heterogeneity”). In order to increase the power of genetic studies of complex diseases, the major goal in their planning and performance is thus to control for genetic heterogeneity at all levels. First and foremost, this requires a careful choice of the population under study. Ideally, for a disease association to be detectable, all copies of the predisposing allele should be ibd in patients. The best populations to analyse for LD are therefore those that have been small and isolated for most of their history or that have undergone recent expansion from a small number of founders. Not surprisingly, genetic epidemiology has been particularly successful in Finland and some inward-breeding communities in the USA (e.g. Amish, Hutterites, Ashkenazim). In addition to population genetic issues, an appropriate definition of phenotypes, a breakdown by sub-phenotypes and the use of sensible covariates to define etiologically homogeneous sub-populations can help to reduce genetic heterogeneity further. Finally, genetic epidemiology is a constantly evolving scientific discipline so that improved power may also arise from the development and application of new analytical tools that take genetic and etiological heterogeneity into account (e.g. multi-locus statistics, time series analysis).
Complex human diseases are typically common and have a substantial economic impact upon national health systems. The resulting public interest renders disorders such as cancer, heart disease and diabetes particularly attractive for genetic research and a large number of studies into these diseases are often being performed in parallel. On the other hand, for the reasons mentioned above, the prior probability of successfully mapping and characterizing genes for complex diseases is comparatively low and usually unknown. Even with high significance levels imposed, most positive gene mapping results may therefore be wrong. This implies that genetic epidemiological research will almost inevitably continue to be driven towards the generation of false positive results. Claims as to the elucidation of a disease predisposition should therefore always be received with some caution and judged as preliminary until confirmed by controlled replication, meta-analysis or independent laboratory experiments.

References
1. Elston RC (1998) Linkage and association. Genet Epidemiol 15:565–576
2. Schork NJ, Cardon LR, Xu X (1998) The future of genetic epidemiology. Trends Genet 14:266–272
3. Risch NJ (2000) Searching for genetic determinants in the new millennium. Nature 405:847–856
4. Terwilliger JD, Göring HHH (2000) Gene mapping in the 20th and 21st centuries: statistical methods, data analysis, and experimental design. Hum Biol 72:63–132
5. Sham P (2001) Shifting paradigms in gene-mapping methodology for complex traits. Pharmacogenomics 2:195–202


Genetic Hearing Disorder

Definition
Genetic hearing disorders mainly comprise non-X linked genetic hearing loss, which is only observed in homozygous individuals having inherited the mutated gene from each parent.
▶Microvilli


Genetic Heterogeneity

▶Heterogeneity/Heterogenous

Genetic Immunization Genetic Polymorphism

Definition Definition
Genetic immunization is a technique to induce specific Genetic polymorphism is the presence of multiple
immune responses by injecting antigen-encoding inherited forms of a gene with at least an allele
expression plasmid DNA. frequency of 1% within the population.
▶DNA-based Vaccination ▶Pharmacogenomics

Genetic Interactions
Genetic Predisposition
to Multiple Sclerosis
Definition
Genetic interactions describe interactions between two
or more mutations that result in a phenotype. ALASTAIR COMPSTON
▶Cell Polarity University of Cambridge Neurology Unit,
Cambridge, UK
alastair.compston@medschl.cam.ac.uk

Genetic Map Definition


▶Multiple sclerosis is a typical complex trait. There is
an increased familial recurrence risk and evidence from
Definition
▶linkage and ▶association studies for functional
Genetic map (also known as a linkage map) is a map of
▶polymorphisms that increase disease susceptibility
a genome, which shows the relative positions (order
but without Mendelian patterns of inheritance. There is
and distances) of the genes and/or markers on the
epidemiological evidence for the role of environmental
chromosomes. The map is based on pairwise coin-
factors in determining the distribution of the disease.
heritance (linkage) of markers. Genetic maps are
generally composites of data from many experiments.
▶Chromosome 21, Disorders Characteristics
▶YAC and PAC Maps Familial Multiple Sclerosis
Multiple sclerosis has a familial recurrence rate of
approximately 15%. Overall, the reduction in risk
changes from 3% (relative risk 9) in first-degree
relatives to 1% (relative risk 3.4) and 1% (relative risk
Genetic Modification 2.9) in second and third degree relatives, respectively,
compared with a population lifetime rate of 0.3%. At
one extreme, the risk of multiple sclerosis is 35% for
Definition the monozygotic twin partner of an affected proband, or
Genetic modification describes the introduction of a the children of parents who are both affected (▶con-
new nucleic acid into a cell, organism or micro- jugal pairs), compared to 0.3% for individuals related
organism. In the context of clinical gene transfer, the by adoption to a proband. Familial clustering is
term is used in conjunction with the transfer of a nucleic therefore genetically determined.
acid. This can be achieved by either introducing an
expression construct or by modification of cellular Cellular and Molecular Recognition
nucleic acid e.g. by revision of a point mutation. Nucleic The Analysis of Complex Traits
acids used (e.g. for triple helix formation or RNA with Two methods—linkage and association—underpin the
ribozyme function that is not part of a transgene) are not analysis of complex traits. Linkage has low statistical
meant to lead to genetic modification as far as the term is power but operates within families across relatively
used in the context of clinical gene transfer. large genetic distances. Association has high statistical
▶Clinical Gene Transfer power but is only informative within the boundaries of

▶linkage disequilibrium present in the population molecules, immune receptors, accessory molecules,
under study. In founder populations, polymorphisms cytokines, chemokines and their receptors or antagonists,
that increase susceptibility to disease are necessarily structural genes of oligodendrocytes or myelin, and
located within a large group of linked genes. This block molecules regulating cell death and survival. In a small
is subject to recombination during subsequent meioses and regionally restricted population of Finns, multiple
and is gradually whittled down until there is no residual sclerosis is associated and linked to the gene for myelin
linkage disequilibrium. It seems that genetic factors basic protein, encoded on chromosome 18. The effect can
determine susceptibility and, to some extent, also shape be traced to a subset of families with common ancestry
the clinical course. The choice of markers has been and does not hold up in the larger cohort. It is the nature of
driven either by a priori guesses on the nature of screening so many potential effects that a proportion will
susceptibility (▶candidate genes) or systematic screen- appear to be associated or linked but by chance. Equally,
ing of the genome. Candidates selected because they it would be difficult to show unambiguously that one or
map within linked regions combine both strategies. more genes exerting a small biological effect is making a
Markers are analyzed individually (single point contribution to susceptibility in a single study. That said,
analysis) or corrected for information available from some plausible associations or linkages have been
their neighbours (multipoint analysis). The cumulative provisionally reported, although no one of these has yet
probability that a given marker or region of interest is stood up to repeated replication. These mainly involve
linked or associated with multiple sclerosis can be factors that appear to increase susceptibility to multiple
formally tested by ▶meta-analysis of available studies. sclerosis, but there is provisional evidence for primary
The lesson learned thus far is that no one gene makes a effects on resistance and effects on severity or clinical
major contribution to susceptibility although collec- features of the disease.
tively they determine a relative risk (for siblings) of DR15 is associated with younger age at diagnosis and
around 20. female gender but does not distinguish features relating
to disease course, outcome, specific clinical features or
Candidate Genes in Multiple Sclerosis paraclinical investigations. This suggests that DR15
Much effort has gone into the assessment of candidate exerts its effect on susceptibility rather than modifying
susceptibility genes chosen on the basis of prevailing the course of multiple sclerosis. Loci apparently
ideas concerning the pathogenesis of multiple sclerosis. associated with disease protection are FAS-670, IL-
Population studies comparing unrelated cases and 12p40, FcR and MCP-3. Genes that may influence the
controls show an association between the class II course or phenotype of multiple sclerosis include
▶major histocompatibility complex alleles DR15 and CTLA4, IL-1Ra/IL-1B, IL-2, CCR5, oestrogen recep-
DQ6 and their underlying genotypes (DRB1*1501, tor, CNTF and Apo-E and mutations of mitochondrial
DRB5*0101 and DQA1*0102, DQB1*0602). This is DNA.
seen in almost all populations (Caucasian, Oriental,
Arab, Hispanic, Finnish, Russian and Jewish) although Linkage Genome Screens
the strength of the association differs. Even those ethnic The dividend from attempting to fast-track the solution
groups in which the frequency of multiple sclerosis is to susceptibility in multiple sclerosis by the candidate
low, or the phenotype distinct from that usually gene approach has been small but the problem is also
observed in northern Europe, are now acknowledged not solved by the nine whole genome linkage analyses
to be primarily DR15 associated with one exception. In using variable numbers of families from the United
Sardinians, the association is with DR4 (DRB1*0405- States, Canada, United Kingdom, Finland, Sardinia,
DQA1*0301-DQB1*0302). In some other Mediterranean Italy, Turkey, Scandinavia and Australia. These screens
populations (Canaries and Turkey), the association is have involved between 21 and 225 families each typed for
with DR2 (DRB1*1501, DQA1*0102, DQB1*0602) 257–443 microsatellite markers chosen to provide an
and DR4 (DRB1*04, DQA1*03, DQB1*0302). Most average spacing of around 10 centiMorgans. Although
investigators assume that—based on the genetics several new genomic regions of interest were revealed,
and obvious candidature through its role in restricting many are false positives. These whole genome screens
the immune response—DR (or DQ) is itself the have been used to explore regions of interest in more
susceptibility gene encoded at 6p21. detail hoping to consolidate their status based on
Outside the major histocompatibility complex, many mapping but without picking out positional candidates.
candidates have been screened. The case made on the Linkage on chromosome 17q is supported by addi-
basis of ideas concerning the pathogenesis of multiple tional positional screens from Denmark, Canada, and
sclerosis is much strengthened by prior knowledge that Finland. There is collateral support for the involvement
the candidate gene also maps to a region already of chromosomes 5p, 7p and 12q based on direct
implicated by linkage studies (positional candidates). evidence and synteny with genes determining sus-
The range of candidates now studied includes adhesion ceptibility to experimental forms of demyelination.

Meta-analysis has been deployed in the expectation provisional site already offers several interesting
that this will reduce the evidence for false positive possibilities, although the number of genes encoding
peaks and strengthen the candidature of those which are components of the nervous, immune and signalling
genuine providing the best guide to shared regions of systems is such as to make practically any region
interest as the map is serially up-dated. This was last suggestive with respect to sensible candidates. Rapid
completed in 2005. progress is being made in characterising the whole
genome for the size, distribution and diversity of blocks
Whole Genome Association Screening containing a restricted number of haplotypes. If the
Until recently, whole genome linkage disequilibrium preliminary evidence holds up, it will be possible to tag
mapping was considered impractical and dependent on the common variants within each block in populations
chance co-localisation of susceptibility genes and (such as Europeans) retaining significant linkage
markers applied randomly and at low density. This disequilibrium and screen individuals for the suscept-
situation changed with the increased availability of ibility haplotypes with relative economy.
widely distributed microsatellite markers and is set to
increase further with the identification and mapping of Clinical Relevance
▶single nucleotide polymorphisms. A first pass at ▶Concordance within families can be used to gauge
screening the genome for association was completed in the influence of genetic factors in determining the
2003 based on a 0.5 cM map of microsatellite markers clinical phenotype of multiple sclerosis. Time to reach
and using DNA pools derived from cases with multiple the later stages of disability does not differ between
sclerosis and unrelated controls. Individual results familial and sporadic cases. Conjugal pairs show no
provided provisional evidence for associations based evidence for clinical concordance, clustering at year of
on linkage disequilibrium outside the major histo- onset or distortion of the expected pattern of age at
compatibility complex on 6p21. The number of onset in the second affected spouse. The most recent
micro-satellite markers used necessarily made this a assessment of concordance in co-affected siblings and
low-density screen, especially since the number of parent-child pairs supports a role for genetic factors in
informative markers was less than the full set of 6000 determining age at onset and progression either from
used in this Genetic Analysis of Multiple sclerosis in onset or after a phase of relapsing remitting disease, but
EuropeanS (GAMES). With considerable variation not the initial presentation or disability. Concordant
depending on the stochastic nature of linkage dis- parent-child pairs show no distortion in the random
equilibrium on individual chromosomes in European distribution of male-female pairings and neither sex nor
populations, it may only have covered 10% of the line of inheritance influence disability, age at onset or
genome in detail and another 20% in part, leaving course. In this situation, disability is highest in the male
much yet to be explored. Perhaps its main value lies in offspring of affected fathers, who more commonly
the exclusion of many microsatellite markers lying in follow a primary progressive course.
blocks of linkage disequilibrium of varying size rather The risk of ▶autoimmunity is increased in the relatives
than in the provisional positive associations. New of probands with multiple sclerosis. Three surveys,
screens based on single nucleotide markers present on together involving around 4000 relatives of 1000
individual chips, and at a much higher density are now probands, have shown recurrence of multiple sclerosis
in progress. in 15% with another autoimmune disease (Graves’
disease, rheumatoid arthritis and diabetes) in about 5%
Future Strategies for Identifying Susceptibility Genes of pedigrees. Several other disorders have been
Once regions of interest are mapped, the next aim is to considered more frequent than expected in patients
move from whole genome screening to the identification with multiple sclerosis. None of these is entirely
of functional polymorphisms which condition one secure but there may be co-morbidity between
component or another of the disease process and neurofibromatosis 1 and primary progressive multiple
determine variations in the clinical course and features. sclerosis.
How to reach that position is less clear and several A minority of patients who meet clinical criteria for the
parallel strategies have been suggested. One is to add diagnosis of multiple sclerosis and in whom there are
incrementally to the number of available families associated magnetic resonance imaging abnormalities
until thresholds for linkage are reached for the and cerebrospinal fluid oligoclonal bands have an
identification of secure loci using statistical criteria illness in which there is disproportionate involvement
for genome wide significances. An alternative is to of the anterior visual pathway. These are commonly
accept that the combination of linkage and association women with male relatives already known to be
now available is sufficient to concentrate the search for affected by ▶Leber’s hereditary optic neuropathy and
positional candidates within regions of interest. Each they have pathological mutations of mitochondrial

DNA. The clinical features of demyelinating disease References


seen in Orientals and Africans are distinct and provide 1. Ebers GC, Bulman DE, Sadovnick AD et al (1986) A
another example of clinical heterogeneity. In Japan, population based study of multiple sclerosis in twins.
multiple sclerosis shows either a Western phenotype, in N Engl J Med 315:1638–1642
which a number of sites are involved, or an optico- 2. Gabriel SB, Schaffner SF, Nguyen H et al (2002) The
structure of haplotype blocks in the human genome.
spinal pattern in which the clinical picture is dominated Science 296:2225–2229
by involvement of visual and spinal cord pathways 3. Harding AE, Sweeney MG, Brockington M et al (1992)
with a specifically different genetic background Occurrence of a multiple sclerosis-like illness in women
(HLA-DPB1*0501 rather than DRB1*1501 seen with who have a Leber’s hereditary optic neuropathy mitochondrial
the Western phenotype). However, recent reports of DNA mutation. Brain 115:979–989
multiple sclerosis in Japanese highlight the pre- 4. Risch NJ (2000) Searching for genetic determinants in the
viously under-reported extent of the so-called Western new millennium. Nature 405:847–856
5. Sawcer S, Maranian M, Setakis E et al (2002) A whole
phenotype. Demyelinating disease is considered ex- genome screen for linkage disequilibrium in multiple
tremely rare in Africans, but a number of cases are sclerosis confirms disease associations with regions
described and the phenotype is typically a severe illness previously linked to susceptibility. Brain 125:1337–1347
dominated by one or more episodes usually affecting 6. The Transatlantic Multiple Sclerosis Genetics Cooperative
the anterior visual pathway and spinal cord, again (2001) A meta-analysis of genome screens in multiple
combining the anatomical features of ▶Devic’s disease sclerosis. Mult Scler 7:3–11
with the clinical course of moderately severe relapsing
remitting multiple sclerosis. ▶Phenocopies may con-
fuse the analysis of complex traits where diagnosis
depends on pattern recognition of symptoms, signs and
laboratory investigations in the absence of a test for the
disease. Reassuringly, one large cohort screened for
other diseases was shown not to be contaminated by Genetic Redundancy
cases of ▶CADASIL, ▶spinocerebellar degeneration,
or ▶adrenoleukodystrophy.
Definition
Conclusions Genetic redundancy refers to the presence of genes in
Six main categories of susceptibility genes can be multiple forms in the DNA of eukaryotes. Two or more
predicted: genes which determine susceptibility to the genes are capable of executing the same tasks, thus
process of inflammation across a range of disorders – eliminating one gene’s function. It does not alter the
the autoimmune genes; those which determine the development of function.
specificity of that process for the development of ▶Drosophila Model of Cardiac Disease
multiple sclerosis – the ubiquitous genes; those which ▶Muscle Development
are relevant for the pathogenesis in isolated populations
– the domestic genes; those which determine particular
phenotypes – the pleiotropic genes; those which
determine variations in the clinical course – the
modifying genes and those which cluster to provide
specifically different (heterogeneous) contributions to
the pathogenesis – the epistatic genes. A major part Genetic Screen
of future studies will be to resolve the question of
disease heterogeneity in multiple sclerosis. When
eventually in place, the potential of this genetic Definition
knowledge for improved understanding of the patho- Genetic screening is analysing a group of individuals to
genesis of multiple sclerosis and designing novel identify those who are at high risk of having or passing-
treatments is considerable. Resolving the issues of on a specific genetic disorder. In experimental genetics,
complexity and heterogeneity in multiple sclerosis and the term describes an approach to identifying genes and
other complex traits has practical dividends. Without gene functions by random gene knock out and
knowledge linking aetiology to pathogenesis and identifying genes causing a specific phenotype; or vice
phenotype, putative new treatments will continue to versa, identifying phenotypes caused by specific gene
be screened in cohorts who may or may not have an knock out.
appropriate pathological substrate for that particular ▶Genetic Screening in Populations
intervention. ▶Mutagenesis Approaches in the Zebrafish

Genetic Screening in Populations

VILMUNDUR GUDNASON
Icelandic Heart Association Heart Preventive Clinic and Research Institute, Kopavogur, Iceland
v.gudnason@hjarta.is

Definition
“Genetic screening may be defined as any kind of test performed for the systematic early detection or exclusion of a genetic disease, the genetic predisposition or resistance to a disease or to determine whether a person carries a gene variant which may produce disease in offspring. Screening may be concerned with the general population or with specific sub-populations defined on some basis other than their health” (from the European Society of Human Genetics, ▶http://www.eshg.org/).

Characteristics
Population screening has been applied in the newly born for a long time for medical conditions known to be preventable by medical intervention, such as phenylketonuria. In more recent times considerably more knowledge on underlying genetic defects for various diseases has become available. What conditions to screen for, medically, biochemically or genetically has been a matter of discussion for decades. Most common diseases have complex multifactorial and probably ▶polygenic underlying causes. For that reason, genetic screening for complex diseases in populations is not really feasible for the time being. However, a number of genetic variations have been associated with the risk of developing disease and may be applied as specific tests in the evaluation of risk in groups of patients identified by a given disease. Despite that, only ▶monogenic diseases or disorders fit the criteria set for appropriate population screening including genetic screening.
Wilson and Jungner (1) put forth ten criteria, which should be considered before population wide screening should be set up. These criteria are that:
1. the condition being screened for should be an important health problem.
2. the natural history should be well understood
3. there should be a detectable early stage
4. treatment at an early stage should be of more benefit than at a later stage
5. there should be a suitable test for identifying people at the early stage
6. the test should be acceptable
7. intervals for repeating the test should be determined
8. there should be adequate health service provision for the extra clinical workload resulting from the screening
9. the risk of screening, both physical and psychological should be less than the benefits
10. the costs should be balanced against the benefits

There are not many diseases or disorders that fulfil these criteria completely and far from all monogenic diseases do so. ▶Familial hypercholesterolemia (FH) is one condition, which does fit exactly. FH is a monogenic disorder that is due to a defect in the low density lipoprotein (LDL) receptor gene with a lifelong elevation of blood cholesterol (2). It has a prevalence of 1 in 500 in most populations and a high risk of premature coronary artery disease and can be described as an important health problem. The natural history of FH is well understood and the disorder can be detected early in life. Treatment is available and there is strong evidence that early treatment is beneficial. The diagnosis of FH, whether performed by clinical or genetic testing does not require more complex intervention than venipuncture. Once the diagnosis is made, there is no need for further diagnosis. Considerable experience in screening for FH has accumulated in many populations and FH can be looked at as a paradigm for screening for other monogenic disorders or diseases.

Cellular and Molecular Regulation
Familial hypercholesterolemia is caused by a mutation in the ▶LDL receptor gene. The ▶LDL receptor takes up cholesterol from the bloodstream by binding LDL cholesterol particles, mainly in the liver and thus reduces blood levels of cholesterol. Individuals who lack LDL receptor or have a reduced number of functional receptors demonstrate considerably increased levels of blood cholesterol. FH is an autosomal dominant disorder with a gene dosage effect. This means that heterozygotes for a mutation in the LDL receptor gene frequently have double the amount of blood cholesterol compared to individuals from the general population and homozygotes or ▶compound heterozygotes have several fold increase in blood cholesterol.
A variety of different mutations in the LDL receptor gene have been found to cause FH. These can be accessed on the FH web site www.ucl.ac.uk/fh/. The sheer number of mutations makes the identification of FH complicated. Most mutations are confined to a single or a few families, but others are more widespread. However, there are a number of mutations that have been found to be population specific allowing for population based screening for a given set of mutations.
There are two main approaches for identifying previously unidentified individuals with FH; first, examination of first-degree relatives for a given index case and secondly, genealogical tracing in defined

populations to a common ancestor for a known mutation in the LDL receptor gene.

First Degree Relative Screening
The conventional approach to identify new FH patients is to ask FH index cases for informed consent to contact their first-degree relatives. This is the method recommended by the Med Ped project (Make Early Diagnosis – Prevent Early Deaths in Medical Pedigrees) (▶http://www.medped.org/). The main advantage of such an approach is the high probability of diagnosing family members carrying the LDL-receptor mutation when contacting close relatives. The probability is 50% for first-degree relatives of the index case (parent, sibling and child) and then declines by about half for each generation. In the Netherlands, experience from FH screening using similar first degree relative approaches, revealed 2039 individuals identified as ▶heterozygous FH of 5442 relatives tested (3). The main disadvantages are that the doctor needs to rely on consent from the patient to contact his/her relatives and that pedigrees based on information from the patients are seldom complete for more than one or two generations back. Therefore the search for new patients can reach a blind end.

Genealogy Tracing for Identification of Affected Individuals
The genealogy tracing approach is applicable where there is a common mutation in a specific population in an isolated area where genealogical information is available either from church records or other similar records or genealogical information is electronically available such as in the example from Iceland described below (4).
A common mutation in the LDL receptor gene (I4T+2C) has been identified and found to be responsible for up to 60% of FH in Iceland. For the genetic screening only probands with this mutation were included. These probands were genealogically traced to common ancestors by The Icelandic Genetic Council’s family tracing office. The family tracing was performed through a partly computerized database derived from censuses (first carried out in Iceland 1703), church records and birth and marriage certificates. Once a common ancestor had been identified, a list of all descendants was produced. The oldest individual alive in each family lineage was identified as key individual and contacted for cholesterol measurements and for genetic testing (Fig. 1) after obtaining informed consent. If positive for the common mutation, his or her offspring were recruited for testing. Relatives of key individuals negative for the common mutation were not recruited.
Fourteen probands positive for the common mutation were genealogically traced to four family clusters, one cluster with four probands, one with three probands and two with two probands each. The ancestors for the clusters were born in the late 18th century and early 19th century and were traced back for 3 and 4 generations. Three of the probands could not be linked to any other proband. The tracing revealed 2201 live individuals in the four family clusters and of these, 364 (17%) key individuals were identified (Fig. 1). Three

Genetic Screening in Populations. Figure 1 The pedigree shows how two individuals with FH (probands) are
traced to a common ancestor (upwards tracing arrows). The oldest individual alive in each family lineage was
identified as key individual and contacted for genetic testing (downward tracing arrows). Offspring of the key
individuals positive for the mutation were recruited for testing. If positive, their offspring were also called in and so on.
Relatives of key individuals negative for the mutation were not recruited. + means a deceased individual and black
filling an affected individual.

hundred and six key individuals (84%) responded. References


Thirty five (11%) of the 306 key individuals who 1. Wilson J, Jungner YG (1968) Principles and practice of
responded were positive for the common mutation or mass screening for disease. Public Health Papers 34,
nearly one in every 9 key individuals tested. This yield Report 65 (iv) WHO, Geneva
is a fifty-six-fold enrichment from the 1 in 500 yield of 2. Goldstein JL, Hobbs HH, Brown MS (1995) Familial
Hypercholesterolemia. In: Scriver CR, Beaudet AL, Sly
screening the general population. No homozygotes WS (eds) et al The Metabolic and Molecular Bases of
were detected. Of the 35 positive key individuals, seven Inherited Diseases, McGraw-Hill, New York, pp 1981–2030
had not been diagnosed before. 3. Eckenhausen MA, Defesche JC, Sijbrands EJ et al (2001)
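The percentages quoted for the Icelandic family screen, and the halving of carrier probability with each step of relationship described under first-degree relative screening, can be checked with a short calculation. The sketch below is illustrative only: the counts are taken from the text, while the function and variable names are assumptions introduced here. With unrounded counts the enrichment over the 1 in 500 population prevalence comes out at about 57-fold; the fifty-six-fold figure in the text is consistent with starting from the rounded "one in every 9" yield.

```python
def carrier_probability(degree: int) -> float:
    """Approximate prior probability that a relative of a heterozygous
    index case carries the mutation: 50% for first-degree relatives,
    halving with each further degree of relationship (hypothetical helper)."""
    if degree < 1:
        raise ValueError("degree of relationship must be 1 or greater")
    return 0.5 ** degree


# Counts reported in the text for the Icelandic genealogy-based screen.
live_descendants = 2201      # live individuals in the four family clusters
key_individuals = 364        # oldest living member of each lineage
responders = 306             # key individuals who responded
positives = 35               # responders carrying the common mutation
population_prevalence = 1 / 500

# Counts reported for the Dutch first-degree relative programme.
dutch_tested = 5442
dutch_positive = 2039

key_fraction = key_individuals / live_descendants
response_rate = responders / key_individuals
yield_rate = positives / responders
enrichment = yield_rate / population_prevalence
dutch_yield = dutch_positive / dutch_tested

if __name__ == "__main__":
    print(f"key individuals among descendants: {key_fraction:.1%}")
    print(f"response rate:                     {response_rate:.1%}")
    print(f"yield among responders:            {yield_rate:.1%}")
    print(f"enrichment over 1 in 500:          {enrichment:.0f}-fold")
    print(f"Dutch first-degree yield:          {dutch_yield:.1%}")
    for degree in (1, 2, 3):
        print(f"degree {degree} relative: {carrier_probability(degree):.1%}")
```

The Dutch yield of roughly 37% sits below the 50% expected for first-degree relatives alone, which would be consistent with the programme also testing more distant relatives, although the text does not state this.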
This demonstrates that screening extended families is a Review of first 5 years of screening for familial
feasible approach for achieving the goal of finding hypercholesterolaemia in the Netherlands. Lancet
individuals with FH previously unknown to have this 357:165–168
treatable condition. This approach may well be 4. Thorsson B, Sigurdsson G, Gudnason V (2003) Systema-
tic Family Screening for Familial Hypercholesterolemia in
practical in other populations where genealogical Iceland. Arterioscler Thromb Vasc Biol 23:335–338
information is available. 5. Report of a WHO Consultation (1997) Familial Hyperch-
olesterolaemia (FH). WHO Human Genetics Programme,
Clinical Relevance Geneva
Familial hypercholesterolemia is a condition with 6. Kane JP, Malloy MJ, Ports TA et al (1990) Regression of
considerably elevated risk of developing coronary coronary atherosclerosis during treatment of familial
hypercholesterolemia with combined drug regimens.
artery disease, which may eventually lead to a heart
JAMA 264:3007–3012
attack. A report from The World Health Organization 7. Scientific Steering Committee on behalf of the Simon
(WHO) shows the mean age of onset for coronary heart Broome Register Group (1999) Mortality in treated
disease in untreated individuals to be 45–48 years in heterozygous familial hypercholesterolaemia: implications
males and 55–58 in females (5). This enormously raised heterozygous familial hypercholesterolaemia: implications
risk of a heart attack that is potentially preventable calls
for an active search for and identification of affected
individuals at an early age and an aggressive treatment
of all known risk factors for coronary heart disease. The
WHO conference in Paris 1997 (5) urged an early
diagnosis and treatment of individuals with FH. The Genetic X-Linked Disease
main challenge is to prevent premature atherosclerosis
in individuals with FH.
In recent years a new class of drugs for the treatment of Definition
hypercholesterolemia has become available. These are A genetic X-linked disease is determined by mutation of
HMG CoA reductase inhibitors called statins, which a gene located on the X chromosome.
directly affect intracellular production of cholesterol ▶Microvilli
and hence lead to an increase in the number of LDL
receptor molecules on the cell surface of the liver. This
in turn leads to enhanced uptake of cholesterol rich
particles from the circulation with a corresponding
reduction in the level of blood cholesterol. It has been Genetically Engineered Animals
demonstrated that cholesterol-lowering drugs are
effective in reducing coronary stenosis assessed by
coronary angiography in patients with FH (6) and there ▶Transgenic and Knock-out Animals
is evidence for improved survival of patients with FH
in recent years especially after the introduction of statin
therapy (7).
The prognosis for primary prevention of coronary artery
disease in heterozygous FH patients is excellent and for
that reason it is a major important challenge to identify Genome
undiagnosed or inadequately treated individuals.
The above example of genetic screening fulfils all the
criteria for screening in populations set by Wilson and Definition
Jungner in 1968 (1). There are a number of monogenic The term genome refers to the complete set of genetic
diseases that may benefit from the experience obtained information contained in an organism or a cell, which
from the effort of identifying new FH patients by includes both the chromosomes within the nucleus and
systematic search. in mitochondria.

▶Biochemical Engineering of Glycoproteins


▶Chromosome 21 Disorders Genome Scan
▶COPD and Asthma Genetics
▶Functional Genomics, the Systematic Analysis of
Gene Function of all Genes and Gene Products in Definition
Parallel Genome scan refers to a genetic research method in
▶Protein Microarrays as Tool for Protein Profiling and which the entire DNA of an organism is searched
Identification of Novel Protein-Protein Interactions systematically for locations on the chromosomes that
are inherited in the same pattern as a specific trait. This
method is usually applied to collections of families that
show multifactored inheritance of specific traits, such as
type 1 diabetes.
▶Diabetes Mellitus, Genetics
▶Manic Depression
Genome Analysis in Plants
G
▶Plant Genomics
Genome Screen

Definition
Genome screen describes the testing of a population
Genome Engineering group to identify a subset of individuals at high risk for
having or transmitting a specific genetic disorder, by
using several hundred markers selected from the whole
genome to identify chromosomal regions that are co-
▶Cre/loxP Strategies
inherited (linked) with a specific disease.
▶Atopy Genetics
▶COPD and Asthma Genetics
▶Genetic Screening in Populations

Genome Functionalization by Arrayed


cDNA Transduction Genome Walking

▶Automated High Throughput Functional Character-


ization of Human Proteins
Definition
Genome walking is a local physical mapping technique
for obtaining unknown DNA regions on either side of
chromosomal regions of known nucleotide sequences.
▶YAC and PAC Maps

Genome Instability
Genome-Wide Analysis
Definition
Genome instability describes processes in cells that
accumulate mutations with high frequency. These Definition
mutations include point mutations, insertions, deletions A genome-wide analysis is the systematic investigation
and translocations. of all regions of the genome to determine those
▶Chromosomal Instability Syndromes polymorphisms more often associated with a disease.
▶DNA-Repair Mechanisms ▶Common Diseases, Genetics

Genomic Analysis of Single Disseminated Cancer Cells

CHRISTOPH A. KLEIN
Institute of Immunology, Ludwig Maximilian University, Munich, Germany
christoph.klein@med.uni-muenchen.de

Definition
The term “disseminated cancer cells” refers to cells that originate from a malignant primary tumour and are found at ectopic sites as single cells or small cell clusters. More than 80% of human malignant tumours stem from epithelial tissues (such as mammary glands, lung or gastro-intestinal organs). The major cause of death for patients that suffer from these types of cancer (carcinomas) is metastasis, i.e. emergence of tumour colonies that are found in organs distant from the primary site of tumour growth. Today many cancer patients are diagnosed before metastatic disease is detected by clinical imaging techniques (such as X-ray or CT scans) and will be submitted to surgery of their primary tumour. A substantial percentage of these patients, however, will eventually succumb to metastasis months, years or even decades after initial diagnosis of their primary cancer, indicating that tumour cells left the primary tumour before the surgeon removed it, and that some of these cells have the potential to found a metastasis. Therefore, great effort was undertaken to detect the precursor cells of later arising metastasis at the time of surgery in order to develop means for its prevention. For cancers of epithelial origin, detection of disseminated cancer cells is often based on histogenetic markers that enable discrimination from the surrounding cells, i.e. identification of epithelial cells in purely mesenchymal organs. Clinically accessible mesenchymal organs are blood, bone marrow or lymph nodes. Thus the applied markers (e.g. epithelial cytokeratins) do not detect tumour cells directly but epithelial cells that are usually not found in mesenchymal organs of donors without epithelial malignancy. Clinical studies revealed that the finding of cytokeratin-positive cells (with most studies performed on bone marrow or lymph node) at the time of surgery puts carcinoma patients at high risk for the development of metastasis later on. Depending on tumour type and clinical stage, approximately every third carcinoma patient without manifest metastases harbours disseminated cancer cells in bone marrow or macroscopically tumour-free lymph nodes. However, absolute and relative tumour cell frequencies are extremely low, being about one or two cancer cells per two million bone marrow cells. Thus, disseminated cancer cells belong to the rarest cells in the human body and their molecular-genomic characterization requires special techniques.

Characteristics
(Whole) genome analysis of single disseminated cancer cells
The first studies that characterized cytokeratin-positive cells intended to confirm their malignant nature. It was shown by interphase fluorescence in-situ hybridisation (▶FISH) that some cytokeratin-positive cells harbour chromosomal abnormalities. Then, protein expression of a variety of cancer-associated molecules was tested on cytokeratin-positive cells. Although sporadic insights into the biology of disseminated cancer cells could be obtained, double staining (or labelling in the case of FISH) was very cumbersome because of the extreme rarity of the investigated cells.
A single diploid cell contains about 6 pg genomic DNA. Multiple molecular-genetic analyses of such minute amounts therefore require amplification. To this end several methods have been developed, the most frequently used being ▶DOP-PCR (degenerate oligonucleotide-primed ▶PCR) and the PEP (primer extension preamplification) method or derivatives thereof. These methods use mixtures of degenerate or random primers to amplify the whole genome, which leads to the problem that it is impossible to control for equal binding of the primers to complex sequences for unbiased amplification. However, unbiased amplification is mandatory for the application of whole genome screening techniques such as comparative genomic hybridization, which measures numerical aberrations in tumour cell genomes. To circumvent this problem an adaptor-linker approach was developed. Here, the genome is cut into small fragments (about 100–2000 bp) by a frequently cutting restriction enzyme. Then, adaptors are ligated to both ends of the fragments and the fragmented genome is subsequently amplified using a single primer that binds to the adaptor and thereby amplifies all fragments alike.
The approach is increasingly applied to disseminated cancer cells of various types of tumours and the results have changed the prevailing view on metastatic progression. Of the many interesting findings perhaps the most important are that dissemination often occurs earlier than previously thought and that the cancer cells often display fewer or different genomic aberrations than their matched primary tumours. Thus metastases and primary tumours seem to develop independently to a large degree.
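
The adaptor-linker approach described above can be illustrated with a small in-silico sketch. It is not the published protocol: the text names neither the enzyme nor the adaptor, so the recognition site `TTAA`, the cut position and the `ADAPTOR` sequence below are assumptions chosen only for illustration. The sketch cuts a sequence at every recognition site, attaches the same adaptor to both ends of each fragment, and checks that one primer sequence then has a binding site on both strands of every product.

```python
import re

# Illustrative assumptions (not taken from the text):
SITE = "TTAA"                       # example of a frequent four-base site
CUT_OFFSET = 1                      # cut after the first base: T^TAA
ADAPTOR = "AGTGGGATTCCTGCTGTCAGT"   # hypothetical adaptor/primer sequence

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def digest(genome, site=SITE, offset=CUT_OFFSET):
    """Cut a linear sequence at every occurrence of the recognition site."""
    pattern = "(?=" + re.escape(site) + ")"   # lookahead finds every site
    cuts = [m.start() + offset for m in re.finditer(pattern, genome)]
    bounds = [0] + cuts + [len(genome)]
    return [genome[a:b] for a, b in zip(bounds, bounds[1:]) if b > a]


def ligate(fragment, adaptor=ADAPTOR):
    """Attach the adaptor to both ends (top strand shown 5' to 3')."""
    return adaptor + fragment + revcomp(adaptor)


def single_primer_amplifiable(product, adaptor=ADAPTOR):
    """True if both strands begin with the adaptor sequence at their 5' end,
    so that a single primer can prime synthesis on either strand."""
    return product.startswith(adaptor) and revcomp(product).startswith(adaptor)


if __name__ == "__main__":
    fragments = digest("ACGTTAAGGCCTTAACGT")
    products = [ligate(f) for f in fragments]
    print(fragments)
    print(all(single_primer_amplifiable(p) for p in products))
```

Because every fragment carries the identical adaptor, primer binding no longer depends on the fragment's own sequence, which is the point the text makes about amplifying all fragments alike.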

Gene expression analysis of single disseminated cancer cells
With the completion of the human genome project and the introduction of technologies such as DNA microarrays and laser microdissection, many fields in biology and medicine await the application of comprehensive gene expression analyses of specific cell types isolated from defined tissues. For the amplification of single cell mRNA the first protocols were introduced in the late eighties and early nineties of the last century. So far the protocols are based on either of two principal approaches, linear amplification by T7 RNA polymerase or PCR-based amplification.
As a general rule, PCR-based methods are easier to handle and less time consuming, while there are concerns about the quantitative reliability of measurements obtained after exponential amplification. The linear amplification achieved by T7 RNA polymerase, also referred to as the Eberwine protocol, has the advantage that a potentially occurring failure to amplify a given transcript will not be exponentially transmitted. Here, mRNA is reverse transcribed using a primer containing the promoter of the T7 RNA polymerase. After ▶cDNA synthesis, in-vitro transcription is performed and the procedure is repeated once or twice. The few cycles are thought to change the original template ratios only marginally, if at all. On the other hand, several groups have observed that the relative abundance of transcripts is also preserved by PCR-based methods, provided that the correct conditions are applied.
Our preferred method belongs to the approaches using PCR. The protocol uses a single primer that binds to two binding sites artificially introduced to all mRNA sequences. First, a poly-C flanking region is incorporated during cDNA synthesis and after reverse transcription a poly-G tail is added. Four aspects seem to be particularly important. Firstly, single cell mRNA is bound to a solid phase, enabling the change of buffers and thereby optimal conditions for each enzymatic reaction. Secondly, random primers for cDNA synthesis reduce the length of the primary transcript and allow for subsequent amplification within the optimal range for PCR. Thirdly, a poly-G tail provides a much better primer binding site than a poly-A or poly-T tail. Fourthly, introducing a poly-C flank on one side of the template and a poly-G tail on the other makes all sequences equally G/C-rich at their primer-binding site. Adequate conditions for a single poly-C PCR primer, i.e. high annealing temperature and the addition of denaturing agents such as formamide, enable highly specific and unbiased amplification of such sequences.
With this amplification method in hand, gene expression profiling of single cells has become possible and first interesting results have been obtained. For example, we found that the minor histocompatibility antigen HA-1 is aberrantly expressed on single disseminated cancer cells. This finding makes it reasonable to apply allogeneic bone marrow transplantation as immunotherapy in a HA-1 mismatch situation. We recently adopted the protocol for high-density oligonucleotide microarrays. Thus, screening of all expressed human genes may reveal new target structures on single disseminated cancer cells, the precursor cells of lethal metastasis.

Clinical Relevance
Clinically manifest metastatic disease can rarely be cured. One likely reason is that the cancer cells have genomically and phenotypically progressed so far that they are highly resistant against current ways to induce apoptosis by any type of treatment, another that the tumour burden is just too large for complete tumour cell eradication at tolerable drug doses. Therefore, systemic therapies are added to loco-regional treatment (e.g. surgery or irradiation therapy) before metastasis becomes manifest. Such therapies target the relatively few tumour cells that spread throughout the body, have therefore been called “adjuvant” and are currently in the centre of clinical efforts. The underlying rationale is to destroy the tumour seed early, when the tumour load is low and the cells are still vulnerable. However, adjuvant chemotherapies, which are currently the best characterised and most effective, have not fulfilled the hopes so far. Although several therapy regimens significantly improve the overall and the disease-free survival of the patients, the absolute benefit is rather low, being in the range of a few percent and improving survival time for the individual patient by a few months. It is increasingly recognized that one reason for the failure of adjuvant therapies is the almost complete lack of knowledge about the target cells, the disseminated cancer cells. Disseminated cancer cells are genomically and phenotypically often very different from their matched primary tumours. Therefore, therapies that are based on mechanisms active in primary tumours do not necessarily exert an effect on disseminated cancer cells. Rather, direct analysis of single disseminated cancer cells promises to uncover novel molecular targets for effective adjuvant therapies.

References
1. Braun S et al (2000) Cytokeratin-positive cells in the bone marrow and survival of patients with stage I, II, or III breast cancer. N Engl J Med 342:525–533
2. Cole BF, Gelber RD, Gelber S et al (2001) Polychemotherapy for early breast cancer: an overview of the randomised clinical trials with quality-adjusted survival analysis. Lancet 358:277–286
3. Iscove NN et al (2002) Representation is faithfully preserved in global cDNA amplified exponentially from sub-picogram quantities of mRNA. Nat Biotechnol 20:940–943
4. Klein CA et al (1999) Comparative genomic hybridization, loss of heterozygosity, and DNA sequence analysis of single cells. Proc Natl Acad Sci USA 96:4494–4499
5. Klein CA et al (2002) Combined transcriptome and genome analysis of single micrometastatic cells. Nat Biotechnol 20:387–392
6. Klein CA (2003) The systemic progression of human cancer: A focus on the individual disseminated cancer cell – the unit of selection. Adv Cancer Res 89:35–67
7. Telenius H et al (1992) Degenerate oligonucleotide-primed PCR: general amplification of target DNA by a single degenerate primer. Genomics 13:718–725
8. Zhang L et al (1992) Whole genome amplification from a single cell: implications for genetic analysis. Proc Natl Acad Sci USA 89:5847–5851

Genomic Clone

Definition
Genomic clone denotes a fragment of cloned DNA originating from the genome of the organism of interest rather than from a reverse-transcript of an RNA.
▶YAC and PAC Maps

Genomic Control

Definition
Genomic control describes a method to control for population stratification in association studies. The degree of genotype-phenotype association for a large number of neutral polymorphisms is measured and used to correct associations with candidate causal polymorphisms.
▶COPD and Asthma Genetics

Genomic Imprinting

TAKUYA IMAMURA, ANDRAS PALDI
Ecole Pratique des Hautes Etudes, Evry, France
paldi@genethon.fr

Definition
Genomic imprinting, also called parental or gametic imprinting, is a process of epigenetic marking that allows the cells to differentiate the alleles of paternal and maternal origin of a gene without changing their DNA sequence. The phenomenon of imprinting was explicitly recognized at the beginning of the 1980s on the basis of two types of observations. First, pronuclear transplantation studies on mouse zygotes demonstrated that monoparental conceptuses were not viable, suggesting that biparental contribution is necessary for mammalian development (1, 2). Second, systematic genetic studies of mice with chromosomal translocation showed that some chromosomal regions must be inherited from both parents for normal development (3). These pioneer studies led to the construction of a low-resolution chromosomal imprinting map of the mouse genome. It was postulated that the requirement for both parental genomes to be present in the same zygote was a consequence of differential epigenetic marks on a fraction of the paternal and maternal alleles. The parental origin-specific imprints on the two alleles of the same gene lead to their differential expression. Typically, one parental allele is silenced and only the other remains functional. It is precisely this property that has been used to identify imprinted genes. A gene is considered imprinted if it is expressed monoallelically and its allelic expression depends on its parental origin. However, the real situation is more complicated. The analyses of the known imprinted genes have demonstrated that most of them are expressed biallelically in some tissues, at least at some developmental stages. As a consequence, we face a paradoxical situation. It is quite difficult to prove that a gene is not imprinted, unless its allelic expression is examined in all tissues at all stages of life. The ambiguity of the definition might be the reason for the difficulty in estimating the number of imprinted genes in the genome. On the basis of the relatively low frequency of mutations with parental origin-dependent phenotype, the number of imprinted genes was initially estimated as not more than 300. However, a systematic transcriptome analysis suggested that there might be more than 2,000 genes with differential parental origin-dependent expression in the mouse.

Characteristics
Characteristics of Imprinted Genes
In order to understand the phenomenon of imprinting, current research is focused on the following major questions: 1) How are the parental alleles of a gene marked without changing the DNA sequence? The imprint has to be sufficiently stable to be inherited through mitosis, but reversible during meiosis. 2) When do the genes acquire their parental-specific imprints? 3) What is the biological significance of this phenomenon?
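
The operational definition given in this entry (monoallelic expression whose direction depends on parental origin) can be made concrete with a short sketch. It assumes a reciprocal-cross design, which the entry does not describe: allele-specific read counts from offspring of strain A mothers and strain B fathers are compared with counts from the reverse cross. The function names and the 0.9 threshold are hypothetical choices made for illustration, not an established standard.

```python
def maternal_fraction(maternal_reads, paternal_reads):
    """Share of allele-informative reads that come from the maternal allele."""
    total = maternal_reads + paternal_reads
    if total == 0:
        raise ValueError("no allele-informative reads")
    return maternal_reads / total


def classify(cross_ab, cross_ba, threshold=0.9):
    """Classify one gene in one tissue from two reciprocal crosses.

    cross_ab and cross_ba are (maternal_reads, paternal_reads) tuples.
    The threshold is an illustrative assumption.
    """
    f_ab = maternal_fraction(*cross_ab)
    f_ba = maternal_fraction(*cross_ba)
    low = 1.0 - threshold

    if f_ab >= threshold and f_ba >= threshold:
        return "maternal"          # maternal allele expressed in both crosses
    if f_ab <= low and f_ba <= low:
        return "paternal"          # paternal allele expressed in both crosses
    if (f_ab >= threshold and f_ba <= low) or (f_ab <= low and f_ba >= threshold):
        # The same strain's allele wins regardless of which parent carried
        # it: monoallelic, but not dependent on parental origin.
        return "strain-specific"
    return "biallelic"


if __name__ == "__main__":
    print(classify((97, 3), (94, 6)))
    print(classify((96, 4), (4, 96)))
```

A "biallelic" result applies only to the tissue and stage that were sampled. As the entry notes, most known imprinted genes are biallelically expressed somewhere, so such a result cannot by itself show that a gene is not imprinted.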

The first two imprinted genes, Igf2r and Igf2, were identified in 1991 on the basis of the parental origin-dependent phenotypes of heterozygous mutants. More than 70 imprinted genes have been identified in the mouse and human genome so far. (We quote here two comprehensive databases of imprinted genes that can be found on the web: ▶http://www.mgu.har.mrc.ac.uk/research/imprinting/imprinting2.html and ▶http://cancer.otago.ac.nz/IGC/Web/home.html). Detailed analyses of several of these genes made it possible to determine general characteristics of imprinted genes in addition to their monoallelic expression:
1. Imprinted genes are frequently associated with regulatory regions that carry differential ▶DNA methylation on the two parental alleles. Several imprinted genes were identified on the basis of systematic search for ▶differentially methylated regions (DMRs) in the genome. In general, DNA methylation upstream of genes, especially in the promoter region, is associated with attenuation of the expression.
2. It is impossible to classify the imprinted genes on the basis of the products they encode. Peptide hormones, growth factors, transcription factors, metabolic enzymes, cell surface receptors and many other proteins, but also several non-coding RNAs can be found among the products of imprinted genes.
3. Another characteristic feature of imprinted genes discovered so far is their non-random distribution in the genome. They are frequently clustered in well-defined genomic regions of up to several hundred kilobases. These clusters usually contain imprinted genes with either maternal or paternal monoallelic expression, but also genes that have been found to be biallelically expressed in all tissues analyzed so far.
4. Interestingly, many antisense transcripts are also detected in the imprinted genomic regions.
5. An important feature of the clusters is their ▶asynchronous replication, which, as far as we know, is common to all of them, suggesting that this phenomenon is one of the useful criteria that help determine imprinted genes (4). The paternal and maternal copies of the whole clusters replicate differentially during the mitotic cell cycle, including imprinted genes, intergenic sequences and the genes that are expressed biallelically. This characteristic is independent of the expression state of the genes in the cluster and is detected in all cell types. Together, these observations suggest that imprinting is regulated at the level of the cluster, in the context of chromatin structure as described below.
6. Imprinted chromosomal regions display strikingly different recombination frequencies during male and female meiosis. However, many non-imprinted regions also show recombination with different frequencies during male and female meiosis.

Molecular Mechanisms
As indicated above, DNA methylation is a part of the mechanisms that differentiate the parental alleles of imprinted genes. Methylation of cytosines in CG dinucleotides (CpG methylation) is a well-known ▶epigenetic modification of the DNA that regulates chromatin structures such as ▶heterochromatin in concert with covalent ▶histone modifications. The DMRs are usually relatively short, CG-rich DNA segments located at a distance from genes, but sometimes located in the promoter region or in the coding or intronic sequences. The best characterized DMRs in the human and mouse genome include those located in the regions of the Igf2r, H19, Igf2, Snrpn, U2af1-rs1, Gnas and Gtl2 genes. The functional importance of these elements for the establishment and maintenance of the methylation imprint has been demonstrated by extensive targeted mutagenesis studies. Their deletions frequently perturb the function of the whole imprinted domain. These elements are usually called imprinting centers for their central role in the imprinting of a whole region. They most probably act as structural organizers affecting gene expression over the whole imprinting cluster. This action is presumably mediated by recruiting various proteins, for instance the CTCF protein, that play a role in making up highly ordered chromatin structure.
Studies of chromatin structure around imprinted genes revealed differences in nucleosome positioning, histone acetylation or nuclease sensitivity between the parental alleles of imprinted genes. In general, methylated sequences are associated with hypoacetylated histones, whereas the unmethylated sequences are associated with hyperacetylated histones. The differences observed between the two alleles of an imprinted gene in the same tissue are similar to those typically observed between active and inactive copies of the same gene in different tissues. Naturally occurring or experimentally induced mutations that alter the enzymatic mechanisms responsible for epigenetic modifications such as DNA methylation and histone modifications frequently disturb the normal imprinting process and modify the allelic expression of imprinted genes. In addition to the epigenetic modifications, the observation of several ▶antisense RNA transcriptions in imprinted regions suggests that non-coding RNAs might also be involved in the maintenance of the characteristic chromatin structure and monoallelic expression of imprinted genes. A role of non-coding RNAs in this process has been suggested by analogy with the function of the Xist RNA in the inactivation of one of the X-chromosomes in females, although the role remains unknown.

Genomic Imprinting. Figure 1 The schematic representation of the cycle of acquisition/erasure of genomic imprinting in the germ line. Note that, in the germ cell lineage, the epigenetic marks should be erased and then re-established according to the sex of the individual.

Establishment of the Imprint
Each individual inherits a paternal and a maternal copy of every gene in the genome. However, both alleles are transmitted to the offspring either as paternal or as maternal copies depending on the individual’s sex. Therefore, the parental imprint of a gene or a gene cluster has to be erased in the germ line of the individual and re-established according to its sex in mature gametes (Fig. 1). In order to follow this process, changes in CpG methylation pattern of DMRs were extensively studied at various imprinted loci. In general, the differences in CpG methylation pattern are erased from the parental alleles in early ▶primordial germ cell (PGC) differentiation. The methylation pattern typical for the paternal or maternal alleles is established in meiotic cells. At the moment of fertilization, many DMRs are already differentially methylated and conserve their allele-specific methylation profiles at all subsequent stages of development while the bulk of the genome undergoes important methylation changes. However, some DMRs acquire their methylation profiles gradually during the somatic cell division, suggesting that ▶CpG methylation is not the only molecular mechanism that plays a role in marking the two parental alleles.
Other characteristics of the imprinted genomic regions follow different kinetics during development. For example, asynchronous replication of the two parental copies is maintained during the proliferation of PGC, when all methylation differences are already erased, suggesting that differences between the parental copies are still there even in the absence of methylation.

Clinical Relevance
The biological significance of the functional nonequivalence of the parental genomes is not yet known. Many hypotheses were proposed to explain why imprinting has evolved in mammals. The most popular hypothesis is the so-called parental conflict model. According to this model, imprinting has evolved in mammals because of the conflicting evolutionary interests of the paternal and maternal genomes over the allocation of parental resources. This hypothesis is based on the assumption that fetal imprinted genes
regulate resource transfer from the mother to the fetus. Therefore, parents are able to modulate the use of resources by transmitting epigenetically modified versions of imprinted genes to their offspring. Since the fetuses develop within the maternal uterus, the paternal investment in the offspring is obviously much lower than the maternal investment. This asymmetry leads to an asymmetry of the imprints on the resource usage-regulating genes.
Another possible explanation is that maintaining the differential chromosomal structure of the imprinted regions could be important for the coordinated replication of the genome and the correct segregation of the chromosomes during mitosis (5). The monoallelic expression of imprinted genes might be a byproduct of this process. Indeed, some experimental observations indicate that the parental copies of imprinted regions interact with each other during the somatic cell cycle in a way that is reminiscent of some trans-sensing phenomena observed in Drosophila or plants.
Whatever the biological function of parental imprinting, perturbations of the process lead to severe hereditary disorders that develop because of mutations in the active allele of imprinted genes, the normal but silenced allele being unable to compensate for the mutated copy (6). For example, Prader-Willi syndrome patients often display hypotonia, hyperphagia, obesity, hypogonadism and developmental delay. Angelman syndrome patients frequently show ataxia, tremulousness, sleep disorders, seizures and hyperactivity. Both syndromes may also show mental retardation and map to the imprinted gene cluster in human chromosome 15q11-13. Beckwith-Wiedemann syndrome maps to 11p15 and is characterized by general overgrowth with symptoms such as hemihypertrophy, macroglossia and visceromegaly. Biallelic expression of usually monoallelically expressed imprinted genes has also been implicated in various cancers.
▶G-Proteins and G-Protein Mutations in Human Diseases
▶Microdeletion Syndromes
▶Prader Willi and Angelman Syndromes

References
1. McGrath J, Solter D (1984) Completion of mouse embryogenesis requires both the maternal and paternal genomes. Cell 37:179–183
2. Surani MA, Barton SC, Norris ML (1984) Development of reconstituted mouse eggs suggests imprinting of the genome during gametogenesis. Nature 308:548–550
3. Cattanach BM, Kirk M (1985) Differential activity of maternally and paternally derived chromosome regions in mice. Nature 315:496–498
4. Kitsberg D, Selig S, Brandeis M et al (1993) Allele-specific replication timing of imprinted gene regions. Nature 364:459–463
5. Paldi A (2003) Genomic imprinting: Could the chromatin structure be the driving force? Curr Top Dev Biol 53:115–137
6. Lalande M (1997) Parental imprinting and human disease. Annu Rev Genet 30:173–195

Genomic Information and Cancer

TRAVIS DUNCKLEY, KEITH D. COON, DIETRICH A. STEPHAN
The Translational Genomics Research Institute, Phoenix, AZ, USA
dstephan@tgen.org

Synonyms
Genomics

Definition
Genomics represents the systematic study of the entire genetic complement (DNA and RNA) of an individual or population of individuals. Since uncontrolled growth of cancerous cells results from inherited or somatically acquired mutations scattered throughout the complement of our chromosomes and often affects mRNA expression, the holistic tools of genomics are particularly relevant to understanding the molecular mechanisms underlying this set of diseases. As such, genomics has played a major role in advancing tumor subclassification, diagnosis and individualized treatment of patients.

Characteristics
Cancer results from the accumulation of genetic damage superimposed upon inherited predisposition. This damage manifests itself in the form of DNA mutations, chromosomal aberrations or epigenetic alterations in the chromatin structure. Elucidating the specific genetic events involved in the pathogenesis of different cancer types will be critical for the development of effective diagnostics and treatments. Initially, genetic studies in cancer focused primarily on heritable rare and highly penetrant alleles of cancer predisposition genes. Examples include the ▶tumor suppressors Rb1 and p53. However, heritable predisposition alleles account for a small percentage of cancer-causing events. Multiple combinations of weak genetic variants may have a much larger impact on the development of cancer in the general population, the majority of whom do not have inherited alleles of the known highly penetrant cancer predisposing genes. Thus, cancer can be viewed as a multigenic disease. The challenge of

cancer genomics is to identify the multiple genetic applicable to the study of any disease with a genetic
variants that are involved in the development of cancer basis.
and to determine their effects on the molecular
pathways of the premalignant cell type. DNA Sequencing
Identifying the genetic determinants of such a complex, DNA sequencing represents the ultimate in high-
multigenic disease as cancer necessitates the develop- throughput cancer analysis. Once we fulfill the
ment and use of high throughput methods of genomic mandate of the HGP for rapid, whole-genome sequen-
analysis. Toward that end, high throughput genomic cing at minimal cost, we will revolutionize the ability to
DNA scanning technologies (sequencing, LOH, CGH and FISH) as well as microarray-based technologies have been critical. Microarray technology involves the fixation of DNA molecules to a slide or wafer. These DNA molecules can be placed on the microarray slides at very high densities, allowing for high throughput genome-wide analyses. Various microarray technologies exist, including ▶single nucleotide polymorphism (SNP) arrays for linkage and LOH studies, ▶comparative genomic hybridization (CGH) arrays, DNA sequencing arrays and perhaps most widely recognized, gene expression microarrays.

Single Nucleotide Polymorphism Genotyping for Linkage and LOH Studies
Traditionally, linkage analyses and ▶loss of heterozygosity (LOH) have been done on a genome-wide scale using ▶microsatellite markers at a density of 10 cM (Mb) intervals. This is a tedious methodology. The resolution is appropriate for linkage studies but inadequate for LOH in the majority of cases. New information from the human genome project (HGP) has resulted in a high resolution SNP map of the human genome, as well as new technologies for rapidly genotyping these SNPs. Single nucleotide polymorphisms represent DNA base pair variations within a population of individuals. They can either have no effect on gene expression or may have subtle effects that, when combined, may lead to disease phenotypes. An SNP occurs, on average, once in every 1300 base pairs of the human genome and they account for the majority of genetic variability between individuals within a population. Although SNPs are biallelic and thus less informative, their density of every 30 kb (on the new Affymetrix 100k SNP array) allows larger haplotype block content to be inferred. Thus SNPs are likely to be equally informative over multiple adjacent SNPs as well as having the ability to identify smaller hemizygous deletions. SNP array technology has great potential for discovering multigenic contributions to cancer development. For example, by comparing the SNP profiles from a group of individuals that have a certain cancer type to the SNP profiles of unaffected individuals, one can identify a set of SNPs that are uniquely associated with that form of cancer (1). Since the sequence information for the SNPs is known, one can rapidly move towards identifying the relevant genes. Importantly, this technology is generally

understand and diagnose cancer. Early attempts include massive parallel DNA arrays for hybridization-based sequencing. These arrays, designed by Perlegen Sciences, Inc. (CA, USA) consist of overlapping oligonucleotide probes that span the entire genome. This makes possible direct and rapid sequencing of the entire genome. This has clear advantages, particularly when an unidentified disease gene has been mapped previously using standard positional cloning strategies.

Comparative Genomic Hybridization
Comparative genomic hybridization (CGH) allows one to visualize gross chromosomal losses or gains by a modification of traditional karyotyping. The entire genomic complement of a normal individual is labeled and compared to a genomic sampling from a tumor that has been labeled with a different fluorophore. While revolutionary, the typical resolution of traditional CGH is 10 Mb. Array CGH provides the advantage of higher resolution compared to traditional CGH methods (up to 0.5 Mb) and is generally useful for identifying large deletions, insertions, amplification events or overall changes in ploidy. Importantly, array CGH can be used as an adjunct to expression microarrays (see below) since the molecular events detected by CGH will have effects on gene expression levels.

Expression Microarrays
Gene expression microarrays are used to rapidly assess the gene expression profile of different cell types. For genetic mutations to affect cell proliferation, and hence the development of cancer, they must alter the function of at least some of the signaling pathways inside the affected cell. These effects can be seen as altered mRNA expression profiles (even members of phosphorylation cascades can be dysregulated) in malignant cell types and can be identified using expression microarrays. There are two main variations of expression arrays currently in use, cDNA and oligonucleotide microarrays. Oligonucleotide arrays provide the benefits of greater specificity since the probes used are of shorter sequence (25–70 nucleotides) than those used for cDNA arrays (200–2000 nucleotides). cDNA arrays have greater sensitivity but cannot, for example, discriminate between splice variants. For each type of array, sequences of DNA that are homologous to different genes of interest are attached to a glass slide at different locations. Each probe (or probe set for
oligonucleotide arrays) on a slide corresponds to a single gene and each slide can hold thousands of probes. One then hybridizes labeled cRNA from the cell type of interest to the microarray, rapidly generating an mRNA expression profile for thousands of genes.

Integration of LCM into ▶Expression Profiling
One important limitation of expression profiling has been that the tissues used often contain multiple cell types in addition to the diseased, or cancerous, cells. Additionally, cancers are almost always mosaic with respect to acquired somatic changes. Thus, they are heterogeneous with respect to the clinical and histopathological trait under study. This adds unwanted expression signatures to the global expression profile obtained and may generate misleading results. ▶Laser capture microdissection (LCM) is increasingly being used to overcome this limitation. LCM uses an infrared laser to select only cell types of interest from a thin section of tissue sample. In the study of cancer, this allows the analysis of a nearly homogeneous population of malignant cells, generating a cleaner expression profile that is more indicative of the cancerous state. Because the volume of cells harvested using this technique is low, RNA amplification techniques must be used to generate enough RNA for expression profiling.

Clinical Relevance
Clearly, identification of the underlying DNA or RNA defects leading to cancer development and progression will lead to a greater understanding of tumorigenesis and will translate directly into drug design. SNP analyses provide the potential for early and reasonably noninvasive cancer diagnosis by detecting heritable cancer specific mutations in peripheral tissues, such as blood. As a result, treatments may be commenced earlier than was previously possible (2). Additionally, SNP analysis could become a routine screening procedure to identify individuals with SNP haplotypes that place them at risk for developing specific forms of cancer. This information could be used to direct at risk individuals to appropriate prevention strategies. As the technology matures, diagnosis and screening using this methodology may become economically feasible for general practice. In addition to providing a diagnosis method, SNP analyses will be useful for identifying causative mutations and affected molecular pathways in various cancers. Following validation of these pathways, new targets for therapy will emerge.
Gene expression profiling has numerous important clinical applications. First, expression profiling already has been used to predict the prognosis of disease course in ▶breast cancer (3). In breast cancer susceptibility screening, genetic testing to identify individuals carrying known predisposition alleles has been used to indicate when more extreme prevention strategies, such as surgical resection of at risk tissue, are needed. However, this approach suffers from the limitation that not all individuals carrying susceptibility genes will develop cancer. Further subclassifying a person’s cancer risk based on additional genetic determinants will restrict such surgical prevention strategies to only a subset of individuals with the highest likelihood of developing particularly aggressive forms of disease. The above example with breast cancer illustrates the expectation that, as more tumors are profiled and subcategorized, expression profiling could become a general tool for predicting the course of cancer progression and for guiding prevention strategies in unaffected individuals. In the future, subcategorizing tumors based on their expression profiles may aid in patient-specific therapies that are designed to be most effective in the clinic on a real-time basis for the treatment of particular forms of cancer.
Microarrays are also important for identifying dysregulated genes and signaling pathways that are involved in tumor development and progression (4). Identification of these genes and pathways will have important implications for the development of novel anticancer therapies since they provide novel targets for treatment. As expression profiling in the field of cancer biology moves forward, a critical goal will be to translate the vast amounts of biological data into meaningful clinical advances. This will require large collaborative efforts pooling the combined knowledge and expertise of different institutions to accomplish successfully all of the required goals from tumor sample acquisition, to genomic analysis, to target identification and validation, to drug design and discovery.

Summary
Analysis of SNPs and gene expression profiling are valuable methods for identifying the genetic determinants of cancer. However, to realize the value of these techniques fully, it will be critical to translate the findings into practical applications that can benefit individuals who are suffering from cancer and those who are at risk of developing particular forms of cancer. Knowledge of an individual’s innate susceptibility to various cancer types can be used to guide the course of prevention strategies that focus, for example, on lifestyle changes such as diet and exercise. In addition, SNP and expression profiles can be used both to diagnose cancers accurately and to subcategorize tumor types based on the severity of the malignant phenotype. This information could then be used to target the most aggressive therapies to patients with the most severe or invasive forms of disease. Ultimately, the information gleaned from these powerful genomics techniques will be used to identify novel targets for therapeutic intervention with the eventual endpoint of preventing tumor growth and metastasis.
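
The comparison of SNP profiles between affected and unaffected individuals, described in the SNP genotyping section of this entry, can be illustrated with a minimal sketch. It assumes that allele counts for each SNP have already been tabulated in both groups, and it uses a plain Pearson chi-square statistic as a stand-in for whatever association test a real study would choose. The SNP identifiers, the counts and the threshold are hypothetical and are not taken from the cited studies.

```python
from dataclasses import dataclass


@dataclass
class AlleleCounts:
    """Counts of the two alleles of one biallelic SNP in a group of individuals."""
    ref: int  # copies of the reference allele
    alt: int  # copies of the alternative allele


def chi_square_2x2(cases: AlleleCounts, controls: AlleleCounts) -> float:
    """Pearson chi-square statistic (1 degree of freedom, no continuity
    correction) for a 2x2 table of allele counts in cases versus controls."""
    a, b = cases.ref, cases.alt
    c, d = controls.ref, controls.alt
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # an empty row or column: association cannot be assessed
    return n * (a * d - b * c) ** 2 / denom


def associated_snps(table, threshold):
    """Return, sorted, the SNP identifiers whose statistic exceeds the threshold."""
    return sorted(
        snp for snp, (cases, controls) in table.items()
        if chi_square_2x2(cases, controls) > threshold
    )


# Hypothetical allele counts (cases, controls) for three made-up SNPs.
example = {
    "snp_A": (AlleleCounts(60, 140), AlleleCounts(120, 80)),
    "snp_B": (AlleleCounts(100, 100), AlleleCounts(102, 98)),
    "snp_C": (AlleleCounts(150, 50), AlleleCounts(148, 52)),
}

# 3.84 is the 5% critical value of chi-square with 1 degree of freedom.
# A genome-wide screen would need a much stricter cutoff that accounts
# for the number of SNPs tested.
print(associated_snps(example, threshold=3.84))
```

With these illustrative counts only snp_A differs clearly between the two groups. In practice the candidate SNPs found this way are a starting point for locating the relevant genes, not evidence of causation.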

References
1. Hoque MO, Lee CC, Cairns P et al (2003) Genome-wide genetic characterization of bladder cancer: a comparison of high-density single-nucleotide polymorphism arrays and PCR-based microsatellite analysis. Cancer Res 63:2216–2222
2. Sidransky D (2002) Emerging molecular markers of cancer. Nat Rev Cancer 2:210–219
3. van ’t Veer LJ, Dai H, van de Vijver MJ et al (2002) Expression profiling predicts outcome in breast cancer. Breast Cancer Res 5:57–58
4. MacDonald TJ, Brown KM, LaFleur B et al (2001) Expression profiling of medulloblastoma: PDGFRA and the RAS/MAPK pathway as therapeutic targets for metastatic disease. Nat Genet 29:143–152


Genomic Instability

Definition
Genomic instability describes a phenotypic feature of the cell, in which the genetic material mutates at a faster rate than normal as a consequence of a deficiency in proteins that function in ▶DNA repair, cell cycle checkpoints, chromosome structure maintenance, and chromosome segregation, etc.
▶Bloom Syndrome
▶DNA Helicases
▶DNA Repair Mechanisms


Genomics

Definition
The mapping, sequencing, and analysis of an organism’s genome.
▶Protein Databases


Genomics

▶Genomic Information and Cancer
▶Functional Genomics, the Systematic Analysis of the Function of All Genes and Gene Products in Parallel


Genotoxin

Definition
A genotoxin is a chemical, or another agent, which damages cellular DNA resulting in mutations and/or cancer.
▶Chromosomal Instability Syndromes


Genotype

Definition
Genotype refers to the genetic constitution of an organism or cell; be it the alleles at a given locus, or those of several loci. For an autosome, the genotype for a specific chromosomal location would be 2 alleles.
▶COPD and Asthma Genetics
▶Diabetes Mellitus, Genetics
▶Familial Dilated Cardiomyopathy
▶Large-Scale ENU Mutagenesis in Mice
▶Schizophrenia Genetics


Genotype-Driven Approach

Definition
Genotype-driven approach describes a plan of action based on the hypothesis that a specific gene is responsible for a specific function. To this end, the specific gene is mutated and the resulting phenotype (appearance) of the organism provides information about the function of the gene.
▶Mouse Genomics


Genotype-Phenotype Correlations

Definition
Genotype-phenotype correlations describe the relationship between genotype (polymorphisms, sequence variants, and mutations) and phenotype (their clinical expression).
▶Heritable Skin Disorders


Genotyping

Definition
Genotyping is the determination of the specific allelic composition of a genome, a gene or a set of genes.
▶Cell Polarity
▶SNP Detection and Mass Spectrometry


Geranylgeranyl Pyrophosphate

Definition
Geranylgeranyl is a 20 carbon unit made up of four isoprene (dimethyl allyl) units, and in the form of the pyrophosphate, is a precursor molecule in cholesterol biosynthesis.
▶Protein Prenylation
▶Tangier Disease


Germ Cells

Definition
Germ cells are pre-meiotic or post-meiotic sperm cells and egg cells.
▶Mutagenesis Approaches in the Zebrafish


Germinal Vesicle

Definition
Germinal vesicle is the meiotic prophase nucleus of an amphibian oocyte.
▶Xenopus as a Model Organism for Functional Genomics


Germline (Gonadal) Mosaicism

Definition
The term mosaicism in general refers to an organism that is composed of two or more cell lines, which originate from only one zygote and differ in their genotype or chromosomal constitution. The different cell lines have been formed by mutations. The period of development at which this mutation is formed determines the cell type of mosaicism: somatic or germline cells. The germline of one individual consists of two or more populations of cells, due to mutation(s) in one or more clonally expanded cell(s) in the population of germline cells. Germline mosaicism may vary between only a few cells and about 50% of cells, and is the reason why genetically unaffected parents may have more than one child with an X-linked or dominant genetic disorder.
▶Heritable Skin Disorders
▶Neurofibromatosis Type 1 (NF1), Genetics


Germline Mutation

Definition
Germline mutation denotes a mutation that affects the complete organism including the germ cells, and thus is passed on to the progeny of the affected individual.
▶Microarrays in Pancreatic Cancer


Germline Transmission

Definition
Germline transmission refers to a process where the ES-derived cells of a chimera contribute to the reproductive cells of a mammal (germ cells) and are genetically passed to its offspring.
▶Large-Scale Homologous Recombination Approaches in Mice


GFACT Expression screening

▶Genome Functionalization by Arrayed cDNA Transduction (GFACT) Expression screening


GFAP

Definition
The glial fibrillary acidic protein (GFAP) is an intermediate filament protein that is characteristic for astrocytes, but it is also expressed in certain
populations of ▶nestin-expressing neural stem cells. It has thus become an ambiguous marker.
▶Neural Stem Cells


GFFKR Domain

Definition
GFFKR domain refers to a conserved amino acid motif of the cytoplasmic tail of α integrin chains. It serves to stabilize the α and β integrin subunits in close spatial contact. This keeps the extracellular, ligand binding part of the integrin in a folded inactive form. Lysines, amino acids proximal to the GFFKR motif, are essential for interaction of the integrin α chain with RAPL. Binding of RAPL contributes to spatial separation of the cytoplasmic integrin chains, allowing unfolding of the extracellular part into a ligand binding integrin.
▶Focal Complexes/Focal Contacts


GFP

▶Green Fluorescent Protein


GGA Proteins

Definition
GGA proteins (Golgi-associated, γ-adaptin homologous, ARF-interacting proteins) constitute a conserved multidomain protein family involved in traffic between the Golgi complex and endosomes. They are recruited to membranes by GTP-bound ▶ARF. They can interact with trafficking motifs present on certain cargo, e.g. the mannose–6–phosphate receptor, and also interact with ▶clathrin, making them functionally analogous to ▶adaptor complexes.
▶Vesicular Traffic


Giga-Seal

Definition
Giga-seal denotes the tight connection between the tip of a patch clamp pipette and the cell membrane during the patch clamp analysis. The giga-seal is characterized by a large electrical resistance that reaches values in the giga-ohm range.
▶Patch Clamping


Glanzmann’s Thrombasthenia

Definition
Glanzmann’s Thrombasthenia is an autosomal recessive disorder, characterized by the absence or dysfunction of the ▶GPIIb/IIIa complex, resulting in defective platelet aggregation.
▶Hereditary Hemostatic Defects and Recombinant Proteins for Treatment


GLI

Definition
GLI comprise a family of zinc finger transcription factors involved in both developmental regulation and human diseases. Zinc-finger transcription factors of the GLI family play critical roles in the mediation and interpretation of Hedgehog signals. The Drosophila homologue is Cubitus interruptus (Ci).
▶Hedgehog Signalling
▶Wnt/Beta-Catenin Signaling


Glial Cells

Definition
Glial Cells are the non-neuronal cells of the nervous system. Glial cells do not carry nerve impulses (action potentials) but do have essential supportive functions, including physical support, provision with nutrients and trophic factors (astrocytes), insulation of axons (oligodendrocytes and Schwann cells), and phagocytic functions (astrocytes and microglia). During development, radial glial cells provide a scaffold for neuronal migration, and they function as neuronal progenitors.
▶Glial Cells and Myelination


Glial Cells and Myelination

HAUKE WERNER, KLAUS-ARMIN NAVE
Department of Neurogenetics, Max Planck Institute of Experimental Medicine, Goettingen, Germany
hauke@em.mpg.de
nave@em.mpg.de

Definition
The large majority of non-neuronal cells in the nervous system are ▶glial cells. In general, glial cells serve supportive functions for neurons, are not electrically excitable but may respond to neurotransmission. This overview focuses on two types of glial cells that myelinate axonal processes, ▶oligodendrocytes in the central nervous system (CNS), and ▶Schwann cells in the peripheral nervous system (PNS). ▶Myelin enables ▶axons to conduct action potentials much more rapidly, by insulating the axonal membrane, decreasing its capacitance and restricting ion fluxes to the ▶nodes of Ranvier. Myelin is largely made during early postnatal life. Thus, developmental disorders of myelination or loss of myelin are severe neurological diseases, either inherited (▶leukodystrophies, neuropathies) or acquired. One clinically important myelin disorder is multiple sclerosis, an inflammatory ▶demyelination of the CNS.

Characteristics
The best understood function of oligodendrocytes and Schwann cells is to enwrap axonal processes with an insulating myelin sheath. Myelination occurs largely during early postnatal life and can be divided into several steps: (i) establishing contact between glial cell and axon (oligodendrocytes engulf multiple axons), (ii) spiral enwrapping of axonal segments with up to 50 layers of membrane, (iii) compaction of myelin by tight association of the intracellular and extracellular membrane surfaces, (iv) formation of functional nodes of Ranvier and paranodal structures. Although some of the molecules involved in myelination have been identified, the cellular mechanisms are not well understood.
In humans, myelination begins around birth, with a peak in the first five years of life and is mostly completed by 11 years of age. Some active myelination has been observed until the 5th decade, possibly related to neuronal plasticity. During the peak of myelination, i.e. within a few days, oligodendrocytes produce a large amount of membrane material that exceeds their own weight several fold.
The highly periodic structure of myelin is best visualized in cross section under the electron microscope (Fig. 1). Proteins at the condensed cytoplasmic membrane surface form the electron-dense major dense line (MDL), those at the condensed extracellular surface form the intraperiod line (IPL). The membrane itself is electron-lucent. The ultrastructure of CNS and PNS myelin is remarkably similar. Once myelinated, many axons become dependent on oligodendrocyte support. Thus, when oligodendrocytes or myelin degenerate in the course of a demyelinating disease, some axons will degenerate as well. The mechanism of this axon-glia interaction is not known.

Glial Cells and Myelination. Figure 1 Cross section of a myelinated axon at the electron microscopic level (upper left), the ultrastructure of compact myelin (lower left), and schematic depictions of structural proteins in myelin. The condensed cytoplasmic membrane surfaces form the electron-dense major dense line, extracellular membrane adhesion forms the intraperiod line. The membrane itself is electron-lucent. Membrane proteins associated with human myelin diseases are depicted in red. PLP/DM20, proteolipid protein; MBP, myelin basic protein; P0, protein zero; PMP22, peripheral myelin protein of 22 kD; Cx32, connexin of 32 kD; MAG, myelin-associated glycoprotein; CNP, cyclic nucleotide phosphodiesterase.

Myelinated axons of the human PNS exhibit nerve conduction velocities (NCV) between 5 and 100 m/s, increasing with the axon diameter. In non-myelinated axons, NCV measures 0.5–5 m/s. To exhibit the same NCV, a non-myelinated axon would have to be of much larger diameter. Thus, myelin has evolved in vertebrates as a means to achieve high NCV and reduced space requirements.

Molecular Interactions
Only axons with a diameter >1 μm are myelinated; smaller axons may be engulfed (i.e. in the PNS by non-myelin forming Schwann cells) but are not enwrapped. Axon diameter and myelin sheath thickness show a constant relationship (g-ratio). In the PNS, the axonal growth-factor neuregulin-I (Nrg1) plays a critical role in signaling size information to receptor tyrosine kinases erbB2/erbB3 on the Schwann cell and in regulating myelin membrane growth.
Molecularly, compact myelin is comprised of lipids and myelin proteins that differ in composition between CNS and PNS. Targeted gene inactivation in (‘knockout’) mice has been used to explore the function of myelin-specific proteins systematically. Many of them are abundant integral or membrane-associated proteins with functions in membrane adhesion. Their CNS- or PNS-specific expression correlates well with the clinical picture of leukodystrophies and neuropathies respectively. Regions of non-compact myelin exist at the innermost myelin membrane, adjacent to the nodes of Ranvier and as Schmidt-Lanterman incisures in the internodal region. They provide radial connections between the periaxonal space and the glial cell soma, form a tight seal at the node of Ranvier (paranodal loops) and help organize the distribution of ion channels by interacting with proteins of the axonal membrane. Some structural proteins of non-compact myelin are quite abundant. For updated information on these genes and diseases refer to the McKusick entries of the Online Mendelian Inheritance in Man (▶OMIM) database (▶http://www3.ncbi.nlm.nih.gov/Omim/).

Myelin Lipids
Myelin has a lipid content of 70–80% (compared to less than 50% in other ▶biological membranes) and lipids contribute to its insulating function. How myelin lipids

are enriched in the cell membrane is not understood. Myelin is particularly rich in cholesterol. Galactosylcerebroside (GalC) and its sulfated form (sulfatide) are nearly myelin-specific. Absence of GalC and sulfatide in mutant mice lacking UDP-galactose:ceramide galactosyl-transferase leads to progressive demyelination and early death. Thus, both lipids are essential for normal myelination, although glucosylcerebroside (an alternative product) may compensate for some functions of GalC. Patients with the Smith-Lemli-Opitz syndrome (McKusick #270400), a genetic disorder of cholesterol biogenesis, also have myelin abnormalities. The critical requirement for cholesterol in myelin assembly has been shown with conditional mouse mutants deficient in squalene synthase, a critical and specific enzyme of cholesterol synthesis.

Proteolipid Protein (PLP, McKusick *300401)
In the CNS, the most abundant protein of compact myelin is a hydrophobic integral membrane protein (proteolipid) with four transmembrane domains and its smaller splice isoform (DM20). PLP/DM20 may form homo-oligomers and associate with alpha(v)-integrin, but the function of these interactions is speculative. The tight association of PLP/DM20 with cholesterol may be required for membrane ▶raft formation and normal membrane trafficking in oligodendrocytes. The ultrastructure of CNS myelin lacking PLP/DM20 in ▶knockout mice suggests that the extracellular portion of PLP acts as a strut, organizing the extracellular apposition of myelin layers at the IPL, but myelination is possible in the absence of PLP. For comparison, point mutations in this gene (or PLP gene duplications) cause severe ▶dysmyelination in Pelizaeus-Merzbacher disease (PMD) and in rodent PMD models. This is due to ER retainment of the mutant protein, unfolded protein response and oligodendroglial apoptosis.

Myelin Protein Zero (MPZ, P0, McKusick *159440)
Myelin protein zero is the most abundant protein of compact PNS myelin, expressed exclusively by Schwann cells. With a single transmembrane domain and an extracellular Ig-like domain, P0 is a member of the Ig-superfamily of cell adhesion proteins. The crystal structure of the Ig-like domain, when combined with the analysis of P0-deficient mice, indicates that homo-tetrameric P0 engages in homophilic interactions with the opposing membrane layer, mediating membrane adhesion and formation of the IPL. Additionally, positive charges in the cytoplasmic domain contribute to the establishment of the MDL by direct interaction with negatively charged head groups of membrane phospholipids. Mutations of the human P0 gene cause a peripheral neuropathy (CMT1B).

Myelin Basic Protein (MBP, McKusick *159430)
Myelin basic protein refers to a group of related cellular proteins associated with both CNS and PNS myelin. The MBP gene encodes at least 5 splice isoforms, ranging from 14 to 21 kD in size. Positively charged amino acids interact with negatively charged head groups of membrane phospholipids causing MBP to mediate and stabilize myelin compaction at the MDL. The MBP gene is partially deleted in the natural mouse mutant shiverer, which presents with a severe demyelinated phenotype. The overall lack of myelin assembly is not yet fully explained. Shiverer mice provided the first opportunity to analyze the consequences of a missing myelin protein before transgenic knockout techniques became available. An involvement of the
MBP gene in a human leukodystrophy has not yet been demonstrated.

Peripheral Myelin Protein 22K (PMP22, McKusick *601097)
Peripheral myelin protein 22K is a glycosylated integral membrane protein of PNS myelin. By topology and hydrophobicity PMP22 is related to the proteolipids of CNS myelin. PMP22 interacts with myelin protein P0 in the myelin membrane and may stabilize the myelin architecture. Experiments carried out in vitro suggest that PMP22 also regulates Schwann cell proliferation and apoptosis, but its in vivo function is poorly understood. The PMP22 gene has captured interest because a gene duplication in humans underlies the most frequent peripheral neuropathy (CMT1A).

Myelin-associated Glycoprotein (MAG, McKusick *159460)
Myelin-associated glycoprotein is a member of the Ig-superfamily of cell adhesion proteins, with a single trans-membrane domain and 5 extracellular Ig-like domains. Its localization at the innermost (adaxonal) membrane of both CNS and PNS myelin suggested that MAG is engaged in adhesion and signaling events between glial cell and axon. The diameter of PNS axons is reduced in mice lacking MAG, suggesting that MAG-mediated myelin-to-axon signaling regulates the phosphorylation status of the axonal cytoskeleton. Axonal binding partners of MAG include the NoGo receptor (NoGoR / reticulon 4 receptor, McKusick *605566), and sialic acid residues of sialo-glycoproteins or sialo-glycolipids (gangliosides).

Connexin-32 (Cx32, McKusick *304040)
Connexin-32 is a member of the connexin family of gap junction proteins, permeable to molecules <1 kD. The protein is expressed in many cell types, including oligodendrocytes and Schwann cells, where it is localized to Schmidt-Lanterman incisures and paranodal loops. The exact function of Cx32 in PNS myelin is not known, but its involvement in the non-classical gap junctions that form a connection between different myelin lamellae of the same Schwann cell is most likely. Radial transport may be required for second messenger molecules that are generated at the innermost myelin membrane. Mutations in the human Cx32 gene cause a peripheral neuropathy (CMT1X).

Claudin11/Oligodendrocyte-specific Protein (OSP, McKusick *601326)
Claudin11/oligodendrocyte-specific protein is a member of the claudin family of ▶tight junction-specific integral membrane proteins with four transmembrane domains. Knockout mice revealed that OSP is an essential constituent of the radial component in CNS myelin that stabilizes myelin through intramembranous tight junctions. The finding that OSP interacts with tetraspanin-3/OAP-1 suggests that other tight junction proteins have yet to be identified in myelin.

Nodal and Paranodal Specializations
For ▶saltatory nerve conduction, axonal voltage gated sodium (Na+) channels must be clustered at the node of Ranvier, separated from fast potassium (K+) channels that assemble beneath the myelin sheath at the juxtaparanode. K+ channels are associated with Caspr2, a member of the neurexin family of ▶adhesion molecules, probably via a PDZ domain protein adapter. Caspr2 in turn is associated with TAG-1, a GPI-anchored cell adhesion molecule of the Ig-like superfamily on the glial adaxonal membrane. Knockout experiments in mice demonstrated that axonal Caspr2 is required to maintain K+ channel clusters at the juxtaparanode. More devastatingly, in the absence of TAG-1, axonal Caspr2 and K+ channels are unclustered at the juxtaparanode. Na+ channel distribution is unaffected by deletion of juxtaparanodal proteins.
The assembly sites of axonal Na+ and K+ channels are divided by the paranode, a region devoid of ion channels. The paranodal axon is tightly attached to the glial paranodal loops by a septate-like junctional structure that seals against ion flux and separates Na+ from K+ channels. The paranodal axon is molecularly defined by a complex of the Ig-like GPI-anchored cell adhesion molecule F3/contactin associated with the neurexin Caspr1. This complex interacts with neurofascin155 on the paranodal loop. Disruption of individual components in knockout mice impedes the septate-like junction.

References
1. Arroyo EJ, Scherer SS (2000) On the molecular architecture of myelinated fibers. Histochem Cell Biol 113:1–18
2. Werner H, Jung M, Klugmann M et al (1998) Mouse models of myelin diseases. Brain Pathol 8:771–793
3. Salzer JL (2003) Polarized domains of myelinated axons. Neuron 40:297–318
4. Schwab ME (2004) Nogo and axon regeneration. Curr Opin Neurobiol 14:118–214
5. Colognato H, ffrench-Constant C (2004) Mechanisms of glial development. Curr Opin Neurobiol 14:37–44


Global Genome Repair

Definition
Global genome repair designates the branch of nucleotide excision repair that occurs in all regions of
DNA, with the exception of actively transcribed genes, and removes damage that could block DNA replication.
▶Nucleotide Excision Repair


Glomerular Filtrate

Definition
Urine formation in the kidney begins when the fluid portion of the blood leaves the glomerulus and enters the glomerular capsule as glomerular filtrate. Glomerular filtrate consists of water and small size components of blood, separated from blood cells. The glomerular filtrate flows into the tubules, where further water is extracted from the filtrate, and minerals and other body chemicals are absorbed from or secreted into the filtrate.
▶Diabetes Insipidus, a Water Homeostasis Disease


Glomerulonephritis

Definition
Glomerulonephritis (GN) refers to inflammation of the capillary loops of the glomeruli.
▶Morbus Wegener
▶SLE Pathogenesis Genetic Dissection


Glomerulus

Definition
Glomerulus is the network of blood capillaries in the cup-like end (Bowman’s capsule) of the nephron. It is where waste products are filtered from the blood into the kidney tubule.
▶Kidney


Glucocorticoid/Mineralocorticoid Receptors

Definition
Glucocorticoid receptors (GR) and mineralocorticoid receptors (MR) are members of a nuclear receptor superfamily that mediate an organism’s response to glucocorticoids or mineralocorticoids, respectively, by changing the transcription rates of glucocorticoid- or mineralocorticoid-responsive genes.
▶Steroid Hormone Receptor Defects, Molecular Basis


Glucocorticoid/Mineralocorticoid Resistance

Definition
Glucocorticoid/mineralocorticoid resistance are pathologic conditions that demonstrate several manifestations caused by partial insensitivity of tissues to glucocorticoid or mineralocorticoid hormones. These are frequently due to inactivating mutations in the glucocorticoid or mineralocorticoid receptors.
▶Steroid Hormone Receptor Defects, Molecular Basis


Glucocorticoids

Definition
Glucocorticoids are steroid hormones that are synthesised from cholesterol by cytochrome P450 dependent steroid hydroxylase, mainly in the adrenal cortex.
▶Mendelian Forms of Human Hypertension and Mechanisms of Disease


Glutamate

Definition
L-Glutamate is an excitatory amino acid neurotransmitter. It influences almost all neurons in the brain. Glutamatergic neurotransmission has been associated functionally with a number of physiological and pathophysiological processes related to neuronal plasticity and memory.
▶Addiction, Molecular Biology

Glutathione Peroxidase

Definition
GPx defines a family of homologous proteins that is characterized by a catalytic triad composed of a (seleno) cysteine, a glutamine and a tryptophan. These enzymes reduce H2O2 and other
▶Free Radicals

Glycan

Definition
Glycan is a general term for a polymer of monosaccharide units joined by glycosidic bonds. It may or may not have other components.
▶Biochemical Engineering of Glycoproteins
▶Glycosylation of Proteins

Glycated Protein

Definition
Glycated protein designates a protein containing carbohydrate that was added by a nonenzymatic, chemical modification, usually through a Schiff-base reaction with the amino group of the side chain of lysine, and subsequent Amadori rearrangement, to give a stable conjugate.
▶Glycosylation of Proteins

Glycine

Definition
Glycine is an amino acid that is derived from dietary sources, but is also generated endogenously from glyoxylate. Glycine serves as an important inhibitory neurotransmitter, predominantly in the spinal cord, brain stem and retina.
▶Peroxisomal Disorders

Glycoconjugate

Definition
Glycoconjugate refers to a compound that is composed of an oligosaccharide which is linked to a protein or lipid.
▶Biochemical Engineering of Glycoproteins

Glycoform

Definition
Glycoform describes various forms of a particular species of glycoproteins that differ in the structures and/or types of glycans.
▶Glycosylation of Proteins

Glycogen Synthase Kinase–3

Definition
Glycogen synthase kinase–3 (GSK3) is a constitutively active kinase that undergoes inhibition by hormones and growth factors. One of its functions is to inhibit Wnt/β-catenin signaling by phosphorylation of β-catenin, thereby targeting it for degradation. Also called zeste white 3/shaggy in Drosophila.
▶Wnt/Beta-Catenin Signaling Pathway

Glycohemoglobin

Definition
Glycohemoglobin stands for glycosylated hemoglobin. The ratio of glycohemoglobin to total hemoglobin is indicative of a person’s average blood glucose level over the last months.
▶Affinity Chromatography and In Vitro Binding (Beads)

Glycolysis

Definition
Glycolysis is the metabolic pathway that occurs in the cytoplasm of cells, and by which glucose is broken down to pyruvic acid.
▶Limb Girdle Muscular Dystrophies

Glycoprotein

Definition
Glycoprotein defines a protein with one or more carbohydrate moieties that are covalently bound to it.
▶Affinity Chromatography and In Vitro Binding (Beads)
▶Biochemical Engineering of Glycoproteins
▶Glycosylation of Proteins
▶Protein Databases

Glycoproteomics

Definition
Glycoproteomics refers to the science of defining the structures of glycoproteins, and the sites of attachment and structure of glycans to proteins.
▶Glycosylation of Proteins

Glycosaminoglycan

Definition
Glycosaminoglycan refers to polysaccharide side-chains of proteoglycans or free complex polysaccharides that are composed of linear disaccharide repeating units, each composed of a hexosamine and a hexose or a hexuronic acid (heparin, heparan sulfate, chondroitin sulfate, dermatan sulfate, and hyaluronan).
▶Glycosylation of Proteins

Glycosidic Linkage

Definition
Glycosidic linkage describes the linkage of a monosaccharide to another residue via the anomeric hydroxyl group. The linkage generally results from the reaction of a hemiacetal with an alcohol (e.g. a hydroxyl group on another monosaccharide or amino acid) to form an acetal.
▶Glycosylation of Proteins

Glycosylase

Definition
Glycosylase is an enzyme that catalyzes the cleavage of an N-C1' glycosylic bond, which links a DNA base to the deoxyribosephosphate backbone of DNA.
▶Base Excision Repair

Glycosylation of Proteins

RICHARD D. CUMMINGS
University of Oklahoma Health Services Center, Department of Biochemistry and Molecular Biology, Oklahoma City, Oklahoma, USA
richard-cummings@ouhsc.edu

Definition
▶Glycoproteins are proteins that contain one or more covalently attached carbohydrates, including ▶monosaccharides and ▶oligosaccharides. The attached carbohydrates are also termed ▶glycans. In typical glycoproteins the glycans can contribute up to 20% of the total weight. ▶Proteoglycans are a special class of glycoproteins that contain at least one large-sized (typically >5 kD), acidic polysaccharide (▶glycosaminoglycan) attached to protein. ▶Mucins are another special class of glycoproteins with a repeating peptide motif that usually contains multiple Ser and Thr residues to which relatively small-sized (usually <3 kD) glycans are attached. The mucin glycans can contribute more than 25% of the total weight. The ▶storage polysaccharides starch and glycogen, which contain glucose polysaccharide linked to a core protein, also represent a special class of glycoprotein. In these cases the glycan portion

contributes more than 90% of the total weight. Another type of glycoprotein is the glycosylphosphatidylinositol-anchored or ▶GPI-anchored glycoprotein. These contain a C-terminal amino acid that is linked to ethanolamine, which is linked to the glycans of the GPI anchor. GPI-anchored glycoproteins, which may also contain covalently attached N- and/or O-glycans at other residues within the polypeptide, are usually anchored to plasma membranes of cells by insertion of the acyl chains of the GPI moiety into the membrane outer leaflet. Thus, glycoproteins are found in many different sizes, ranging from several thousand Daltons to millions of Daltons.

The presence of carbohydrate on protein is a type of ▶post-translational modification, which along with phosphorylation constitutes one of the most common types of such modifications. The majority of the nearly 30,000 proteins expressed in human cells are glycoproteins (1). The carbohydrates of glycoproteins are added enzymatically to specific sites on a protein. This contrasts with what is seen for ▶glycated proteins, in which carbohydrate addition occurs through the chemical or non-enzymatic addition of a free monosaccharide, usually glucose or galactose, to amino groups in proteins through formation of a Schiff base and rearrangement to a stable oxoamine adduct known as an Amadori product. This process is termed glycation and is often seen in patients with diabetes and children with galactosemia.

Animals, plants, fungi, Protoctista, archaea and bacteria synthesize glycoproteins and many animal viruses contain glycoproteins. In animals, most of the proteins that are on cell surfaces and those that are secreted by cells are glycosylated. Glycoproteins in membranes occur as integral or intrinsic membrane glycoproteins and may contain one or more transmembrane domains. GPI-anchored glycoproteins are also considered to be integral membrane proteins. Glycoproteins can also occur as extrinsic membrane glycoproteins that associate with the membrane through other mechanisms. Many proteins within the cytosol of eukaryotes are also glycoproteins. In animal cells two of the major classes of intracellular glycoproteins are the storage polysaccharides and those that contain O-linked GlcNAc or O-GlcNAc (termed O-GlcNAcylated) modifications.

The attached glycans in different glycoprotein species often exhibit tremendous diversity in size and structure. Within a single glycoprotein species these structural differences are often denoted by the term microheterogeneity and the varied forms of a single glycoprotein species are termed ▶glycoforms. Different glycoproteins from the same cell may be glycosylated very differently, depending on the primary, secondary, tertiary and/or quaternary structure, association with other proteins and subcellular localization. However, glycosylation differences are greater among glycoproteins from different cell types within an organism; the differences are often even greater in glycoproteins between different organisms. Identifying the structures of the glycans made by different organisms is known as the field of glycomics and identifying sites of glycan attachment in glycoproteins is known as the field of ▶glycoproteomics.

Characteristics
Glycans on glycoproteins can vary in the types of sugars that are attached, the ▶anomeric configuration and structure of the attached sugars, the numbers of attachments and the sites of attachments (2). The linkage of one monosaccharide to another is typically via a ▶glycosidic linkage, characterized by the acetal structure. The part of the glycan linked to protein is termed the reducing end and the opposite, unattached end(s) of the glycan is termed the non-reducing end or terminal region. Although there are many types of sugar-protein linkages, which are also glycosides, most sugar-protein linkages are of two types: N-glycosides, in which the amide of Asn (and in some organisms Arg) forms the linkage group (-C-N-C-), and O-glycosides, in which the hydroxy group of hydroxyamino acids, such as Ser, Thr, Tyr, hydroxylysine (Hyl) and hydroxyproline (Hyp), forms the linkage group (-C-O-C-). C-glycosides are an exception to this generalization; here a -C-C- bond links the sugar to an amino acid, as seen in Man-C-Trp. Acetal or glycosidic linkages between sugars are stable to alkali. By contrast, the linkage of sugar to protein via Asn, Ser or Thr residues is labile to relatively mild alkali. The sugar-protein linkages via Tyr, Hyl and Hyp are resistant to alkali. The glycosidic linkages within a glycan and all glycosidic linkages of sugars to proteins can be hydrolyzed by treatment with strong acid. An exception to this generalization is the C-glycoside linkage, which cannot be hydrolyzed by treatment with either alkali or acid.

Table 1 lists some of the common sugar-protein linkages found in glycoproteins from animals, plants, fungi, archaea and bacteria. Many dozens of linkages are now known, but mammalian cells appear to generate about a dozen or so. In most cases the monosaccharide residues shown in Table 1 are extended by the addition of other sugar residues. For examples see the composite animal cell glycoproteins shown in Fig. 1. The typical mammalian glycoprotein may contain N- and/or O-glycans of the types shown. One exception to the generalization that glycans are extended by addition of other sugars is found in eukaryotic cytosolic proteins that contain O-linked GlcNAc (GlcNAcβ-O-Ser/Thr), where this residue is not further modified. It is also one of the few examples where the sugar addition is reversible, i.e. the GlcNAc residue is removed and added back multiple times on a mature protein. This probably serves an important

Glycosylation of Proteins. Table 1 Examples of Sugar-Protein Linkages in Glycoproteins from Different Organisms

Kingdom                                        Sugar-Protein Linkage
Animalia                                       GlcNAcβ-N-Asn*
                                               GalNAcα-O-Ser/Thr
                                               GlcNAcβ-O-Ser/Thr
                                               Manα-O-Ser/Thr
                                               Xylβ-O-Ser
                                               Fucα-O-Ser/Thr
                                               Glcβ-O-Ser/Thr
                                               Glcα-O-Tyr (in glycogen)
                                               Galβ-Hyl
                                               Man-C-Trp
Plantae                                        GlcNAcβ-N-Asn
                                               Galα-O-Hyp
                                               Galα-O-Ser
                                               Arafβ-O-Hyp
                                               Glcβ-Arg (in starch)
Fungi                                          GlcNAcβ-N-Asn
                                               Manα-O-Ser/Thr
Protoctista (algae, sea-weeds and protozoa)    GlcNAcβ-N-Asn
                                               Arafβ-O-Hyp
                                               GlcNAcα-O-PO-3-Ser
                                               GlcNAcα-O-Ser/Thr
                                               GlcNAcα-Hyp
Archaea                                        Glcβ-N-Asn
                                               GalNAcβ-N-Asn
                                               Rha-N-Asn
                                               Gal-O-Thr
Bacteria (Eubacteria)                          Galβ-O-Tyr
                                               Galβ-O-Ser/Thr
                                               GalNAcβ-O-Ser/Thr
                                               Glcα-O-Ser

*Abbreviations: GlcNAc, N-acetylglucosamine; Asn, Asparagine; GalNAc, N-acetylgalactosamine; Ser, Serine; Thr, Threonine; Man, Mannose; Xyl, Xylose; Fuc, Fucose; Glc, Glucose; Tyr, Tyrosine; Gal, Galactose; Hyl, Hydroxylysine; Trp, Tryptophan; Hyp, Hydroxyproline; Ara, Arabinose; Arg, Arginine; Rha, Rhamnose; Pro, Proline; Cys, Cysteine; Gly, Glycine; Xaa, any amino acid, except as indicated

regulatory function for cytosolic glycoproteins, akin to the action of protein phosphorylation, which is also reversible. (The other example of reversible glycosylation is found in glucosylation of N-glycans as part of the ▶quality control system for glycoprotein folding, as discussed below.)

Glycosylation of Proteins. Figure 1 Examples of different types of protein glycosylation. Shown are examples
of composite membrane glycoproteins in animals that may contain one or more O- or N-glycans and a GPI-anchor
or a transmembrane domain. Glycoproteins in the cytosol may also contain O-GlcNAc residues. The key
for the symbols and abbreviations of monosaccharides is indicated and used in other figures.
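
The linkage classes and chemical stabilities described under Characteristics can be summarized as a small lookup. The following Python sketch is illustrative only: the function name classify_linkage and the set names are hypothetical, each Table 1 entry is classified solely by the amino acid at the end of its notation, and alkali behaviour is reported as "not stated" where the text gives no rule (for example, for Arg).

```python
# Classify a sugar-protein linkage by the amino acid it is attached to,
# following the rules stated in the text. All names here are illustrative.

N_LINKED = {"Asn", "Arg"}                       # linkage through nitrogen
O_LINKED = {"Ser", "Thr", "Tyr", "Hyl", "Hyp"}  # linkage through a hydroxyl oxygen
C_LINKED = {"Trp"}                              # carbon-carbon linkage (Man-C-Trp)

# Alkali behaviour as described in the text; Arg is not covered there.
ALKALI_LABILE = {"Asn", "Ser", "Thr"}
ALKALI_RESISTANT = {"Tyr", "Hyl", "Hyp", "Trp"}


def classify_linkage(notation):
    """Return (glycoside type, alkali behaviour, strong-acid behaviour)."""
    # Drop notes such as "(in glycogen)" and the footnote asterisk.
    core = notation.split(" (")[0].rstrip("*")
    # The amino acid is the last dash-separated token; "Ser/Thr" lists two.
    amino_acids = core.split("-")[-1].split("/")

    if all(aa in N_LINKED for aa in amino_acids):
        kind = "N-glycoside"
    elif all(aa in O_LINKED for aa in amino_acids):
        kind = "O-glycoside"
    elif all(aa in C_LINKED for aa in amino_acids):
        kind = "C-glycoside"
    else:
        raise ValueError(f"unrecognized amino acid in {notation!r}")

    if all(aa in ALKALI_LABILE for aa in amino_acids):
        alkali = "labile"
    elif all(aa in ALKALI_RESISTANT for aa in amino_acids):
        alkali = "resistant"
    else:
        alkali = "not stated"

    # Per the text, only the C-glycoside linkage withstands strong acid.
    acid = "resistant" if kind == "C-glycoside" else "hydrolyzed"
    return kind, alkali, acid


if __name__ == "__main__":
    for entry in ["GlcNAcβ-N-Asn*", "GalNAcα-O-Ser/Thr",
                  "Glcα-O-Tyr (in glycogen)", "Man-C-Trp"]:
        print(entry, classify_linkage(entry))
```

The sketch looks only at the terminal amino acid, so it does not distinguish special cases such as the phosphate-bridged entry in Table 1.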

N-glycans in higher animals typically contain 7–20 monosaccharide residues, whereas O-glycans typically have 2–10 residues. However, in yeast, many N-glycans contain mannan, a polysaccharide of mannose, which can contain hundreds of mannose residues. In proteoglycans the attached glycosaminoglycans can be hundreds of residues in length. Some examples of common types of N- and O-glycans in mammals are shown in Fig. 2.

Mammalian glycoproteins are largely composed of the ten sugars or building blocks shown, which are the ▶hexoses galactose (Gal), glucose (Glc) and mannose (Man), the ▶deoxyhexose fucose (Fuc), the ▶hexosamines N-acetylglucosamine (GlcNAc) and N-acetylgalactosamine (GalNAc), the ▶uronic acids glucuronic acid (GlcA) and iduronic acid (IdoA), the pentose xylose (Xyl) and the 9-carbon carboxylated amino sugar ▶sialic acid (Sia) and its multiple derivatives. In humans Sia occurs primarily as N-acetylneuraminic acid (NeuAc). Other organisms use many of these same monosaccharide residues, but also use novel ones not found in animals. For example, both Gal and GlcNAc are commonly found in glycoproteins from all the known kingdoms. Rhamnose (Rha) and arabinose (Ara) are found in plant, but not animal, glycoproteins, whereas sialic acids are found in animal, but not plant, glycoproteins. Some bacteria synthesize sialic acid; however, it has not yet been identified on bacterial glycoproteins. Some glycoproteins from nematodes contain tyvelose (3,6-dideoxy-D-arabino-hexose, Tyv), which is not found in vertebrate glycoproteins. Some of these monosaccharide residues may themselves be further modified before or after incorporation into the glycan moiety of the glycoprotein to provide even more diversity of structure, as seen for NeuAc, which may be O- and/or N-acetylated at various positions, GlcA and IdoA, which can be N-sulfated and O-sulfated at various positions, and GlcNAc and Gal, which may be O-sulfated. Thus, the number of possible glycan structures is astronomically large and the upper limit of the number of structures is not known for any organism.

It is important to note that polypeptides are typified by a linear structure in which two amino acids (L-amino acids in animals) are linked by a peptide bond. By contrast, glycans in glycoproteins are usually branched structures, where the monosaccharides are linked to each other in multiple ways, such as different anomeric configuration (α versus β) to various positions on a residue (position C-2, C-3, C-6, etc.). In addition, the sugars in glycoproteins can be in either pyranose (6-membered ring) or furanose

Glycosylation of Proteins. Figure 2 Examples of different types of N- and O-glycans. Animal cell N-glycans shown
on the left side have a common pentasaccharide core (shown in the boxed structure), which is composed of a
trimannosyl sequence linked to a chitobiosyl disaccharide, which is in turn linked to an Asn residue through N-
glycosidic linkage. The N-glycans are generally classified as high mannose-, hybrid- or complex-type as shown. The
complex-type N-glycans may have multiple branches or antennae, described as mono-, bi-, tri-, or tetra-antennary,
etc. Animal cell mucin-type O-glycans shown on the right side have a common structure of GalNAc linked to either
Ser or Thr. This GalNAc residue may be modified in various ways to generate a variety of core structures (shown in
the boxed structures).

(5-membered ring) structures and the residues may be either D- or L-enantiomers (mirror images). Considering all these possibilities, it is easy to see that two identical amino acids linked together in a protein give a single dipeptide structure, whereas two identical hexoses may be linked together to give 64 possible isomeric disaccharide structures. If two different hexoses are linked together it is possible to obtain 128 different isomers (3).

The N-glycans, also called Asn- or ▶N-linked oligosaccharides, in animals, plants, fungi and protista contain the common trimannosyl core structure that is linked via a chitobiosyl core (-GlcNAc-GlcNAc-) to Asn, as highlighted in Fig. 2. There are various types of N-glycans in animal cells, distinguished by the outer or terminal sugar structure, as seen for ▶high mannose-type, hybrid-type and ▶complex-type sequences. The O-glycans in mucins of animal cells, which are also called Ser/Thr- or O-linked oligosaccharides, are characterized by the linkage to Ser/Thr residues via GalNAc, as shown in Fig. 2. The GalNAc residue may be modified in different ways by linkage to other sugars to give various core structures. Altogether there are at least 8 different core structures in mucin-type O-glycans of animals. Some of the more common ones are highlighted in Fig. 2 as cores 1–4.

The attachment of N-glycans to Asn residues of secreted and membrane-bound glycoproteins occurs within a ▶consensus sequence or sequon –Asn-X-Ser/Thr- (or Cys) (Table 2), although not all Asn residues within the sequon of such glycoproteins are always used. Asn residues outside this sequon are not N-glycosylated. In addition, cytoplasmic proteins with the N-glycosylation sequon are not N-glycosylated, because the pathway of N-glycosylation occurs within the lumen of the ▶endoplasmic reticulum (ER), as discussed below. Although the N-glycosylation sequon is the most well known glycosylation sequon, a few other sugar-amino acid linkages also occur in definable sequences of proteins, as seen for addition of O-Glc, O-Fuc, C-Man and O-Gal (collagen) (Table 2). For most other attachments of sugars to proteins, however, there are no clearly predictable consensus sequences, although the probability of attachment of some sugars to protein, such as GalNAc or GlcNAc to Ser/Thr residues in mucins and in cytosolic glycoproteins, appears to be enhanced by clusters of Ser and/or Thr residues and nearby amino acids (e.g. Pro) (Table 2). Some mathematical algorithms have been developed based on this information to predict sites of addition of GalNAc to Ser/Thr residues in animal mucins.

Glycosylation of Proteins. Table 2 Some Protein Consensus Sequences for Sugar Addition

Sugar-Protein Linkage    Consensus Sequence*
GlcNAcβ-N-Asn            -Asn-Xaa-Ser/Thr- (where Xaa ≠ Pro)#
                         -Asn-Xaa-Cys- (where Xaa ≠ Pro) rare (animals, plants, fungi, Protoctista)
Fucα-O-Ser/Thr           -Cys-Xaa-Xaa-Gly-Gly-Ser/Thr-Cys- (animals)
Glcβ-O-Ser/Thr           -Cys-Xaa-Ser-Xaa-Pro-Cys- (animals)
Galβ-Hyl                 -Gly-Xaa-Hyl-Gly- (animal collagen)
Man-C-Trp                -Trp-Xaa-Xaa-Trp- (animals)
Xylβ-O-Ser               -Ser-Gly- (or Ala) (indefinite) (animals)
GalNAcα-O-Ser/Thr        clustered Ser/Thr near Pro (indefinite) (animals)

*Amino acids to which sugars are linked are in bold
#Abbreviations: See Table 1
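
The N-glycosylation sequon in Table 2 is regular enough to search for directly in a one-letter protein sequence. The following Python sketch is a minimal illustration, not a site predictor: the function name find_sequons and the example sequences are invented for this purpose, and a match marks only a candidate site, since not every Asn within a sequon is actually glycosylated.

```python
import re

# N-glycosylation sequon from Table 2: Asn-Xaa-Ser/Thr, or rarely Asn-Xaa-Cys,
# where Xaa is any residue except Pro. A lookahead is used so that
# overlapping sequons (for example "NNTS") are all reported.
SEQUON_WITH_CYS = re.compile(r"(?=N[^P][STC])")
SEQUON_SER_THR = re.compile(r"(?=N[^P][ST])")


def find_sequons(protein, allow_cys=True):
    """Return 1-based positions of Asn residues that lie within a sequon."""
    pattern = SEQUON_WITH_CYS if allow_cys else SEQUON_SER_THR
    return [match.start() + 1 for match in pattern.finditer(protein.upper())]


if __name__ == "__main__":
    # Hypothetical sequence: Asn5 is followed by Pro and is therefore skipped.
    print(find_sequons("MNGTNPSANCSNNTS"))
```

Passing allow_cys=False restricts the search to the common Ser/Thr form of the sequon. Because the pattern only inspects local sequence, it cannot account for the other requirements described in the text, such as entry of the protein into the lumen of the ER.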

Glycoprotein Biosynthesis

N-Glycan Biosynthesis
In animal cells glycoprotein biosynthesis occurs in several cellular compartments. The primary sites for N-glycan biosynthesis are the ER and ▶Golgi apparatus. N-glycans are generated by a unique pathway, involving the addition of a preformed glycan to Asn residues in the consensus sequence –Asn-X-Ser/Thr- in nascent or forming polypeptides during their translation (▶co-translational modification) through the translocon Sec61p, the pore-forming protein in the ER associated with ribosomes on the cytoplasmic face of the ER. The preformed glycan added to newly synthesized glycoproteins occurs as a lipid-linked donor ▶dolichol-pyrophospho-oligosaccharide (dol-P-P-oligosaccharide), which in vertebrates contains 14 monosaccharide residues in the formula Glc3Man9GlcNAc2-dolichol (Fig. 3). This large oligosaccharide can be eventually converted, as described below, to the high-mannose-, hybrid- and complex-type N-glycans discussed above. The enzyme that transfers this preformed glycan is called the ▶oligosaccharyltransferase (OST) and occurs in all eukaryotes as a complex, hetero-oligomeric enzyme associated with the ER membrane. Dolichol (dol) is a polyisoprenoid alcohol (prenol) whose general formula can be seen from the structure of dolichol-phosphate (P-dol) (Fig. 3). Dolichol contains 75–95 carbons and is one of the largest and most unusual lipids found in animals. It is synthesized from the same initial precursors and using the same early enzymatic steps that are used to generate sterols, such as cholesterol.

The synthesis of the dol-P-P-oligosaccharide also occurs in the ER and is initiated by the addition of N-acetylglucosamine from the sugar nucleotide donor UDPGlcNAc to generate GlcNAc-P-P-dol (4). This step in the biosynthesis of N-glycans is blocked by the naturally occurring inhibitor ▶tunicamycin, a transition state analog of UDP-GlcNAc, which was originally identified in the fungus-like soil bacterium Streptomyces lysosuperificus. (The name tunicamycin derives from its discovery as an antiviral agent that blocked viral coat (tunica) formation.) Treatment of animal cells with tunicamycin blocks N-glycosylation of proteins (by blocking formation of the precursor dol-P-P-oligosaccharide) and results in cell death, due to the inability to synthesize and correctly fold glycoproteins, as discussed below. Interestingly, the synthesis of GlcNAc-P-P-dol and several other steps beyond this occur on the cytoplasmic side of the ER. Following this first step to synthesize GlcNAc-P-P-dol, additional GlcNAc and mannose (Man) residues are added stepwise from their respective sugar nucleotide donors (Fig. 3). After the formation of Man5GlcNAc2-P-P-dol, which faces the cytoplasm, the Man5GlcNAc2-P-P-dol is “flipped” across the membrane of the ER by an unknown mechanism so that it now faces the lumen or inner region of the organelle. This Man5GlcNAc2-P-P-dol is then further elongated in the lumen of the ER by donation of Man residues from the dolichol intermediate dol-P-Man (Fig. 3). In vertebrate cells, following completion of the mannose addition to give Man9GlcNAc2-P-P-dol, 3 glucose residues are added by the intermediate donor dol-P-Glc, to generate the final product Glc3Man9GlcNAc2-P-P-dol. Each step from the formation of GlcNAc-P-P-dol to the formation of Glc3Man9GlcNAc2-P-P-dol is catalyzed by a distinct enzyme. Mutations in any steps in the pathway usually result in an inability to add other sugars, thus truncated dol-P-P-oligosaccharides are generated, which are often

not efficiently utilized by the OST. The blockage in the addition of dol-P-Man, because of an inability to synthesize it, was first identified in a cultured mammalian cell line and resulted in the accumulation of Man5GlcNAc2-P-P-dol, leading to the addition of Glc3Man5GlcNAc2 instead of Glc3Man9GlcNAc2 to proteins. Such mutant cell lines, which are viable in vitro, helped to elucidate one of the key intermediate steps in this complex pathway of N-glycan biosynthesis.

Glycosylation of Proteins. Figure 3 Biosynthesis of dolichol-P-P-oligosaccharide in higher animals. The pathway shown occurs in the cytosolic and lumenal regions of the endoplasmic reticulum (ER). A separate enzyme catalyzes each step in the pathway shown. The mechanism by which the intermediate dolichol-P-P-oligosaccharide is reoriented in the ER membrane by “flip-flop” is not yet understood.

The addition of Glc3Man9GlcNAc2 to proteins occurs on the nascent polypeptides and multiple Glc3Man9GlcNAc2 residues can be added to Asn residues within the N-glycosylation sequon as they emerge translationally into the ER through the translocon Sec61p (Fig. 4). But protein folding can also begin with polypeptide intermediates and such folding may interfere with glycan addition to Asn residues. Thus, while Asn residues in some N-glycosylation sequons are quantitatively and efficiently N-glycosylated, Asn residues within other sequons may be only partly or inefficiently N-glycosylated. This partial glycosylation at some Asn residues may not be accidental, however, and may be under metabolic control and help to regulate glycoprotein function. Following the addition of Glc3Man9GlcNAc2 to protein, the glucose residues are removed sequentially from nascent polypeptides and completely translated glycoproteins by two different α-glucosidases (I and II) in the ER to generate Man9GlcNAc2-Asn (Fig. 4). Glucosidases are examples of enzymes termed glycosidases, which are able to cleave glycosidic linkages. The actions of the α-glucosidases are the first steps in glycoprotein biosynthesis termed processing, where specific sugars are removed from a newly synthesized glycoprotein in an orderly fashion. These α-glucosidases are inhibited by some sugar analogs, such as australine, which inhibits α-glucosidase I, and castanospermine and 1-deoxynojirimycin, which inhibit both α-glucosidases I and II.

Following removal of the three Glc residues, a single glucose residue can be added back to Man9GlcNAc2-Asn to generate Glc1Man9GlcNAc2-Asn (Fig. 4). This re-glucosylation is catalyzed by the enzyme UDPGlc:glycoprotein glucosyltransferase (UGGT), a protein found in the ER (5). UGGT adds Glc from the sugar nucleotide donor UDPGlc, and generates a UDP byproduct. The action of UGGT is part of a quality control process in which glycoprotein folding in the ER is regulated. Improperly folded proteins either fail to leave the ER and are degraded there or leave to be degraded by the ▶proteasome machinery located in the cytoplasm. Mature proteins have a specific shape that results from folding the polypeptide backbone and chemical cross-linking between Cys residues to form disulfide bonds. While protein folding is spontaneous, it is relatively slow and inefficient. Protein folding during biosynthesis is rapid and this is usually achieved through the assistance of ▶molecular chaperones. Chaperones are proteins that assist other proteins in acquiring their mature and active forms. This is often associated with proper protein folding and prevention of

Glycosylation of Proteins. Figure 4 Biosynthesis of N-glycans in glycoproteins in higher animals. The pathway
shown occurs in the lumenal regions of the secretory organelles the ER and the Golgi apparatus. The latter is
generally partitioned into separate regions known as cis, medial, and trans. Lysosomal acid hydrolases may acquire
Man-6-P residues on their N-glycans, which are recognized by the Man-6-P receptors, which can deliver the
lysosomal enzymes to late endosomes. Following the completion of N-glycan structures, glycoproteins may stay in
the secretory organelles and endosomes as either soluble or membrane-bound glycoproteins, be secreted into the
extracellular space or be incorporated into plasma membrane.
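
The early trimming steps of the pathway in Fig. 4 can be followed as simple bookkeeping on the monosaccharide composition of the N-glycan. The Python sketch below is only an illustration under stated assumptions: the names PRECURSOR, PROCESSING_STEPS, formula and process are hypothetical, the residue changes are those given in the text (three Glc removed by α-glucosidases I and II, one Man by the ER α-mannosidase, three Man by Golgi α-mannosidase I, one GlcNAc added by GNT-I), and branch structure, the UGGT re-glucosylation cycle and the Man-6-P pathway are ignored.

```python
from collections import Counter

# Composition of the glycan transferred by the OST: Glc3Man9GlcNAc2.
PRECURSOR = Counter({"Glc": 3, "Man": 9, "GlcNAc": 2})

# (enzyme, sugar, change in residue count), in the order given in the text.
PROCESSING_STEPS = [
    ("ER alpha-glucosidases I and II", "Glc", -3),
    ("ER alpha-mannosidase", "Man", -1),
    ("Golgi alpha-mannosidase I", "Man", -3),
    ("GNT-I (GlcNAc from UDP-GlcNAc)", "GlcNAc", +1),
]


def formula(glycan):
    """Write a composition such as Glc3Man9GlcNAc2, omitting absent sugars."""
    return "".join(f"{sugar}{glycan[sugar]}"
                   for sugar in ("Glc", "Man", "GlcNAc") if glycan[sugar] > 0)


def process(glycan, steps):
    """Apply each step in order and return a list of (enzyme, formula) pairs."""
    current = Counter(glycan)  # copy, so the input is left unchanged
    history = [("precursor", formula(current))]
    for enzyme, sugar, delta in steps:
        if current[sugar] + delta < 0:
            raise ValueError(f"{enzyme}: not enough {sugar} residues")
        current[sugar] += delta
        history.append((enzyme, formula(current)))
    return history


if __name__ == "__main__":
    print("residues in precursor:", sum(PRECURSOR.values()))
    for enzyme, composition in process(PRECURSOR, PROCESSING_STEPS):
        print(f"{enzyme}: {composition}")
```

The final composition is printed as Man5GlcNAc3, which is the same set of residues that the text writes structurally as GlcNAcMan5GlcNAc2.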

the undesirable protein oligomerization that can occur in the concentrated protein environment of the ER. Glycoproteins that are partly or improperly folded are transported back into the cytoplasm by a process termed retrograde transport, which appears to occur through translocon Sec61p. UGGT binds its acceptor substrate Man9GlcNAc2-Asn with high affinity only on partly unfolded proteins, and thus has the potential for dual recognition of protein and glycan determinants. The Glc1Man9GlcNAc2-Asn generated by this enzyme is recognized by two different ER lumenal molecular chaperones, ▶calnexin and ▶calreticulin. Calnexin and calreticulin are examples of animal lectins, which are generally defined as non-immune proteins that recognize and bind to specific glycan structures without catalyzing a chemical modification. Calnexin/calreticulin bind in a reversible manner and their binding is probably associated with binding of other chaperones that recognize specific peptide features of the protein. Once released from calnexin/calreticulin, the Glc1Man9GlcNAc2-Asn is subject to action of α-glucosidase II, resulting in reformation of Man9GlcNAc2-Asn. If a glycoprotein is still not properly folded, the UGGT adds back Glc to regenerate Glc1Man9GlcNAc2-Asn. UGGT participates in quality control of N-glycan biosynthesis and protein folding through this cycle of

glucosylation/deglucosylation, which is repeated until a glycoprotein assumes a conformation that blocks its interaction with UGGT. Blocking the action of α-glucosidase with castanospermine or other glucosidase inhibitors can result in glycoprotein accumulation in the ER, due to inefficient protein folding. This single Glc residue serves as a type of ER-retention signal, preventing glycoprotein exit from the ER.

Following the formation of Man9GlcNAc2-Asn on a folded glycoprotein, the ER α-mannosidase removes one of the mannose residues of Man9GlcNAc2-Asn to form Man8GlcNAc2-Asn (Fig. 4). Interestingly, some cells contain an endomannosidase that can remove Glc-Man disaccharide from Glc1Man9GlcNAc2-Asn in an α-glucosidase II-independent pathway to form an alternative Man8GlcNAc2-Asn. The endomannosidase can also act on Glc1-3Man9GlcNAc2-Asn derivatives. Following formation of Man8GlcNAc2-Asn on ER glycoproteins, they usually exit to the Golgi apparatus by vesicular transport involving COP-coated vesicles. The Golgi apparatus is recognized as a multi-compartment organelle, with cis, medial and trans compartments and a terminal compartment called the transGolgi network (TGN). The Golgi apparatus is usually positioned in the cell so that the cis-Golgi is nearest or proximal to the ER and the trans-Golgi is away from or distal to the ER.

Upon reaching the cis-Golgi the Man8GlcNAc2-Asn in glycoproteins is subjected to further processing by α-mannosidase I (Fig. 4). This enzyme removes 3 additional α-linked mannose residues from Man8GlcNAc2-Asn to generate Man5GlcNAc2-Asn. Not all the high-mannose N-glycans, however, are susceptible to α-mannosidase I. The Man8GlcNAc2-Asn in some glycoproteins are not very accessible to α-mannosidase I, leading to formation of mature glycoproteins having Man8GlcNAc2-Asn or partly processed forms such as Man7-, Man6-, or Man5GlcNAc2-Asn. The action of α-mannosidase-I can be inhibited by several drugs, including the plant alkaloid kifunensine and the mannose derivative 1-deoxymannojirimycin.

Acid hydrolases that are destined to enter lysosomes are subject to the action of an alternative pathway in which they acquire phosphorylated mannose residues (Man-6-P) that are recognized by the mannose-6-phosphate (Man-6-P) receptors (Fig. 4). These receptors help to deliver lysosomal enzymes to endosomes, from which the ▶lysosomal acid hydrolases can enter mature lysosomes and the Man-6-P receptors recycle to the Golgi for additional rounds of delivery (6). The

Lys residues. The phosphotransferase recognizes the signal patch and adds GlcNAc-1-P from the donor UDPGlcNAc to nearby Man residues on the high mannose-type N-glycans to generate the phosphodiester GlcNAc-1-P-6-Man5-8GlcNAc2-Asn. Following formation of GlcNAc-1-P-6-Man5-8GlcNAc2-Asn on lysosomal enzymes, they are subjected to the action of the α-N-acetylglucosamine-1-P phosphodiesterase (“uncovering” enzyme or UCE), which removes the α-linked GlcNAc residue, resulting in formation of the phosphomonoester structure P-6-Man5-8GlcNAc2-Asn. Thus, lysosomal enzymes acquire one or more Man-6-phosphate- (Man-6-P) phosphomonoester residues on high mannose-type N-glycans. The presence of Man-6-P blocks action of α-mannosidases on the specifically phosphorylated Man residues. Thus, as described below, phosphorylated glycans cannot be converted to complex-type N-glycans, although they can be converted to Man-6-P-containing hybrid-type N-glycans.

It is important to note that sugar nucleotides, which are important donors for glycosyltransferases in the lumen of the ER and Golgi apparatus, are synthesized in the cytosol. The sugar nucleotides are imported into the lumen of the ER and Golgi apparatus by specific transporters. Most of these transporters function as antiporters; they move a sugar nucleotide into the lumen and the cognate nucleoside monophosphate into the cytosol for reutilization. Defects in transporter function are associated with human genetic diseases, as discussed below.

The recognition of lysosomal hydrolases by the phosphotransferase is dependent on the folded structure of lysosomal enzymes and unfolded proteins are not recognized by the phosphotransferase (6). As discussed below, mutations in the phosphotransferase can result in lack of addition of GlcNAc-1-P to the more than 50 lysosomal enzymes. Alternatively, the phosphotransferase can stochastically fail to add the GlcNAc-1-P to a fraction of the lysosomal acid hydrolases. Such non-phosphorylated glycans of lysosomal acid hydrolases are subject to further processing and modification in the Golgi apparatus and can acquire complex-type N-glycan structures that contain sialic acid and other terminal sugars.

The Man5GlcNAc2-Asn structures are potential acceptors for the enzyme N-acetylglucosaminyltransferase I (GNT-I), which adds GlcNAc from the donor UDPGlcNAc to form GlcNAcMan5GlcNAc2-Asn (Fig. 4). This is a hybrid-type N-glycan, that contains
high mannose-type N-glycans of lysosomal acid non-reducing terminal Man residues and other terminal
hydrolases in the early Golgi apparatus are recognized sugars, such as GlcNAc. In vertebrate cells the product
by the UDPGlcNAc:lysosomal enzyme phosphotrans- GlcNAcMan5GlcNAc2-Asn is usually acted upon by
ferase. Many of the lysosomal acid hydrolases have α-mannosidase II, which specifically recognizes
unique 3-dimensional structures that generate a surface GlcNAcMan5GlcNAc2-Asn and removes two mannose
patch, which is a basic region that includes residues to form GlcNAcMan3GlcNAc2-Asn. The
Glycosylation of Proteins 713

concerted action of α-mannosidases and GNT-I, which and the availability of the glycoprotein glycans to the
appears to occur largely in the cis-Golgi region, enzymes. Following these modifications of the N-
generates the trimannosyl structure common to all glycans in the Golgi apparatus, secretory glycoproteins
complex-type N-glycans (7). This GlcNAcMan3- are released into the extracellular space by secretory
GlcNAc2-Asn is usually acted upon by N-acetylgluco- vesicles, while membrane-bound glycoproteins may be
saminyltransferase II, which adds GlcNAc from targeted to the plasma membrane or lysosomes.
UDPGlcNAc to form the product GlcNAc2Man3-
GlcNAc2-Asn. This is an example of a biantennary GPI-Anchor Biosynthesis
complex-type N-glycan, which is characterized by the Many glycoproteins in eukaryotes contain a novel C-
lack of non-reducing terminal Man residues and the terminal modification of a glycosylphosphatidylinosi-
presence of other terminal sugars. The biantennary tol lipid anchor, the GPI anchor (9). The addition of the
nature refers to the presence of two branches of the GPI-anchor to proteins occurs in the ER and involves
complex-type N-glycan. But within the cis-Golgi recognition of a C-terminal domain of newly synthe-
additional N-acetylglucosaminyltransferases (GNT-III sized protein (Fig. 5). A preformed lipid-linked
through VI) can add additional GlcNAc residues to the precursor is generated from phosphatidylinositol by
Man residues to form bisected N-glycans, or multi- initial reactions in the cytosolic face of the ER. An G
antennary N-glycans, such as tri-, tetra-, penta- and intermediate containing glucosamine is then reoriented
hexa-antennary structures. Within the more distal (“flip-flop”) by translocation across the ER membrane
trans-Golgi apparatus, the GlcNAc2Man3GlcNAc2- to allow further elongation of the precursor by addition
Asn is subjected to modification by galactosyltrans- of Man residues from dol-P-Man. Ethanolamine
ferases, causing addition of Gal residues to GlcNAc phosphate is added to Man residues by donation from
residues from the donor UDPGal to form Galβ4Glc- phosphatidylethanolamine to generate a GPI precursor.
NAc-R sequences. This disaccharide terminal sequence In all organisms this GPI precursor is characterized by
is termed N-acetyllactosamine (LN). In vertebrates the having glucosamine linked to inositol and the triman-
pituitary glycoprotein hormones, such as luteinizing nosyl sequence linked to ethanolamine in a “core”
hormone and follicle stimulating hormone, acquire structure. But this core structure may be differentially
biantennary N-glycans but are subject to addition of modified in a tremendous variety of ways, depending
GalNAc residues from the donor UDPGalNAc to form on the organism, by addition of other sugars, e.g. Man
the sequence GalNAcβ4GlcNAc-R; this terminal dis- and Gal residues, additional ethanolamine residues,
accharide is termed lactosamine-di-N-acetyl (Lacdi- addition of phosphate, fatty acylation of the sugars and
NAc or LDN). This formation of LDN sequences on further acylation of inositol.
pituitary glycoprotein hormones requires the action The GPI precursor is the substrate for a transamidation
of a specific N-acetylgalactosaminyltransferase that reaction by the GPI transamidase complex. This
appears to recognize primary sequences within the enzyme complex recognizes the C-terminal lipophilic
hormones, and does not generally act on other portion of some proteins with a GPI anchor sequence,
glycoproteins within the pituitary (8). But less specific causing the cleavage of the peptide bond and
N-acetylgalactosaminyltransferases are also expressed transfer of the GPI precursor to the new C-terminal
in other cells to generate the LDN structure on non- amino acid. The GPI transamidase forms a carbonyl
pituitary glycoprotein hormones. The LDN termini intermediate with the substrate protein. The signal
of pituitary glycoprotein hormones are sulfated at the sequence for GPI anchor addition is a C-terminal
C-4 position of the terminal GalNAc residues by a region with an amino acid to which the anchor is
PAPS:GalNAc 4-O-sulfotransferase to form S-4-GalNAc eventually attached that is termed the ω site. The amino
moieties. The resultant formation of (S-4-GalNAc)2 acids that are two residues to the carboxyl side of ω
GlcNAc2Man3GlcNAc2-Asn on pituitary glycoprotein residue (the ω + 2 site) have small side chains, whereas
hormones promotes their recognition and clearance the residues at the ω + 1 site can have large side chains.
from the blood circulation by a liver receptor for In all cases the ω + 2 site is followed by a stretch of
S-4-GalNAc. 5–10 hydrophilic amino acids and then 15–20 hydro-
In most glycoproteins following the addition of Gal phobic residues at or very near the carboxyl or
residues to GlcNAc residues, the complex-type N- C-terminus of the protein. Following the addition of
glycans can acquire other modifications in the trans- the GPI anchor, the GPI-anchored glycoproteins move
Golgi and TGN. These include addition of sialic acid to the plasma membrane. These glycoproteins usually
from CMPNeuAc, fucose from GDPFuc, and other have other sugar residues attached to other amino acids,
residues. Each addition or modification is catalyzed by such as N-glycans, that may or may not be processed
a separate enzyme. A tremendous variety of modifica- within the ER and Golgi apparatus. It is interesting that
tions are possible depending on a wide variety of the formation of N-glycans and GPI-anchored glyco-
factors, such as expression of the modifying enzymes proteins uses a common intermediate, i.e. dol-P-Man.
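The N-glycan trimming and branching steps described in this section can be followed as simple residue arithmetic. The sketch below is only an illustration: it tracks residue counts, not linkages or branch positions, and the names START, STEPS and process are hypothetical helpers introduced here, not part of any established tool.

```python
# Residue bookkeeping for the N-glycan processing steps described in the text.
# Only residue counts are tracked; linkages and branch positions are ignored.

START = {"Man": 9, "GlcNAc": 2}  # Man9GlcNAc2-Asn on a folded glycoprotein

# (enzyme, residue, change in count), in the order given in the text
STEPS = [
    ("ER alpha-mannosidase", "Man", -1),   # Man9 -> Man8
    ("alpha-mannosidase I", "Man", -3),    # Man8 -> Man5
    ("GNT-I", "GlcNAc", +1),               # -> GlcNAcMan5GlcNAc2 (hybrid-type)
    ("alpha-mannosidase II", "Man", -2),   # -> GlcNAcMan3GlcNAc2
    ("GNT-II", "GlcNAc", +1),              # -> GlcNAc2Man3GlcNAc2 (biantennary)
]


def process(start, steps):
    """Return the composition after each step, starting with the input."""
    glycan = dict(start)
    history = [dict(glycan)]
    for enzyme, residue, delta in steps:
        count = glycan.get(residue, 0) + delta
        if count < 0:
            raise ValueError(f"{enzyme} cannot remove {residue}: none left")
        glycan[residue] = count
        history.append(dict(glycan))
    return history


history = process(START, STEPS)
for (enzyme, _, _), comp in zip(STEPS, history[1:]):
    print(f"{enzyme:>22}: Man{comp['Man']} GlcNAc{comp['GlcNAc']}")
```

The final entry has three Man and four GlcNAc residues: the two core GlcNAc residues plus one on each of the two antennae. Glycans that escape α-mannosidase I, or that carry Man-6-P, would simply stop partway through this list.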

Glycosylation of Proteins. Figure 5 Biosynthesis of GPI-anchored glycoproteins. The pathway shown is for
human GPI anchor biosynthesis from phosphatidylinositol, which occurs in the cytosolic and lumenal regions of the
ER. Following generation of the GPI anchor precursor, the GPI anchor is added en bloc to the C-terminal region of an
ER protein by the transamidase complex, resulting in the cleavage and release of a C-terminal peptide.
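The C-terminal signal for GPI anchor addition described in the text (an ω site, a small side chain at ω + 2, then 5–10 hydrophilic residues followed by 15–20 hydrophobic residues at the C-terminus) can be written as a rough sequence check. This is a minimal sketch under stated assumptions: the residue sets SMALL and HYDROPHOBIC are illustrative choices, since the text does not list specific amino acids, and the function name is hypothetical. It is far stricter than real GPI prediction methods, which score sequences rather than require exact matches.

```python
# Toy check of the GPI-anchor signal layout described in the text.
# The residue classes below are assumptions made for illustration only.
SMALL = set("GAS")              # assumed "small side chain" residues
HYDROPHOBIC = set("AVLIMFWC")   # assumed hydrophobic residues


def looks_like_gpi_signal(seq, omega):
    """Return True if seq fits the layout, with omega the 0-based ω index."""
    if omega < 0 or omega + 2 >= len(seq):
        return False
    # The omega + 1 residue is unconstrained here (it may be large).
    if seq[omega + 2] not in SMALL:
        return False
    tail = seq[omega + 3:]
    # Try each allowed spacer length; the rest must be the hydrophobic end.
    for spacer_len in range(5, 11):
        spacer, hydrophobic_end = tail[:spacer_len], tail[spacer_len:]
        if not 15 <= len(hydrophobic_end) <= 20:
            continue
        if all(r not in HYDROPHOBIC for r in spacer) and all(
            r in HYDROPHOBIC for r in hydrophobic_end
        ):
            return True
    return False


# Made-up sequence: ω = S at index 3, ω+1 = W, ω+2 = G,
# then a 6-residue hydrophilic spacer and 17 leucines.
example = "MKT" + "S" + "W" + "G" + "KDEKRT" + "L" * 17
print(looks_like_gpi_signal(example, 3))
```

The example sequence is invented for the demonstration and does not correspond to a known protein.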

Some individuals have a mutation in the gene (termed N-acetylglucosaminyltransferase (the core 2 GlcNAcT)
PIG A) encoding the first enzyme of the pathway for to generate the trisaccharide Galβ3(GlcNAcβ6)GalNA-
GPI anchor biosynthesis that normally adds GlcNAc cα1-Ser/Thr (core 2 O-glycan) from the donor UDPGlc-
from UDPGlcNAc to phosphatidylinositol. Thus, these NAc. This core 2 O-glycan can be subsequently modified
individuals are defective in generating the mature GPI by addition of other sugars, such as galactose, fucose and
anchor precursor and are deficient in generating GPI- N-acetylneuraminic acid, and/or sulfate residues on
anchored glycoproteins. Such individuals are often selected sugars to generate a wide variety of O-glycan
clinically recognized as having paroxysmal nocturnal structures.
hemoglobinuria (PNH), a form of hemolytic anemia.
GPI-anchored glycoproteins also occur in many Biosynthesis of Other O-Glycans
protozoans, and have been especially well character- The biosynthesis of non-mucin type O-glycans is
ized in African trypanosomes, where the GPI-anchored incredibly varied depending on the cellular compart-
glycoprotein is recognized as a highly antigenic variant ment. O-GlcNAc residues are added to proteins in the
surface glycoprotein (VSG). cytoplasm, as discussed below. Glycosaminoglycan
addition is initiated by UDPXyl:core protein β-D-
Mucin-Type O-glycan Biosynthesis xylosyltransferases I and II, which transfer Xyl from
Many glycoproteins within the Golgi apparatus are UDPXyl to specific Ser residues in proteoglycan
modified to contain GalNAcα1-Ser/Thr residues, core proteins in the ER. The Xyl residue is subse-
typically found in animal mucins, by the action of a quently modified by addition of Gal and GlcA residues
family of UDPGalNAc:polypeptide α-N-acetylgalacto- by galactosyltransferases and glucuronyltransferases
saminyltransferases (ppGalNAcTs). While mucins may respectively, to form the core linkage tetrasaccharide of
contain hundreds of such linkages, some glycoproteins, glycosaminoglycans, GlcAβ1-3Galβ1-3Galβ1-4Xylβ1-
such as the transferrin receptor, contain only a single O- Ser, which occurs in all proteoglycans. This synthesis
glycan. Yet, all such linkages are categorized as mucin- of the glycosaminoglycan core region may be com-
type. The ppGalNAcTs recognize Ser and Thr residues pleted in the ER, while the subsequent elongation of the
in glycoproteins and add GalNAc in O-glycosidic glycans and sulfation and epimerization, which is an
linkage from the donor UDPGalNAc to these amino orchestrated and incredibly complex series of reactions,
acid side chains to form GalNAcα1-Ser/Thr, which is appear to occur primarily in the Golgi apparatus. The
also called the Tn antigen (10). Well over a dozen elongation of glycosaminoglycans on proteoglycans
different ppGalNAcTs are known and many of these can be partly averted by feeding cells β-xylosides, that
are expressed simultaneously within cells. Some of act as acceptors for addition of Gal, thus effectively
these enzymes may have unique, but partly over- decreasing elongation of glycosaminoglycan within the
lapping, recognition of Ser/Thr residues within the proteoglycan acceptors. Remarkably, β-xylosides ap-
polypeptide sequence. Interestingly, many ppGalNAcTs pear capable of penetrating the ER and possibly the
are dual function enzymes containing a catalytic Golgi apparatus of cells. This inhibition by competition
domain that transfers GalNAc and a lectin domain can result in synthesis of free glycosaminoglycans on
(ricin- or R-type) that binds to GalNAc residues. the β-xyloside and reduced addition of glycosamino-
Thus, addition of GalNAc to some Ser/Thr sites may glycan to proteoglycans. O-Mannosylation of proteins
promote further modification by attracting more in yeast is initiated in the ER by transfer of Man from
ppGalNAcTs. Such concerted actions of ppGalNAcTs the donor dol-P-Man using a specific O-mannosyl-
in the Golgi apparatus may promote the relatively transferase. Further elongation to generate mannose-
efficient modifications of hundreds of Ser/Thr containing polysaccharides in yeast occurs by Man
residues within some very large mucin polypeptides, donation from GDPMan in the Golgi apparatus by
some of which have over 10,000 amino acids. additional mannosyltransferases. An equivalent en-
Following the formation of GalNAcα1-Ser/Thr residues, zyme in animals, termed POMT1, may initiate O-
glycoproteins are subjected to the action of a β3- Man formation on selective Ser/Thr residues in
galactosyltransferase, also called the T-synthase, to form glycoproteins in the ER using dol-P-Man as the donor,
the disaccharide Galβ3GalNAcα1-Ser/Thr, which is while further elongation and addition of other sugars
called the Thomsen-Friedenrich, TF or simply T antigen, may occur in the Golgi apparatus. O-fucosylation and
using the donor UDPGal. The T antigen disaccharide is O-glucosylation of EGF-like domains on glycoproteins
also the simplest core 1 O-glycan structure (Fig. 2). are catalyzed by specific enzymes that transfer Fuc or
However, occasionally the GalNAcα1-Ser/Thr residues Glc from GDPFuc or UDPGlc, respectively, in the
may be sialylated to generate the disaccharide NeuA- Golgi apparatus. Collagen is glycosylated in the ER
cα6GalNAcα1-Ser/Thr (sialyl Tn antigen), which cannot following hydroxylation of Lys residues to generate
be further modified. Upon formation of Galβ3GalNA- hydroxylysine (Hyl). The addition of Gal to Hyl is
cα1-Ser/Thr, the core 1 structure may be modified by an catalyzed by a collagen-specific enzyme UDPGal:

procollagen-5-hydroxy-L-lysine D-galactosyltransfer- plant wounding and pathogen attack. The pistil and
ase, which adds Gal to Hyl residues on procollagen in pollen tube extracellular matrix are enriched in these
the ER during procollagen biosynthesis and concomi- highly glycosylated proteins.
tantly with Hyl formation on nascent polypeptides
catalyzed by lysyl hydroxylase activity. Bacterial Glycoproteins
Glycoproteins are also found in prokaryotes and in
Glycosylation in the Cytosol archaebacteria, although the general structures of the
Many cytosolic proteins in animals (and probably attached glycans and sugar residues are very different
plants) contain one or more residues of β-linked from those found in animals and plants. Among the
GlcNAc in O-glycosidic linkage to Ser/Thr residues best studied prokaryotic glycoproteins are the cell
(11). These O-GlcNAcylated proteins (O-GlcNAc- surface or S-layer glycoproteins (12). Such S-layer
containing glycoproteins) are generated by the action glycoproteins can assemble into ordered lattice-like
of the UDPGlcNAc:polypeptide O-acetylglucosami- structures on the cell surface. Each S-layer glycoprotein
nyltransferase (O-GlcNAc transferase), which transfers may contain more than one attached glycan, which can
GlcNAc from UDPGlcNAc to selected Ser/Thr resi- be linked via Asn or other amino acid residues (Table
dues of cytosolic proteins. Some of the more pro- 1). In many cases the bacterial S-layer glycan chains are
minent O-GlcNAcylated glycoproteins include RNA linear or branched homo- or hetero-saccharides having
polymerase II, c-myc and the estrogen receptor. 20–50 identical repeating units. By contrast, archaeal
O-GlcNAcylation is one of the few types of S-layer glycoproteins have shorter glycans, generally
glycosylation that is reversible. The O-GlcNAc may lacking repeating units. Although the exact mechan-
be selectively removed by the action of an O-GlcNAc isms of S-layer glycoprotein biosynthesis are not yet
specific acetylglucosaminidase (O-GlcNAcase) in the defined, it appears that most sugar residue addition
cytosol. This alternating addition and removal of occurs in the outer membrane following protein
O-GlcNAc by these two enzymes is akin to reversible translocation.
phosphorylation and dephosphorylation of cytosolic
proteins. O-GlcNAcylation may serve to regulate many Many Factors Regulate Protein Glycosylation
metabolic pathways and is required for animal and As discussed above, two of the major factors regulating
plant cell growth. protein glycosylation are the sequence motifs within the
The storage polysaccharide glycogen, which is a primary structure of glycoproteins and the site of
glycoprotein in animals, is generated on the core biosynthesis. But many other factors also contribute to
protein glycogenin within the cytosol of animals, by its regulation of protein glycosylation. These include the
autocatalytic “self-glucosylation” of a Tyr residue at expression of glycosyltransferases, expression of gly-
position 194 using UDPGlc as a donor. The Glc-O-Tyr cosidases, secondary, tertiary and/or quaternary struc-
is then elongated by addition of other Glc residues tures of proteins, availability of donor substrates, e.g.
(up to 10) from UDPGlc by glycogenin activity. The sugar nucleotides and dolichol, cations, e.g. magnesium
Glc-containing oligosaccharide on glycogenin is then and manganese, temperature and membrane lipid
elongated by glycogen synthase. A similar type of composition and structure. Many of these factors,
activity may occur on the starch protein amylogenin. especially expression of glycosyltransferases, vary
tremendously between cell types. Dozens of different
Plant Glycoproteins glycosyltransferase genes encoding enzymes that act on
Many plant glycoproteins contain N-glycans, which are glycoproteins exist in the genomes of most multi-
also synthesized via the dolichol pathway in the ER. cellular organisms. Together, these many factors help to
They can also be subsequently modified by processing explain the huge differences in glycosylation observed
reactions and addition of other sugars to generate high between different cells and tissues.
mannose-, hybrid- and complex-type N-glycans. Many
plant wall proteins are typically glycoproteins rich in Glycoproteins Have Many Biological Functions
the amino acids hydroxyproline (▶hydroxyproline-rich Because glycoproteins are so common in all cells, it is
glycoprotein, HRGP), proline (proline-rich protein, not surprising that the glycan moieties have many
PRP), and glycine (glycine-rich protein, GRP). The O- different functions. Although many of the specific
glycans in HRGPs may account for up to 95% of the functions of glycoproteins are being defined, it is likely
glycoprotein weight and the glycans can range in size that the complete picture of glycoprotein functions will
from a single attached Ara residue to large ▶arabino- take many years to complete. Some of the known
galactans containing nearly 100 residues of Ara and functions of glycoproteins and their attached glycans
Gal. Many of these glycoproteins form rods (HRGP, include cell-cell adhesion, cell-matrix interactions,
PRP) or β-pleated sheets (GRP). Extensin is one of the glycoprotein targeting to organelles and cell signaling.
best-studied HRGPs. HRGP expression is increased by For example, glycoproteins regulate many different

types of cell adhesion, including sperm-egg adhesion, N-glycosylation can affect mobility upon isoelectric
leukocyte-platelet-endothelial cells adhesion, recog- focusing chromatography. CDG patients, depending on
nition and phagocytosis and neuronal cell-matrix the altered gene, exhibit a variety of changes in
adhesion. Some of the non-specific functions of physiognomy and suffer from neurological, liver and/
glycoprotein glycans include protein folding and or intestinal problems. Children with CDG, depending
assembly, protein protection and stability against on the type of genetic mutation, exhibit impairments in
proteases, control of the circulatory half-life of cognitive ability, speech and balance and motor skills.
glycoproteins, regulation of protein conformation and Other disorders where altered protein glycosylation is
thermal stability and control of enzyme kinetics. Many observed include several forms of congenital muscular
glycoprotein glycan functions are generated by glycan dystrophy, such as Fukuyama congenital muscular
recognition through carbohydrate-binding proteins or dystrophy, ▶limb-girdle muscular dystrophy, muscle-
lectins. Lectins are made by all organisms, including eye-brain disease and Walker-Warburg syndrome (15).
animals, plants, bacteria and viruses. Many of these diseases are associated with mutations in
genes encoding glycosyltransferases that add GlcNAc or
Human Disorders Associated with Defective Man residues to generate O-linked Man-containing
Protein Glycosylation glycans on α-dystroglycan, which is a membrane-
There are many human disorders associated with associated glycoprotein that helps to link neuronal cells
an altered ability to add carbohydrate residues to and their cytosolic signaling machinery to extracellular
glycoproteins. One of the first defined examples of this matrix molecules such as laminin. Patients with proger-
was ▶I-cell disease, where patients were found to have a oid-type Ehlers-Danlos (E-D) syndrome have defects in a
recessive genetic mutation of the gene encoding the galactosyltransferase that is required to synthesize the
phosphotransferase activity that helps to generate Man- common linkage region of glycosaminoglycans. Another
6-P residues on lysosomal acid hydrolases. Conse- disorder where altered protein glycosylation is observed
quently, their cells are unable to synthesize lysosomal is leukocyte adhesion deficiency type II (LAD II), where
acid hydrolases with Man-6-P residues efficiently (13). patients lack the ability to add fucose to glycoproteins.
Most of the non-phosphorylated lysosomal acid hydro- This deficiency in fucosylation results in a lack of
lases from these patients become processed within the leukocyte adhesion to selectins, a group of carbohydrate-
Golgi apparatus, acquire sialic acid and other sugar binding proteins that recognize fucose-containing O-
residues on hybrid- and complex-type N-glycans and are glycans and serve to regulate leukocyte trafficking from
secreted into body fluids. The patients accumulate the bloodstream. Some LAD II patients have mutations in
undegraded macromolecules in lysosomes due to lack of the gene encoding the Golgi transporter for GDPFuc, thus
acid hydrolases and these accumulations are recognized preventing normal movement of GDPFuc from its site of
microscopically as inclusion or I-cells, hence the name synthesis in the cytosol into the Golgi apparatus for
I-cell disease. Another historically important defect in utilization by fucosyltransferases. Finally, defects in
glycoprotein glycosylation associated with human glycoprotein glycosylation are also seen in patients with
disease is PNH. Patients with PNH have reduced ability some autoimmune diseases, such as may occur in
to generate the GPI anchor, due to mutation in the congenital dyserythropoietic anemia type II, where a
X-linked PIG A gene. Hemolytic anemia results due to defect in N-glycosylation may occur due to deficiency of
deficiencies in the normally GPI-anchored glycopro- α-mannosidase II activity and in IgA nephropathy, where
teins termed decay accelerating factor (DAF or CD55) a subset of IgA molecules lack appropriate O-glycan
and membrane inhibitor of reactive lysis (MIRL or structures within the hinge region. These are just a few of
CD59), which function to decrease autolysis of the many examples where protein glycosylation is
erythrocytes by activated complement. essential to biological processes.
Many of the genetic defects in the ability to glycosylate ▶Biochemical Engineering of Glycoproteins
proteins are now recognized within the broad category ▶Protein Databases
of ▶congenital disorders of glycosylation (CDGs) (14).
The CDGs are highly varied depending on the glycan Culture
structures made and are recognized as different types,
such as Type Ia, Ib, Ic, Id, Ie, and IIa. Each type of
CDG results from mutations in one of the many genes References
encoding proteins involved in N-glycosylation via the 1. Apweiler R, Hermjakob H, Sharon N (1999) On the
dolichol pathway or subsequent processing and frequency of protein glycosylation, as deduced from
analysis of the SWISS-PROT database. Biochim Bio-
glycosylation reactions, or in genes regulating orga- phys Acta 1473:4–8
nelle trafficking and biosynthesis. CDGs are often 2. Spiro RG (2002) Protein glycosylation: nature, distribu-
diagnosed by examining the N-glycosylation pattern of tion, enzymatic formation, and disease implications of
serum glycoproteins, such as transferrin, where altered glycopeptide bonds. Glycobiology 12:43R–56R

3. Laine RA (1994) A calculation of all possible oligosac- released in soluble form from the cell surface by the
charide isomers both branched and linear yields 1.05 × 10^12 released in soluble form from the cell surface by the
structures for a reducing hexasaccharide: the Isomer ▶Epithelial Cells
Barrier to development of single-method saccharide
sequencing or synthesis systems. Glycobiology 4:759–767 ▶Glycosylation of Proteins
4. Helenius J, Aebi M (2002) Transmembrane movement of
dolichol linked carbohydrates during N-glycoprotein
biosynthesis in the endoplasmic reticulum. Semin Cell
Dev Biol 13:171–178
5. Parodi AJ (2000) Protein glucosylation and its role in Glycosyltransferase
protein folding. Annu Rev Biochem 69:69–93
6. Kornfeld S (1987) Trafficking of lysosomal enzymes.
FASEB J 1:462–468
7. Schachter H (2000) The joys of HexNAc. The synthesis Definition
and function of N- and O-glycan branches. Glycoconj J Glycosyltransferase is a member of a large family of
17:465–483 enzymes expressed in the endoplasmic reticulum and
8. Baenziger JU, Green ED (1988) Pituitary glycoprotein
hormone oligosaccharides: structure, synthesis and Golgi apparatus, which catalyze the transfer of a
function of the asparagine-linked oligosaccharides on monosaccharide unit from a sugar-nucleotide donor,
lutropin, follitropin and thyrotropin. Biochim Biophys typically to the non-reducing terminus of an oligosac-
Acta 947:287–306 charide chain in glycoproteins and glycolipids.
9. McConville MJ, Menon AK (2000) Recent develop- ▶Glycosylation of Proteins
ments in the cell biology and biochemistry of glycosyl- ▶Limb Girdle Muscular Dystrophies
phosphatidylinositol lipids (review). Mol Membr Biol
▶Methylation of Proteins
17:1–16
10. Brockhausen I (1999) Pathways of O-glycan biosynth-
esis in cancer cells. Biochim Biophys Acta 1473:67–95
11. Wells L, Vosseller K, Hart GW (2001) Glycosylation of
nucleocytoplasmic proteins: signal transduction and O-
GlcNAc. Science 291:2376–2378 Glyoxylate
12. Schaffer C, Messner P (2004) Surface-layer glycopro-
teins: an example for the diversity of bacterial glycosyla-
tion with promising impacts on nanobiotechnology.
Glycobiology 14:31R–42R Definition
13. Raas-Rothschild A, Cormier-Daire V, Bao M et al Glyoxylate is a toxic compound, generated in vivo,
(2000) Molecular basis of variant pseudo-hurler poly- which needs to be eliminated by conversion into
dystrophy (mucolipidosis IIIC). J Clin Invest 105:673–681
14. Marquardt T, Denecke J (2003) Congenital disorders of
glycine via the peroxisomal enzyme alanine glyoxylate
glycosylation: review of their molecular bases, clinical aminotransferase.
presentations and specific therapies. Eur J Pediatr ▶Peroxisomal Disorders
162:359–379
15. Endo T, Toda T (2003) Glycosylation in congenital
muscular dystrophies. Biol Pharm Bull 26:1641–1647

Glypidation

Definition
Glypidation describes the attachment of a glycosyl-
Glycosylphosphatidylinositol (GPI) phosphatidylinositol-(GPI)-anchor to certain integral
membrane proteins. The anchor is composed of the
Anchors lipid phosphatidylinositol to which a carbohydrate and
an ethanolamine phosphate moiety are linked. The GPI-
anchor is attached post-translationally in the lumen of
Definition the endoplasmic reticulum, thereby replacing a tran-
In the lumen of the endoplasmic reticulum, the GPI sient transmembrane region of the modified protein.
anchor is covalently attached to the C terminus of The ▶GPI-anchor attaches proteins to the exoplasmic
proteins destined for the plasma membrane, and the leaflet of membranes, possibly to certain subdomains
transmembrane segment of the protein is cleaved off. such as caveolae and lipid-rafts.
As proteins are only attached to the exofacial leaflet of ▶Fatty Acid Acylation of Proteins
the plasma membrane by the GPI anchor, they can be ▶Glycosylation of Proteins

in females. This causes delayed or no pubertal


Golgi Apparatus (Golgi Complex) development and infertility.
▶Hypothalamic and Pituitary Diseases Genetics

Definition
Golgi apparatus (Golgi complex) refers to a cytoplasmic
organelle in eukaryotes consisting of stacked, flattened
membrane cisternae, surrounded by vesicles, which is
involved in transport and post-translational modifica-
tion (especially glycosylation) of proteins on their
Gonadotropins
journey through the secretory pathway. The Golgi
complex is also a central sorting station in the secretory
pathway; on the trans- or exit-side of the Golgi complex, Definition
proteins get sorted into several distinct vesicle types for Gonadotropins are pituitary hormones that influence
transport to different final destinations. the functions of the gonads: ▶follicle stimulating
▶Biochemical Engineering of Glycoproteins hormone (FSH) and ▶luteinizing hormone (LH).
▶Exocytotic Pathway ▶SRY – Sex Reversal
▶Glycosylation of Proteins ▶Hypothalamic and Pituitary Disease, Genetics
▶Limb Girdle Muscular Dystrophies
▶Rho, Rac, Cdc42
▶Vesicular Traffic

Gordon’s Syndrome

Gomori Trichrome Definition


Gordon’s syndrome is also known as type 2 Pseudo-
hypoaldosteronism (PHA2).
Definition ▶Mendelian Forms of Human Hypertension and
Gomori trichrome is a mixture of chemical compounds Mechanisms of Disease
that stain mitochondria red. ▶Type 2 Pseudohypoaldosteronism
▶Mitochondrial Myopathies

Gorlin’s Syndrome
Gonadal Mosaicism
Definition
▶Germline (Gonadal) Mosaicism Gorlin’s syndrome (also known as naevoid basal cell
carcinoma syndrome (NBCCS)) is a rare autosomal
dominant cancer disorder characterised primarily by a
predisposition to several tumours, most commonly
basal cell carcinoma (BCC). In addition to cancer
susceptibility, this syndrome is also associated with a
Gonadotropin Deficiency range of defects resulting from abnormal embryonic
development. Gorlin’s syndrome results from mutation
of the patched gene which functions in the hedgehog
Definition signalling pathway. The developmental defects are
Gonadotropin deficiency describes the absence, de- believed to result from ▶haploinsufficiency, with
creased production or dysfunction of anterior pituitary subsequent mutation of the remaining allele resulting
hormone (LH and/or FSH), which results in a in tumour formation.
decreased or lack of testosterone in males and estrogen ▶Hedgehog Signalling

GPCRs

▶G-Protein Coupled Receptors

G-Phase

▶Cell Cycle – Overview

GPI (-Anchored) Protein

Definition
A GPI-anchored protein is a protein that is anchored in the membrane by glycosylated derivatives of phosphatidylinositol (GPI). The carboxyl group of the C-terminal amino acid is connected through an amide link to phosphoethanolamine, which is attached to a core tetrasaccharide composed of three mannose sugars and a single glucosamine sugar. The tetrasaccharide is in turn attached to phosphatidylinositol embedded in the membrane. Proteins with this type of lipid anchor are only found on the extracellular face (outer leaflet) of the membrane, and can be released in soluble form on the cell’s surface by the action of specific phospholipases.
▶Biological Membranes
▶Epithelial Cells
▶Glycosylation of Proteins

GPIIb/IIIa Complex

Definition
The GPIIb/IIIa complex is a platelet membrane glycoprotein complex mediating platelet aggregation and adhesion to endothelial cells. The complex is an integrin that recognises the arginine-glycine-aspartic acid (RGD) sequence present on several adhesive proteins. The GPIIb/IIIa complex functions as a receptor for fibrinogen, von Willebrand Factor (vWF), fibronectin, vitronectin, and thrombospondin. Deficiency of GPIIb/IIIa causes ▶Glanzmann’s Thrombasthenia.

G-Protein Coupled Proteolytic Site

A G-protein coupled proteolytic site is a peptide sequence found in a number of G-protein coupled receptors that acts as the target for specific proteolytic cleavage, which releases the extracellular portion of the receptor from the rest of the molecule.
▶Autosomal Dominant (Inherited Disorder)
▶G-Proteins and G-Protein Mutations in Human Diseases
▶Polycystic Kidney Disease, Autosomal Dominant

G-Protein Coupled Receptors

Definition
G-protein coupled receptors (GPCRs) comprise the largest family of cell surface receptors, which communicate their signal through G-proteins. A common structural feature of GPCRs is the presence of seven hydrophobic transmembrane helices. Several hundred subtypes exist, which bind a huge variety of ligands, such as hormones and neurotransmitters. About 50% of the currently used drugs are directed at GPCRs.
▶Cardiac Signaling: Cellular, Molecular and Clinical Aspects
▶Cytokine Receptors
▶G-Proteins and G-Protein Mutations in Human Diseases
▶Growth Factors
▶Methylation of Proteins
▶Photoreceptors
▶Seven-Transmembrane Receptors
▶Wnt/Beta-Catenin Signaling Pathway

G-Proteins

Definition
▶Diabetes Insipidus, a Water Homeostasis Disease
▶G-Proteins and G-Protein Mutations in Human Diseases
▶Molecular Motors

G-Proteins and G-Protein Mutations in Human Diseases

THOMAS GUDERMANN
Philipps-University Marburg, Marburg, Germany
gudermann@staff.uni-marburg.de

Synonyms
Heterotrimeric guanine nucleotide-binding proteins

Definition
A plethora of extracellular signaling molecules like hormones, neurotransmitters, autacoids and growth factors convey information between cells of a living organism. The majority of these extracellular signaling molecules transmit their signal by interacting with a three-protein transmembrane signal transduction system composed of receptors, G-proteins and cellular effectors. All mammalian cells are endowed with a complement of G-protein-coupled receptors and several types of heterotrimeric G-proteins. Over the last few years systematic biochemical, cell biological and structural studies have laid a solid foundation for our understanding of G-protein dependent signal transduction processes. The characterization of genetically engineered mice carrying mutations in different G-protein genes as well as the clinical phenotype of patients affected by mutated G-proteins have greatly furthered our understanding of the biological functions of heterotrimeric G-proteins as central processors of information.

Characteristics
Basic Structures, Mechanisms and Classifications
The vast majority of extracellular signals interact with transmembrane receptors which couple to heterotrimeric guanine nucleotide-binding proteins (G-proteins) acting as transducers and signal amplifiers (1). Activated G-proteins then modulate the activity of cellular effectors. A comprehensive analysis of the human and mouse genomes defined a repertoire of G-protein-coupled receptors (GPCRs) for endogenous ligands comprising close to 400 genes (2, 3). GPCRs form a large and functionally diverse superfamily, participate in a variety of physiological processes and are prime targets for pharmaceutical drugs. They are integral membrane proteins, characterized by 7 α-helical transmembrane domains arranged in an anticlockwise bundle (as viewed from the extracellular side) and connected by alternating extracellular and intracellular loops of variable lengths. The crystal structure of the prototypical GPCR ▶rhodopsin in the inactive ground state has been resolved, by and large confirming the overall architecture deduced from analogy-based biomodelling and various mutagenesis approaches (4). As yet, it is only rudimentarily understood how ligand-induced conformational change of a GPCR is translated into G-protein activation.

G-Protein Composition and Structural Aspects
G-proteins are heterotrimers composed of α, β, and γ subunits. The α subunit is responsible for guanine nucleotide binding and GTP hydrolysis; the β and γ subunits are associated in a tenaciously linked βγ complex and can be regarded as one functional unit. To date 16 distinct genes for α, 5 for β and 12 for γ subunits have been identified and characterized functionally (Table 1). Access to a deeper understanding of G-protein structure and function at the atomic level has been granted by solving crystal structures of G-protein α subunits in the GDP- and GTP-bound forms as well as in the transition state (5). In addition, atomic structures of G-protein βγ dimers and of αβγ heterotrimers are also available. Gα proteins are principally composed of a ▶Ras-like GTPase and an α-helical domain, both forming a deep cleft harboring the guanine nucleotide. Gβ subunits fold into a highly symmetrical β propeller with an approximate 7-fold symmetry. Gγ binds to Gβ in an extended conformation devoid of intrachain tertiary interactions. Those segments in the Gα protein that undergo structural changes upon GTP hydrolysis are named switch I, II, and III regions. The most obvious changes occur in the switch II region. A mechanistic explanation for subunit dissociation and reassociation contingent upon the guanine nucleotide bound to the α subunit can be derived from the observation that the most extensive contact area between Gα and βγ comprises the protein surface around the switch II region of Gα. So far, structural information does not provide an obvious cue as to the specificity in the pairing of a particular α subunit with a defined βγ dimer.
Because effector contact sites of Gα have also been mapped to areas around the switch II region, Gβγ- and effector-interacting surfaces of Gα overlap significantly. Thus, Gα cannot interact with cellular effectors unless it dissociates from Gβγ. Conversely, activation of Gβγ is a consequence of its release from Gα, which functions as a negative regulator of free βγ subunits within the cell. The regions of Gβγ that interact with downstream effectors map to an N-terminal Gβ fragment of approximately 100 amino acids. As yet, it is not understood how an activated receptor catalyses the dissociation of GDP from a G-protein heterotrimer. The most clearly defined Gα contact sites with heptahelical receptors are located in the C-terminal region of the α subunit. However, compelling evidence

G-Proteins and G-Protein Mutations in Human Diseases. Table 1 G-protein α subunits

Name      Expression                        Examples of effectors                           Bacterial toxins
αs subfamily
αs        ubiquitous                        AC ↑, VGCC, Src, RGS-PX1 (GAP, sorting nexin)   CTX
αolf      olfactory epithelium, brain                                                       CTX
αi subfamily
αt-r      retinal rods                      cGMP PDE ↑                                      CTX, PTX
αt-c      retinal cones                     cGMP PDE ↑                                      CTX, PTX
αgust     taste cells                       PDE ?, PLC-β ?                                  CTX, PTX
αi1       mainly neuronal cells             AC ↓, Src, Rap1GAPs                             PTX
αi2       ubiquitous                                                                        PTX
αi3       widely expressed                                                                  PTX
αo        neuronal, neuroendocrine cells                                                    PTX
αz        neuronal cells, thrombocytes
αq subfamily
αq        ubiquitous                        PLC-β ↑, LARG-RhoGEF
α11       widely expressed
α14       kidney, lung, spleen
α15/16    hematopoietic cells
α12 subfamily
α12       ubiquitous                        RhoGEFs, E-cadherin
α13       ubiquitous

AC, adenylyl cyclase; CTX, cholera toxin; GAP, GTPase-activating protein; GEF, guanine nucleotide exchange factor; PDE, phosphodiesterase; PLC, phospholipase C; PTX, pertussis toxin; RGS, regulator of G-protein signaling; VGCC, voltage-gated calcium channel
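As a reading aid, the rows of Table 1 can be transcribed into a small lookup structure. The following Python sketch is illustrative only: the type `GAlpha`, the helper `sensitive_to` and all field names are hypothetical, the toxin entries are copied row by row from the table, and the effector column is left out because the table as reproduced here does not show unambiguously which rows each effector entry spans. The last lines compute the upper bound on αβγ combinations implied by the gene counts given in the text (16 α, 5 β, 12 γ); the text does not claim that every combination occurs in cells.

```python
from typing import NamedTuple


class GAlpha(NamedTuple):
    name: str
    subfamily: str
    expression: str
    toxins: frozenset  # bacterial toxins listed for this subunit in Table 1


def _row(name, subfamily, expression, toxins=""):
    # toxins is a space-separated string such as "CTX PTX" (empty if none listed)
    return GAlpha(name, subfamily, expression, frozenset(toxins.split()))


TABLE_1 = [
    _row("αs", "αs", "ubiquitous", "CTX"),
    _row("αolf", "αs", "olfactory epithelium, brain", "CTX"),
    _row("αt-r", "αi", "retinal rods", "CTX PTX"),
    _row("αt-c", "αi", "retinal cones", "CTX PTX"),
    _row("αgust", "αi", "taste cells", "CTX PTX"),
    _row("αi1", "αi", "mainly neuronal cells", "PTX"),
    _row("αi2", "αi", "ubiquitous", "PTX"),
    _row("αi3", "αi", "widely expressed", "PTX"),
    _row("αo", "αi", "neuronal, neuroendocrine cells", "PTX"),
    _row("αz", "αi", "neuronal cells, thrombocytes"),
    _row("αq", "αq", "ubiquitous"),
    _row("α11", "αq", "widely expressed"),
    _row("α14", "αq", "kidney, lung, spleen"),
    _row("α15/16", "αq", "hematopoietic cells"),
    _row("α12", "α12", "ubiquitous"),
    _row("α13", "α12", "ubiquitous"),
]


def sensitive_to(toxin):
    """Names of the α subunits for which Table 1 lists the given toxin."""
    return [g.name for g in TABLE_1 if toxin in g.toxins]


# Gene counts quoted in the text: 16 α, 5 β and 12 γ subunit genes.
N_ALPHA, N_BETA, N_GAMMA = 16, 5, 12
MAX_HETEROTRIMERS = N_ALPHA * N_BETA * N_GAMMA  # theoretical upper bound only
```

Calling `sensitive_to("PTX")` returns the pertussis toxin substrates in table order, which makes it easy to see that αz is the only member of the αi subfamily without a toxin entry.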

has also been presented for participation of the Gα N-terminus as well as the βγ dimer in receptor interaction. Considering the proximity of the Gα N-terminus and Gγ C-terminus on a common face of the G-protein heterotrimer anchored to the plasma membrane via lipid modifications, cytoplasmic receptor domains have access to large surface areas of α, β and γ subunits. However, according to the crystal structures available, the distance between the posttranslationally modified Gα N-terminus and Gγ C-terminus (estimated minimum of 40 Å) appears to be too large to interact with rhodopsin’s cytoplasmic surface area (maximal distance between interacting loop sites approximately 35 Å) at the same time, thus implying a two-step sequential interaction mechanism. The structural basis underlying the selectivity of receptor-G-protein interaction still remains only partially defined.

G-Protein Cycle
Binding of an agonist to a heptahelical receptor entails the formation of a ternary complex consisting of agonist, receptor and heterotrimeric G-protein in its GDP-liganded form. The binding of βγ subunits to Gα stabilizes the flexible switch regions and hence the GDP-dependent inactive Gα conformation. The activated receptor fulfills the role of a catalytically acting ▶guanine nucleotide exchange factor that facilitates the release of GDP (Fig. 1). The short-lived guanine nucleotide-free G-protein heterotrimer stabilizes the receptor in its high-affinity conformation. Due to its high intracellular concentration, GTP is rapidly incorporated into the guanine nucleotide-binding pocket in Gα, resulting in a conformational change in the α subunit and a dissociation of GTP-bound Gα and Gβγ. Both reaction products are intracellular signaling

proteins in their own right, activating distinct and overlapping portfolios of cellular effectors. In contrast to the situation with monomeric GTPases of the Ras family, G-protein α subunits are endowed with an endogenous GTPase activity. A highly conserved arginine residue in the helical domain, Arg201 in Gαs, directly participates in GTP hydrolysis by stabilizing the negative charge on γ-phosphoryl oxygen atoms in the transition state. Ras proteins lack such a residue and are essentially inactive as GTPases. The conformation of GDP-bound Gα allows for the reassociation with the βγ dimer and the inactive GαGDPβγ complex is now prone to another round of activation and deactivation. Two bacterial toxins interfere with the GTPase cycle by covalently modifying G-protein α subunits. ▶Cholera toxin (CTX) ADP-ribosylates the aforementioned conserved arginine residue in the GTPase domain of some G-protein α subunits (Table 1), thus blocking GTP hydrolysis and rendering the α subunit constitutively active. ▶Pertussis toxin (PTX) adds the ADP-ribosyl moiety to a cysteine residue in the C-terminus of most Gi proteins, thereby effectively uncoupling modified G-proteins from the receptor. Both toxins have been extensively used to dissect G-protein-mediated signaling pathways.

G-Proteins and G-Protein Mutations in Human Diseases. Figure 1 The G-protein cycle. The activated receptor functions as a guanine nucleotide exchange factor to release bound GDP. G-protein βγ subunits stabilize the inactive, GDP-bound Gα conformation. Pertussis toxin (PTX) modifies the C-terminus of some G-protein α subunits and uncouples these G-proteins from the receptor. Both GTP-bound α subunits and βγ dimers are signaling proteins in their own right and interact with effector proteins. Gα activation is terminated by hydrolysis of GTP. The endogenous GTPase activity is accelerated by effectors such as phospholipase C-β or RGS (for “regulator of G-protein signaling”) proteins. Cholera toxin (CTX) modifies a highly conserved arginine residue in some α subunits, thereby abolishing GTPase activity and rendering the G-protein α subunit constitutively active. For further details see text.

In a physiological setting, the GTPase activity of α subunits is controlled by a diverse family of multifunctional signaling proteins, ▶regulators of G-protein signaling (RGS), which bind directly to activated Gα subunits in order to accelerate the rate of GTP hydrolysis by several orders of magnitude. In addition, G-protein effectors such as phospholipase C-β can also potentiate GTPase activity of α subunits (Fig. 1). Apart from deactivation of G-protein α subunits and termination of downstream signals, RGS proteins may also be viewed as bona fide effectors setting in motion various intracellular signaling cascades by means of protein-protein interactions mediated by a host of defined structural signaling motifs. Furthermore, they may profoundly shape the dynamics and determine the efficacy of G-protein cycling. An approximately 120-amino acid region that directly interacts with GTP-bound α subunits, called the RGS domain, is the structural hallmark of more than 30 RGS family members. The majority of RGS proteins appear to target Gi and Gq family members, whereas some (p115-RhoGEF, PDZ-RhoGEF, LARG-RhoGEF) specifically interact either with Gα12/13 proteins or with Gαs (RGS-PX1). The exact physiological functions of RGS proteins are still poorly understood.
Within the past few years alternative modes of signal input into G-protein cascades have been discovered which are independent of heptahelical receptors. Four proteins, called AGS1-4 (for ▶activators of G-protein signaling), have been identified that engage G-protein-dependent signaling pathways in the absence of a classical receptor. AGS1 is a distinct member of the large superfamily of Ras-related proteins and targets α subunits. AGS2 represents a Gβγ-binding component of the cytoplasmic motor protein dynein and has undefined roles in cellular signaling. The third protein, AGS3, possesses seven tetratricopeptide (TPR) repeats and four amino acid repeats termed GoLoco motifs (meaning Gαi/o-Loco interaction motif), also described as G-protein regulatory (GPR) motifs. The GoLoco motif, which can also be found in other signaling proteins like RGS12 and 14, functions as a selective Gα binding partner. The GoLoco/Gα interaction releases βγ subunits and at the same time inhibits guanine nucleotide exchange in the bound α subunit, thus “freezing” monomeric Gα in the inactive, GDP-bound state. Recently, an additional AGS protein, AGS4, was identified which contains three GPR motifs and regulates the activation state of Gαi. At present, the

cell physiological role of AGS proteins is still fairly obscure. Possibly, these proteins are involved in the regulation of basic cellular processes like the maintenance of cell polarity and cell division. By acting in concert with GPCRs they may provide for a signal amplification mechanism and at the same time allow signal transmission via G-proteins independent of heptahelical membrane receptors.

G-Protein Families and Effectors
Based on the primary amino acid sequence of their α subunits, G-proteins are subdivided into four distinct families: Gs, Gi, Gq, and G12 (Table 1) (6, 7). Concentrations of Gi proteins in the cell considerably exceed those of other families, and in brain Go may amount to 1–2% of total membrane protein. Some G-protein α subunits are characterized by a very restricted expression pattern (Table 1), while others like Gs, Gi2, Gq, G12, and G13 are ubiquitously expressed. G-proteins can also be classified on the basis of the cellular effectors to which they couple. Gs proteins classically stimulate adenylyl cyclase activity, while Gi proteins inhibit adenylyl cyclase via their α subunits, but activate inwardly rectifying potassium channels and inhibit P/Q-, N- and R-type voltage-gated calcium channels via βγ subunits released upon GTP binding to Gαi proteins. Gq proteins activate phospholipase C-β isoforms, and G12 proteins couple to Rho guanine nucleotide exchange factors resulting in Rho activation and stress fiber formation (Table 1). Both GTP-loaded α subunits and βγ dimers are eo ipso signaling proteins exerting their action through activation or inhibition of an ever expanding list of cellular effector proteins (for Gα effectors see: Table 1). Effectors for G-protein βγ subunits include inwardly rectifying potassium channels (Kir3.1–3.4), G-protein-coupled receptor kinases (GRKs), adenylyl cyclases (adenylyl cyclases II and IV), phospholipases C (PLC)-β1, -β2 and -β3 and phosphatidylinositol 3-kinases (PI3K) β and γ.
Considering that hundreds of GPCRs transduce signals by interacting with a limited number of G-proteins, the question of coupling specificity is worth considering. The concept of linear G-protein-mediated signal transduction pathways, i.e. one receptor coupling to one distinct G-protein activating one effector, appears to be inadequate to describe physiology. G-protein-mediated signal transduction is a complex signaling network with diverging and converging transduction steps at each coupling interface. Deciphering the mechanism of signal specificity in living cells still remains a scientific challenge of paramount importance.

G-Protein Mutations as the Molecular Basis of Human Diseases
Because of their central role in controlling many physiological functions, naturally occurring activating and inactivating mutations in GPCRs and G-proteins are responsible for an increasing number of human diseases. Functional variability resulting from polymorphisms may underlie interindividual differences in response to endogenous ligands as well as drugs. At present, the Gαs gene (GNAS1) is the only G-protein gene that has been unequivocally shown to be afflicted with activating or inactivating mutations that cause human diseases (8-10).

Activating Mutations in Gαi
Mutations in the Gαi2 gene were diagnosed in fixed sections of human ovarian sex cord stromal tumors and adrenal cortical tumors. In a few affected specimens, the highly conserved Arg179 corresponding to the aforementioned Arg201 in Gαs in the helical domain was found to be exchanged for a histidine (Arg179His) giving rise to a Gαi2 protein devoid of any GTPase activity. Constitutively active Gαi2 was subsequently referred to as the gip2 oncogene. Most notably, this finding could not be confirmed by subsequent studies on fresh surgically resected tumors, and transgenic animals expressing the gip2 oncogene in selected tissues have not been reported yet. Therefore, one has to conclude at this point that the oncogenic potential as well as the frequency of activating mutations of Gαi2 appear to be rather low.

Activating Mutations in Gαs
In many endocrine glands, cAMP stimulates proliferation, differentiation and hormone secretion. A first hint to the possible causative contribution of activating mutations in GNAS1 arose from the identification of a subset of growth hormone (GH)-secreting pituitary tumors characterized by high intracellular cAMP concentrations and increased adenylyl cyclase activity. These adenomas, accounting for approximately 40% of GH-secreting tumors, were shown to harbor heterozygous missense mutations in GNAS1 exons 8 or 9 giving rise to Arg201Cys/His and Gln227Arg/Lys substitutions, respectively. Both the highly conserved Arg and Gln residues are essential for GTP hydrolysis to occur in the α subunit, with Gln227 being required to orient and polarize the catalytic water in the transition state. Therefore, these missense mutations ablate the endogenous GTPase activity of Gαs and render the α subunit constitutively active, leading to uncontrolled, excessive cAMP production in somatotrophs. As cAMP stimulates proliferation and differentiation in these cells, the GTPase-deficient Gαs mutants have been designated gsp oncogenes.
Gsp mutations are also rarely observed in other pituitary tumors like corticotrophs, resulting in increased ACTH release (Table 2). Besides, approximately 10% of non-functional pituitary adenomas carry

G-Proteins and G-Protein Mutations in Human Diseases. Table 2 Diseases associated with GNAS1 mutations

Type of mutation             Disease                                         Mode of inheritance
Gain of function             GH-secreting pituitary adenomas, thyroid        somatic mutations
                             adenomas and carcinomas, Leydig cell
                             adenomas, pheochromocytoma, parathyroid
                             adenoma, McCune-Albright syndrome, osseous
                             fibrous dysplasia
Loss of function             Albright hereditary osteodystrophy              germline mutations
                               PHP Ia                                        maternal transmission
                               PPHP                                          paternal transmission
                             progressive osseous heteroplasia (POH)          paternal transmission?
                             PHP Ib (Gαs-ΔI382)                              maternal transmission
Loss and gain of function    PHP Ia and testotoxicosis (Gαs-A366S)           maternal transmission
GNAS1 imprinting defect      PHP Ib                                          maternal transmission

GH, growth hormone; PHP, pseudohypoparathyroidism; PPHP, pseudopseudohypoparathyroidism
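The activating substitutions named in the text follow the pattern wild-type residue, position, replacement residue (for example Arg201Cys). A minimal Python sketch of how such labels could be parsed and compared against the four gsp substitutions reported for GH-secreting adenomas is given below. The function names, the regular expression and the dictionary are illustrative assumptions, not an established nomenclature tool, and one-letter labels such as A366S are deliberately rejected rather than guessed at.

```python
import re

# Activating Gαs substitutions reported in the text for GH-secreting adenomas:
# Arg201Cys/His (exon 8) and Gln227Arg/Lys (exon 9).
REPORTED_GSP = {
    ("Arg", 201): {"Cys", "His"},
    ("Gln", 227): {"Arg", "Lys"},
}

# Three-letter residue, position, three-letter residue, e.g. "Arg201Cys".
_PATTERN = re.compile(r"^([A-Z][a-z]{2})(\d+)([A-Z][a-z]{2})$")


def parse_substitution(label):
    """Split a label such as 'Arg201Cys' into ('Arg', 201, 'Cys')."""
    match = _PATTERN.match(label)
    if match is None:
        raise ValueError(f"not a three-letter substitution label: {label!r}")
    return match.group(1), int(match.group(2)), match.group(3)


def is_reported_gsp(label):
    """True if the label is one of the four gsp substitutions listed above."""
    wild_type, position, mutant = parse_substitution(label)
    return mutant in REPORTED_GSP.get((wild_type, position), set())
```

Note that `is_reported_gsp("Arg179His")` is false: that label refers to the Gαi2 residue corresponding to Arg201 of Gαs, not to Gαs itself.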

gsp mutations. Several studies have confirmed the presence of activating mutations in Gαs in up to 30% of toxic thyroid adenomas and in less than 10% of thyroid carcinomas. Sporadically, gsp mutations are found in parathyroid and adrenocortical tumors as well as in ▶pheochromocytomas.
The ▶McCune-Albright syndrome (MAS) is classically defined by the clinical triad of café-au-lait hyperpigmented skin lesions, precocious puberty and polyostotic fibrous dysplasia of the bone. Apart from the gonads, other endocrine glands such as the pituitary, adrenal cortex and thyroid that are sensitive to trophic cAMP-dependent stimuli were also found to be hyperfunctional in MAS. Nodular and diffuse goiters as well as benign thyroid nodules are associated with MAS. The sporadic occurrence of thyroid cancer (papillary and clear cell thyroid carcinoma) in MAS patients suggests that mutational or epigenetic events in addition to gain-of-function Gαs mutations are mandatory for thyroid carcinogenesis in these patients. In 1986 the dermatologist Happle suggested MAS to be caused by a dominant somatic mutation as an early postzygotic event resulting in the mosaic pattern of clinical stigmata. Mutations in GNAS1 have been confirmed in affected endocrine tissues and in hyperpigmented skin lesions of all MAS patients. Interestingly, missense mutations were detected at only one position, i.e. Arg201His/Cys. The overall clinical picture of an individual MAS patient is determined by the distribution of cells bearing the somatic gsp mutation. It is tempting to speculate that germline gsp mutations are incompatible with life. In contrast to the latter concept, one patient with severe developmental, skeletal and endocrine abnormalities was described to harbor a germline Arg201Leu mutation in GNAS1.
Gαs mutations have also been found in all cases of fibrous dysplasia (FD) of the bone. The majority of FD patients has only a single bone defect; a small group suffers from multiple bone lesions or has other features of MAS. Missense mutations in GNAS1 identical to those in MAS patients, i.e. Arg201His/Cys, were diagnosed in nearly all forms of FD. A possible explanation for the clinical phenotype relates to the general concept that elevated intracellular cAMP levels in osteogenic precursors entail increased proliferation and decreased differentiation of these cells resulting in benign fibrous bone lesions. Activating Gαs mutations have also been reported to occur in isolated intramuscular myxomas and those that present in conjunction with FD (Mazabraud syndrome).

Loss-of-Function Mutations in Gαs
More than 60 years ago, Fuller Albright and his colleagues described several patients presenting with short stature, obesity, skeletal abnormalities, mental retardation and often subcutaneous ossification. This syndrome, now collectively called ▶Albright’s hereditary osteodystrophy (AHO), frequently concurs with resistance to parathyroid hormone (PTH) and other hormones like GH-releasing hormone, thyrotropin and gonadotropins acting via Gs-coupled receptors. AHO in conjunction with this kind of hormone resistance gives rise to the complex syndrome of ▶pseudohypoparathyroidism (PHP) type Ia (Table 2). Patients with pseudopseudohypoparathyroidism (PPHP) show the

typical features of AHO, yet do not suffer from any kind of hormone resistance. In contrast, PHP Ib patients present with symptoms of isolated PTH resistance, but lack the typical AHO phenotype. A mild form of thyrotropin resistance has recently been observed in PHP Ib patients raising the possibility that other endocrine systems may also be affected. Subsequent systematic studies were able to allocate the molecular defect in these three main forms of PHP to GNAS1.
Heterozygous inactivating mutations affecting one of the Gαs-specific exons are the molecular cause of PHP Ia, PPHP and of progressive osseous heteroplasia (POH). POH patients suffer from severe heterotopic ossification involving skeletal muscle and deep connective tissue. They frequently lack hormone resistance and typical AHO features. Many of the GNAS1 mutations are deletions or insertions that give rise to frameshifts and premature stop codons, nonsense mutations or splice junction mutations. In addition, a number of missense mutations adversely affect protein stability. The latter scenario is exemplified by a missense mutation, A366S, in the critical guanine nucleotide binding motif of the GTPase domain, leading to an accelerated release of GDP from the α subunit and marked instability of the guanine nucleotide-free protein at the body core temperature of 37 °C. At lower ambient temperatures, for instance in the testis, protein stability is not impaired and the accelerated nucleotide exchange manifests as constitutive Gαs activity. Therefore, the clinical phenotype of PHP Ia and excessive testicular testosterone production (testotoxicosis) arise from the intriguing A366S mutation (Table 2). In most tissues, an approximately 50% reduction of functional Gαs activity significantly reduces cAMP formation in the case of inactivating Gαs mutations. However, the cAMP levels that can still be generated are sufficient to maintain physiological functions. Thus, there is no evidence for haploinsufficiency to explain the hormone resistance observed in patients.
Retrospective analyses of PHP patients revealed that the clinical phenotype was strongly influenced by the parent transmitting the mutated allele. Any inactivating Gαs mutation leads to AHO irrespective of the parent transmitting the defective gene. Hormone resistance characteristic of PHP Ia occurs only if the genetic defect is inherited from a mother suffering from either PHP Ia or PPHP. Conversely, there is mounting evidence that POH is inherited from the father. This conspicuous parent-of-origin-specific inheritance pattern suggests that GNAS1 is imprinted.
The human gene for Gαs is a single-copy gene located at 20q13.2-13.3. Gαs is encoded by exons 1–13 (Fig. 2). Alternative splicing of exon 3 produces two long and two short forms of Gαs. There is little evidence to suggest that these splice variants have distinct signaling properties. During the past few years it has become obvious that GNAS1 not only codes for Gαs, but also for several other transcripts by using 4 alternative promoters and first exons which splice onto a common exon 2 (Fig. 2). The most upstream alternative promoter gives rise to transcripts coding for the chromogranin-like neuroendocrine secretory protein 55 (NESP55) whose entire coding sequence resides in the upstream exon, thus leaving Gαs exons 2–13 within the 3′ untranslated region of the NESP55 transcript. The NESP55 promoter is methylated on the paternal allele, so that the NESP55 gene is exclusively transcribed maternally. The XLαs transcript encodes a protein with an extended N-terminus when compared to Gαs and is transcribed from the paternal allele only (Fig. 2). The C-terminal 348 amino acid residues are identical to Gαs. XLαs is highly expressed in the pituitary, is targeted to the plasma membrane, interacts with βγ subunits and can be activated by non-hydrolysable GTP analogs. However, there is no evidence that XLαs is regulated by GPCRs. An additional transcript derived from the sense strand of the paternal allele uses exon 1A (exon A/B) as the first exon and also splices onto exons 2–13 (Fig. 2). However, exon 1A generates transcripts that are presumably untranslated. Upstream of the XLαs exon, a promoter for antisense transcripts traversing the NESP55 exon has been identified. These NESP55 antisense transcripts are only expressed from the paternal allele and may contribute to the imprinting of NESP55 by silencing the NESP55 promoter on the paternal allele.
Around 100 autosomal genes are subject to ▶genomic imprinting. One genomic region controlled by this epigenetic phenomenon is located in the distal portion of chromosome 20 and encompasses GNAS1. All imprinted genes have one or more regions in which the cytosines within CpG dinucleotide stretches are methylated on one parental allele only. Very often these methylated regions coincide with gene promoters. As described above and illustrated in Fig. 2, the promoter regions of GNAS1 display a complex imprinting pattern. To complicate the scenario even further, the promoter region giving rise to Gαs transcripts does not exhibit allele-selective methylation and in most tissues expression occurs biallelically. This situation notwithstanding, paternal Gαs expression is silenced in a few tissues by a mechanism that is presently unknown. In proximal renal tubular cells, adipocytes, pituitary gland, thyroid and gonads, Gαs expression is largely driven by the maternal allele. In PHP Ia patients, renal proximal tubule cells are resistant to PTH action, because Gαs expression is restricted

G-Proteins and G-Protein Mutations in Human Diseases. Figure 2 Organization and imprinting of the
GNAS1 locus. GNAS1 is characterized by 4 alternative first exons which splice onto exon 2. Methylation patterns
(methyl) and transcriptional activation (arrows) of the maternal and paternal allele are indicated. The hatched
arrow for exon 1 of the paternal allele indicates that it does not contribute to Gαs expression in all cells.
An antisense mRNA is transcribed across the NESP55 exon on the paternal allele.

to the maternal allele that carries an inactivating mutation. Yet PHP Ia patients are not prone to hypercalciuria, suggesting that the anticalciuric PTH action in the thick ascending limb is fully operative because of biallelic Gαs expression in this part of the nephron. Thus, tissue- and cell-specific imprinting represents the molecular mechanism underlying the clinical features of PHP Ia, while haploinsufficiency alone may lead to AHO.
A first glance at the mechanism of tissue-specific Gαs imprinting was granted by studies on patients with PHP Ib. The vast majority of these patients, who present with renal PTH resistance, sometimes accompanied by partial TSH resistance, exhibit a loss of methylation at the GNAS1 exon 1A, while lacking mutations in the exons coding for Gαs. This loss of the maternal allele-specific methylation pattern, linked to an upstream 3 kb deletion, makes the maternal allele look like the paternal one, resulting in silencing of maternal Gαs expression in renal proximal tubules. One possible explanation is based on the hypothesis that the non-methylated exon 1A region allows for the binding of a tissue-specific repressor protein that hampers Gαs expression. Alternatively, the deletion may disrupt a cis-acting imprinting control element necessary for the methylation imprint at exon 1A (exon A/B). As the described deletions disrupt another gene, STX16 coding for Syntaxin 16, one may speculate that the STX16 region comprises such an imprinting control element. The relevance of these epigenetic changes, i.e. the loss of maternal-specific methylation of GNAS1, for the clinical PHP Ib phenotype is emphasized by a patient with paternal uniparental disomy of chromosome 20q. In this situation both long arms of chromosome 20 are of paternal origin, resulting in PTH resistance, but not in AHO.
A unique heterozygous 3 bp deletion causing loss of Ile382 in the C-terminus of Gαs was detected in 3 affected boys with PHP Ib. When heterologously expressed, the mutant Gαs was found to be unable to couple to the PTH receptor, while interaction with the Gs-coupled thyrotropin and luteinizing hormone receptors was unaffected. These results explain PTH-specific hormone resistance in the affected patients.

The absence of any phenotype in the mother and maternal grandfather carrying the same mutation is commensurate with our current understanding of paternal imprinting of the GNAS1 gene.

The Gβ3-C825T Polymorphism in Multigenic Disorders
A single base substitution (C825T) in the Gβ3 subunit leading to a truncated protein was originally reported in association with primary hypertension. More recently, genetic associations with a number of other disorders such as obesity and insulin resistance have been suggested. So far, the underlying mechanism by which the Gβ3 variant causes the different phenotypes remains elusive.

Conclusions
In the past few years significant progress has been made towards a truly molecular understanding of receptor/G-protein-mediated signal transduction. One crystal structure of a heptahelical receptor, rhodopsin, and several of G-proteins provide a solid foundation for future work on the mechanisms of receptor and G-protein activation. An important goal will be to determine the structural differences between the inactive and active receptor conformations as well as the structure of receptors in complex with heterotrimeric G-proteins. Studies on engineered gene-deficient mice as well as the thorough in vivo and in vitro characterization of naturally occurring G-protein mutations detected in patients have taught us invaluable lessons on the physiology of these cardinal signaling proteins. Studying clinical and molecular aspects of the different forms of PHP has highlighted the complex regulation of Gαs expression and provided remarkable insights into the basic mechanisms of genomic imprinting. Our understanding of receptor and G-protein-mediated signaling processes has shifted from studying linear signaling cascades towards the consideration of complex signaling networks which will require novel collaborative research initiatives to integrate bits and pieces of knowledge into a coherent instructive model.
▶Cardiac Signaling: Cellular, Molecular and Clinical Aspects

References
1. Gudermann T, Kalkbrenner F, Schultz G (1996) Diversity and selectivity of receptor-G-protein interaction. Annu Rev Pharmacol Toxicol 36:429–459
2. Schöneberg T, Schulz A, Gudermann T (2002) The structural basis of G-protein-coupled receptor function and dysfunction in human diseases. Rev Physiol Biochem Pharmacol 144:143–227
3. Pierce KL, Premont RT, Lefkowitz RJ (2002) Seven-transmembrane receptors. Nat Rev Mol Cell Biol 3:639–650
4. Palczewski K, Kumasaka T, Hori T et al (2000) Crystal structure of rhodopsin: A G-protein-coupled receptor. Science 289:739–745
5. Wall MA, Posner BA, Sprang SR (1998) Structural basis of activity and subunit recognition in G-protein heterotrimers. Structure 6:1169–1183
6. Offermanns S (2003) G-proteins as transducers in transmembrane signalling. Progr Biophys Mol Biol 83:101–130
7. Cabrera-Vera TM, Vanhauwe J, Thomas TO et al (2003) Insights into G-protein structure, function, and regulation. Endocr Rev 24:765–781
8. Spiegel AM, Weinstein LS (2004) Inherited diseases involving G-proteins and G-protein-coupled receptors. Annu Rev Med 55:27–39
9. Weinstein LS, Liu J, Sakamoto A et al (2004) GNAS: normal and abnormal functions. Endocrinology 145:5459–5464
10. Bastepe M, Juppner H (2005) GNAS locus and pseudohypoparathyroidism. Horm Res 63:65–74

GPx

▶Glutathione Peroxidase

G-Quartet DNA

Definition
G-quartet DNA (also known as G4 DNA) defines a four-stranded DNA structure formed by nucleic acid regions rich in guanine/cytosine. This structure is highly stabilized by a planar array of four hydrogen-bonded guanine bases.
▶DNA Helicases

Graafian Follicles

Definition
Graafian follicles designate cellular components of the ovary, each consisting of a germ cell (oocyte), surrounded by somatic cells (follicle cells), and a large

fluid-filled cavity from which the unfertilized egg emerges.
▶Mammalian Fertilization

Grade of Malignancy

Definition
Grade of malignancy designates the histomorphological assessment of the malignant behavior of a tumor, as estimated by cytological criteria such as nuclear pleomorphism and number of mitoses, and histological criteria such as the formation of differentiated structures. Usually, three grades (G1, well differentiated; G2, moderately differentiated; and G3, poorly differentiated; with increasing aggressiveness in this order) are distinguished.
▶Breast Cancer

Granulation Tissue

Definition
Granulation tissue defines a new connective tissue that is formed during the wound repair process and temporarily replaces the lost dermal part of the skin. The name derives from the granular appearance of numerous new capillaries.
▶Wound Healing

Granuloma

Definition
Granuloma represents a chronic inflammatory lesion initiated by various infectious and non-infectious agents. Granuloma consists of either small, nodular aggregations of mononuclear inflammatory cells or of aggregations of different cells, usually modified macrophages surrounded by lymphocytes and multinucleated giant cells. Sometimes granulomas may also contain eosinophils and B cells, and are surrounded by fibrotic tissue.
▶Morbus Wegener

Granulomatosis

Definition
Granulomatosis refers to a multisystem disease that is characterized by an inflammation of the blood vessels (▶vasculitis) involving the upper and lower respiratory tracts and variable degrees of systemic, small vessel vasculitis, which is generally considered to represent a hypersensitivity reaction to an unknown antigen.
▶Recombinant Protein Expression in Bacteria

GRAS

Definition
Generally Recognised As Safe: the US Congress established this concept and regulatory policy in 1958 as part of its food safety legislation. Judged by qualified experts, it means that ingredients or hosts are safe when used in food or food production to accomplish their technical or nutritional purposes.
▶Recombinant Protein Expression in Yeast

Grb2

Definition
Grb2 stands for Growth-factor-receptor-bound protein 2. It is an adaptor protein containing src homology domains, one of which binds to and translocates the guanine nucleotide exchange factor ▶SOS. It is involved in activation of Ras, but can also play a role in other signaling pathways in mammalian cells.
▶Ras Signalling Pathway
▶Signal Transduction: Integrin-Mediated Pathways
▶Tyrosine Kinase

Green Fluorescent Protein

Definition
GFP stands for Green Fluorescent Protein. It is a natural, 27 kDa fluorescent protein, originally produced by the marine jellyfish Aequorea victoria, and fluoresces or

glows green visible light when excited by UV light (395 nm). GFP is commonly used in the laboratory for labeling, detecting and tracking proteins and biological processes. It can be cloned without co-factors in most organisms, and is used as a reporter molecule in light microscopic imaging in co-expression assays to visualize cellular structures and molecules. The protein has been mutated to generate blue, cyan, yellow, photoactivatable, and monomeric variants.
▶C. Elegans as a Model Organism for Functional Genomics
▶Electron Tomography
▶FCS
▶FRAP
▶Functional Assays
▶High-Throughput Approaches to the Analysis of Gene Function in Mammalian Cells
▶Immunochemical Methods, Localization
▶Large-Scale Homologous Recombination Approaches in Mice
▶Medaka as a Model Organism for Functional Genomics
▶Transgenic and Knockout Animals

Greig’s Cephalopolysyndactyly

Definition
Greig’s cephalopolysyndactyly refers to a syndrome that affects embryonic development of the limbs, skull and face. Major features include polysyndactyly of the hands and feet, broad thumbs, macrocephaly and a high prominent forehead. This disorder is due to mutation of the gene encoding the ▶GLI3 zinc finger transcription factor which is involved in mediation of the ▶Hedgehog Signal Transduction Pathway.
▶Hedgehog Signalling

Growth Factor Receptors

Definition
Growth factor receptors are cell surface molecules that bind growth factors, and initiate an intracellular signal that affects gene transcription in cells.
▶Growth Factors
▶Receptor Serine/Threonine Kinase
▶Tyrosine Kinases

Growth Factors

ULF HEDIN¹, JOY ROY²
¹Department of Surgical Sciences, Karolinska Hospital, and ²Department of Surgery, St Gorans Hospital, Stockholm, Sweden
ulf.hedin@kirurgi.ki.se

Definition
Polypeptide growth factors are proteins with a fundamental role during embryogenesis and regeneration of tissues. In contrast to some hormones, which regulate growth of entire organisms, growth factors are essential for replication of individual cells and to the maintenance of normal cell function. Some growth factors stimulate cell division in numerous different cell types, while others are specific for particular cells. Growth factors mediate their effects on cells by binding to specific surface receptors. Binding of growth factors to their corresponding receptors activates signaling systems inside cells, which regulate transcription of genes involved in cellular processes such as differentiation, proliferation, migration, protein synthesis and metabolism. In adult organisms, certain growth factors are essential for the regeneration of cells in the bone marrow and for tissue repair, whereas other factors regulate the development and growth of tissues in the embryo (see ▶Limb Development). In human disease, growth factors contribute to the abnormal regulation of cell proliferation found in cancer but also regulate cellular processes in cardiovascular and inflammatory diseases. The progressive accumulation of new information regarding the function of growth factors in human health and disease is providing new alternatives for future treatment strategies.

Characteristics
In 1986, Rita Levi-Montalcini and Stanley Cohen were awarded the Nobel Prize in Physiology or Medicine for their discoveries of the first identified growth factors, nerve growth factor (▶NGF) and epidermal growth factor (▶EGF). NGF regulates the development and survival of neurons whereas EGF stimulates cell proliferation in a number of different cell types. Ever since, numerous other growth factors have been identified and their functions in human development, homeostasis and disease explored. These discoveries have to a large extent been promoted by cancer research, especially investigations into how viruses cause tumors. Some viruses that infect cells cause permanent mutations in the host cell DNA, which lead to the expression of genes that disturb the internal control of the cell’s ability to proliferate. These so

called ▶oncogenes can promote abnormal cell proliferation, a hallmark of cancer cells, by changing the function of any protein involved in the normal control of cell replication. Thus oncogenes can lead to abnormal expression of a particular growth factor, altered expression of growth factor receptors, increased growth factor receptor activity or any other disruption of the intracellular machinery that regulates cell division. These insights have led to the discovery of individual growth factors, growth factor receptors and intracellular signaling pathways that transmit growth stimuli to the cell nucleus. These naturally expressed growth regulatory proteins are called proto-oncogenes and historically, they have been named after the virus that gave rise to the growth disturbing mutation. For example, the proto-oncogene that encodes platelet-derived growth factor (▶PDGF) was named sis after the simian sarcoma virus, which induces proliferation in infected cells through over-expression of a PDGF-like protein.
The polypeptide growth factors include a wide variety of signaling molecules that can be categorized into several groups or families. They can be produced and secreted by cells in order to act locally on neighboring cells (▶paracrine function) or actually even on the same cells that produced them (▶autocrine function). Some growth factors are also produced in organs but exert their action on target cells after being transported in the blood to a distance from the source (▶endocrine function).
One of the first growth factors characterized, EGF, is found in salivary glands in the gastrointestinal tract and promotes proliferation of a large variety of cells, epithelial cells and mesenchymal cells included. ▶Erythropoietin is a growth factor produced in the kidney, which stimulates proliferation of immature red blood cells in the bone marrow. Some growth factors are stored extracellularly in tissues or in cells and can be released to stimulate cells in the immediate vicinity. For example, fibroblast growth factor (▶FGF), a member of a large family of growth factors, has the capacity to be stored in tissues by binding to sugar residues on proteins in the extracellular matrix, the so-called proteoglycans, and can be released after injury to participate in tissue repair by stimulating cell proliferation. Several members of the FGF family stimulate proliferation of endothelial cells and participate in the formation of new vessels, angiogenesis. FGF is also an important growth factor in the developing embryo and mutations in receptors for these growth factors have been associated with several different bone disorders, for example achondroplasia (dwarfism). There are numerous other growth factors that are involved in angiogenesis. Vascular endothelial cell growth factor (▶VEGF) can be produced in tissues with a deficient blood supply, for example after myocardial infarction, and selectively stimulate proliferation of endothelial cells to recruit new vessels into the anoxic tissue. VEGF can also be released from cancer cells and stimulate growth of vessels from the surrounding tissues, which ensures a blood supply in the growing tumor. PDGF is stored in blood platelets and can act on cells after platelet degranulation in association with tissue injury, bleeding and blood clotting. Together with other growth factors such as EGF, which is also stored in platelets, PDGF acts locally on cells in the injured tissue and promotes cell proliferation in the healing process (Fig. 1). Insulin-like growth factor-1 (▶IGF I) chemically resembles the hormone insulin and is mainly produced in the liver under the control of growth hormone. During development, this growth factor participates in the regulation of skeletal growth and maturation after birth, but it is also involved in tissue repair and may be important for abnormal proliferation of cancer cells. The transforming growth factor family (▶TGF) is a large family of different growth factors that were initially characterized by their ability to transform normal cells into tumor cells in culture. They have profound effects on cell metabolism and cellular synthesis of extracellular matrix proteins, and in some cell types they inhibit rather than stimulate cell proliferation. ▶Cytokines are a unique family of growth factors, which primarily act on cells in the immune system and stimulate proliferation of lymphoid cells. Cytokines such as the interleukins (▶IL), a large family with more than 20 members, regulate proliferation of a variety of lymphocytes but also affect differentiation and growth of cells in the bone marrow.
The identification and characterization of growth factors have yielded potential therapeutic tools for the management of a large variety of human diseases. In patients with chronic kidney failure, the metabolic dysfunction of the organ will lead to anemia due to insufficient production of erythropoietin. Today, patients suffering from this condition receive injections of recombinant erythropoietin that stimulate production of red blood cells in the bone marrow. A number of other conditions where the bone marrow does not produce enough blood cells can also be corrected through the addition of specific growth factors. For example, deficient production of white blood cells in the bone marrow, a side effect of cancer treatment with chemotherapy, can now be corrected through injections of growth factors that specifically stimulate proliferation of leukocytes. In some disorders, especially in cancer, pharmacological approaches are taken to develop drugs that prevent the effects of growth factors, for example drugs that interfere with the binding between specific growth factors and their corresponding receptors on the surface of cells.

Growth Factors. Figure 1 Growth factors regulate healing responses. After tissue injury, bleeding and
formation of a blood clot, growth factors can be released from degranulating platelets (1) and stimulate fibroblast
proliferation or growth of new blood vessels in the injured tissue. In addition, inflammatory cells recruited from
the blood stream can produce growth factors (2); they may be released from storage pools in the tissue
(3) or they may be released from proliferating cells (4).

Regulatory Mechanisms
Growth factors are secreted into the extracellular environment of cells where they may come in contact with specific receptors on the same cell that secreted them, on nearby cells or on distant cells after transportation through the blood stream. Polypeptide growth factors usually bind with high affinity to their corresponding receptors. The receptors are transmembrane proteins with an extracellular domain that has a specific binding site for the growth factor. Upon binding of the ligand, a signal is sent to the intracellular domain, which initiates an intracellular signal transmitted from the receptor to the cell nucleus through specific signaling pathways. Once the signal has reached the nucleus, a specific response is elicited in the regulation of gene transcription and the cell will express specific genes, for example genes involved in the control of proliferation.
Growth factor receptors can be classified into three different groups, receptor tyrosine kinases (▶RTK), ▶tyrosine-kinase-associated receptors, and G-protein-coupled receptors (▶GPCR). Most growth factor receptors are RTKs. These growth factor receptors have a single transmembrane domain, an extracellular domain that binds to the ligand and a cytosolic domain, which harbors an enzymatic activity that catalyzes addition of phosphate residues to the amino acid tyrosine, ▶tyrosine kinase activity. Many growth factors, such as PDGF, are dimers, two separate polypeptides linked to each other, which bind to two separate receptor subunits at the cell surface. This dimerization of the receptor subunits facilitates tyrosine kinase activity in the cytosolic domains whereby tyrosine residues are phosphorylated, a phenomenon termed ▶autophosphorylation. Other monomeric growth factors, such as FGF, may instead form pairs by binding to sugar structures at the cell surface and thereby achieve simultaneous binding of two corresponding cell surface receptors and receptor dimerization. Autophosphorylation of tyrosine residues in the cytosolic domains of RTKs allows binding and initiation of catalytic activity in intracellular proteins that transmit signals to the nucleus through different pathways. These signaling pathways generally function by a series of kinases that step by step are phosphorylated and activated one after the other so that the signal is propagated to the final activation of gene expression in the nucleus (Fig. 2).
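
The step-by-step relay described above can be pictured as an ordered chain in which each step is reached only if every earlier step has occurred. The following is a minimal, purely qualitative sketch of that idea, not a kinetic model. The step labels paraphrase the RTK sequence given in this entry and in Fig. 2; the names CASCADE, propagate and gene_expression_on, and the idea of a single "blocked step", are assumptions introduced only for illustration.

```python
# Toy, all-or-nothing picture of a receptor tyrosine kinase (RTK) relay.
# Assumption: each step either happens or not; no rates or concentrations.

CASCADE = [
    "RTK dimerization",
    "RTK autophosphorylation",
    "adaptor protein binding",
    "Ras activation",
    "MAPK pathway kinases phosphorylated",
    "transcription factor activation",
    "gene expression",
]


def propagate(ligand_bound, blocked_step=None):
    """Return the ordered steps reached after growth factor binding.

    blocked_step is a hypothetical way to represent a step that cannot
    occur; the relay stops just before it.
    """
    reached = []
    if not ligand_bound:
        return reached  # no growth factor bound, nothing is activated
    for step in CASCADE:
        if step == blocked_step:
            break  # the signal cannot pass a blocked step
        reached.append(step)
    return reached


def gene_expression_on(ligand_bound, blocked_step=None):
    """True only if the signal reaches the last step of the relay."""
    return "gene expression" in propagate(ligand_bound, blocked_step)
```

For instance, `propagate(True, blocked_step="Ras activation")` returns only the three receptor-level steps, which illustrates why interrupting any single link prevents the signal from reaching the nucleus in this simplified picture.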

Growth Factors. Figure 2 Growth factors regulate gene transcription in cells. 1) Binding of a growth factor to a
corresponding receptor (receptor tyrosine kinase; RTK) on the cell surface induces dimerization of two receptor
subunits, which elicits phosphorylation (P) of tyrosine residues on the cytosolic domains of the receptor. 2) Specific
intracellular adaptor proteins (AP) bind to the receptor and facilitate activation of catalytic proteins, such as Ras,
which thereafter activate intracellular signaling through the mitogen activated protein kinase (MAPK) pathway. 3) In
this pathway, phosphorylation of amino acid residues activates a series of kinases and the last activated protein in
the pathway enters the cell nucleus (4) where it induces activation of transcription factors (tf), which promote
expression of specific genes, for example genes coding for proteins necessary for cell proliferation.

Cytokines bind to tyrosine-kinase-associated receptors that are structurally similar to RTKs but lack intrinsic tyrosine kinase activity in their cytosolic domains. Instead, these growth factor receptors are associated with molecules that have tyrosine kinase activity. Receptor binding of cytokines also facilitates dimerization of receptor subunits, which then mediate phosphorylation of associated tyrosine kinases. Activation of these tyrosine kinases leads to phosphorylation of tyrosine residues on the receptor, which provides binding sites for downstream signaling molecules and initiation of signaling cascades. The receptors of the GPCR family are traditionally regarded as mediators of signaling for substances that regulate specific physiological responses in differentiated cells, for example contraction and relaxation in muscle cells. Lately, it has been understood that these receptors may also transmit growth signals in less differentiated and proliferative

cells after stimulation by factors not normally perceived as growth factors, for example angiotensin II which stimulates contraction of smooth muscle cells in the vessel wall but also initiates proliferation in de-differentiated smooth muscle cells in diseased vessels. GPCRs have an extracellular domain that binds the ligand, seven transmembrane segments and an intracellular domain that associates with a guanine nucleotide-binding protein (hence the name G-protein). Ligand binding, for example of thrombin or angiotensin, to its specific GPCR extracellular domain, leads to a conformational change in the cytosolic domain of the receptor. The altered receptor structure facilitates binding of G-proteins to the intracellular domain, which in turn activates an enzyme attached to the plasma membrane. This enzyme catalyzes a reaction leading to the release of a second messenger, which can then activate intracellular signaling molecules that reach the nucleus and affect gene transcription.

Cascades of intracellular signaling molecules constitute links between growth factor binding to a growth factor receptor and expression of genes in the nucleus that control cell proliferation. The ▶mitogen activated protein kinase (MAPK) pathways are the most well studied and understood intracellular signaling pathways, which are activated after binding of growth factors to their corresponding cell surface receptors. For example, binding of PDGF to the PDGF receptor leads directly to receptor dimerization and autophosphorylation of tyrosine residues on the cytosolic domains of the receptor. Tyrosine autophosphorylation provides binding sites for an adaptor protein called Grb2. Grb2 in turn binds to another adaptor protein, SOS (son of sevenless), that can activate the small GTP-binding proto-oncogene protein Ras. Ras then activates a MAP-kinase-kinase-kinase, which in turn phosphorylates a MAP-kinase-kinase and then finally, the last signaling molecule of the pathway, a MAP-kinase (MAPK), is activated. After phosphorylation, MAPK translocates into the nucleus and activates a set of ▶transcription factors, which promote expression of growth-related genes (Fig. 2).
The ▶cell cycle is divided into five phases, G0, G1, S, G2 and M. Non-malignant eukaryotic cells are normally resting in the G0 phase. In cells that harbor the capacity to proliferate, such as fibroblasts, growth factors stimulate the cells to leave the G0 phase and enter the G1 phase. When cells have been stimulated with growth factors for a specific time period in the G1 phase, further progression in the cell-cycle and cell division will inevitably follow, even if the growth factor is removed. This so-called ▶restriction point is followed by the S-phase where DNA replication takes place, the G2 phase when factors necessary for the physical division of the cell are produced and finally the M phase when mitosis occurs. Passage through the cell-cycle is controlled by periodic changes in the expression and activity of families of cell-cycle regulatory proteins termed ▶cyclins and cyclin dependent kinases (▶cdk). The expression of some cyclins is largely dependent on growth factors and activation of growth factor receptors. After growth factor stimulation, cyclins are synthesized, which form complexes with cdks. These complexes catalyze massive phosphorylation of the retinoblastoma protein, ▶Rb. In resting cells, Rb is bound to the transcription factor E2F but upon hyperphosphorylation, E2F is released and can activate transcription of a number of genes necessary for S-phase initiation and further cell-cycle progression.

References
1. Oxford Reference Online. Oxford University Press. BIBSAM, 2004. ▶http://www.oxfordreference.com
2. Bast et al (2000) Cancer Medicine, 5th edn. BC Decker Inc, Hamilton
3. Lodish et al (2003) Molecular cell biology, 5th edn. WH Freeman and Co, New York
4. Alberts et al (1994) Molecular Biology of the Cell, 3rd edn. Garland Publishing, New York and London
5. Heldin and Purton (1996) Signal transduction, 1st edn. Chapman & Hall, London

Growth Hormone Deficiency

Definition
Growth hormone deficiency is caused by absence, decreased production or dysfunction of the anterior pituitary hormone, which results in dwarfism or short stature and possibly some metabolic abnormalities (such as hypoglycemia).
▶Hypothalamic and Pituitary Diseases Genetics

Growth Plate

Definition
The growth plate is a cartilaginous structure at the end of bones that generates the entire longitudinal growth through proliferation and differentiation of chondrocytes, and the conversion of cartilage into bone.
▶Bone Disease and Skeletal Disorders, Genetics

Gamma-Secretase (Complex)

▶γ-Secretase

GSK3

▶Glycogen Synthase Kinase–3

GST

Definition
GST stands for Glutathione-S-transferase, a 26.9 kDa protein-fragment that is used to tag recombinant proteins.
▶Recombinant Protein Expression in Bacteria

GST Pull-Down Experiment

Definition
In this kind of assay, a recombinant affinity tag ▶fusion protein is used as bait to capture (‘pull down’) binding partners out of a cell lysate. The cell lysate is applied to the immobilised bait protein, or, alternatively, bait protein and lysate are mixed in solution and complexes are captured by affinity chromatography afterwards. Glutathione-S-transferase (GST) is often used as the affinity tag.
▶Affinity Chromatography and In Vitro Binding (Beads)

GTP

Definition
GTP stands for Guanosine 5′–triphosphate. It is produced by phosphorylation of GDP (guanosine 5′–diphosphate).
▶Rho, Rac, Cdc42

GTPase-Activating Proteins

Definition
GTPase-activating proteins (GAPs) are proteins that bind to a GTP-binding protein and inactivate it by stimulating its GTPase activity so that it hydrolyzes its bound GTP to GDP.
▶Rho, Rac, Cdc42

GTPases

Definition
GTPases comprise a large group of intracellular signaling proteins characterized by an active state when GTP-bound, and an inactive state when GDP-bound. Their enzymatic activity hydrolyzes GTP.
▶Cardiac Signaling: Cellular, Molecular and Clinical Aspects
▶Diabetes Insipidus, a Water Homeostasis Disease
▶Rho, Rac, Cdc42

Guanine Nucleotide Dissociation Inhibitors

Definition
Guanine nucleotide dissociation inhibitors (GDIs) are protein factors that inhibit the dissociation of guanine nucleotides (GDP/GTP) from GTP-binding proteins.
▶Rho, Rac, Cdc42

Guanine Nucleotide Exchange Factors

Definition
Guanine nucleotide exchange factors (GEFs) are proteins that catalyze the release of guanine nucleotides (mostly GDP) from monomeric or heterotrimeric GTPases, thereby allowing them to bind GTP in its place. In the latter case, heptahelical receptors serve as GEFs.
▶G-Proteins
▶Rho, Rac, Cdc42
▶Tight Junctions
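
Taken together, the GTPase, GAP, GDI and GEF entries above describe a two-state molecular switch. The sketch below is one way to picture that switch as a small state machine. It is an illustrative simplification rather than a biochemical model: the class and method names (GTPase, apply_gef, apply_gap, gdi_bound) are hypothetical, and the code assumes that GTP binds as soon as a GEF has released GDP and that a bound GDI fully blocks nucleotide exchange.

```python
class GTPase:
    """Toy two-state switch: active when GTP-bound, inactive when GDP-bound."""

    def __init__(self, nucleotide="GDP", gdi_bound=False):
        if nucleotide not in ("GDP", "GTP"):
            raise ValueError("nucleotide must be 'GDP' or 'GTP'")
        self.nucleotide = nucleotide
        self.gdi_bound = gdi_bound  # assumption: GDI is simply present or absent

    @property
    def active(self):
        # Active state is defined by the bound nucleotide.
        return self.nucleotide == "GTP"

    def apply_gef(self):
        """GEF: catalyzes release of GDP so that GTP can bind in its place."""
        if self.gdi_bound:
            return  # GDI inhibits nucleotide dissociation, so no exchange
        if self.nucleotide == "GDP":
            # Simplification: GTP is assumed to bind immediately after release.
            self.nucleotide = "GTP"

    def apply_gap(self):
        """GAP: stimulates hydrolysis of the bound GTP to GDP."""
        if self.nucleotide == "GTP":
            self.nucleotide = "GDP"
```

A short usage example: a GTPase created in its default GDP-bound state becomes active after `apply_gef()` and returns to the inactive state after `apply_gap()`, whereas one created with `gdi_bound=True` stays inactive when `apply_gef()` is called.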

adherent junctions) link the cells together, regulate


Gut Epithelium the barrier function, and delineate two compart-
ments of the plasma membrane, the apical and the
basolateral membranes. These complexes are com-
MICHÈLE KEDINGER, JEAN-NOËL FREUND posed of at least 40 different proteins, among which
INSERM U682, Strasbourg, France are transmembrane proteins such as occludin,
michele.kedinger@inserm.u-strasbg.fr claudins and cadherins, and cytoplasmic proteins,
among which are some which can shuttle to the
nucleus like the transcription factor ZONAB
Definition associated to ZO-1 and β-catenin which connects
The intestinal epithelium lines the gut lumen, dividing E-cadherin to actin filaments. The apical membrane,
the outside from the interior of the body. It is implicated called the brush border, is formed by microvilli
in most digestive and absorptive functions of the gut, sustained by an actin core associated to a number of
but also in the immunological and non-immunological cytoskeletal proteins among which is villin, which
protection against nutritional, bacterial, viral and regulates the plasticity of the brush border. The
parasite aggressions. Its importance in terms of functional proteins (digestive enzymes, outside-in
evolution is demonstrated by its presence early in the transporters) are anchored into the brush border
animal kingdom long before the spinal cord or neuronal membrane. The basolateral membrane carries in-
cells. The intestinal epithelium forms invaginations side-out transporters and ion channels involved in
into the stroma, the crypts, and outwardly projecting the processing of nutritional metabolites, but also
villi in the small intestine or flat cuffs in the colon. receptors to basement membrane molecules. These
Taking into account the length of the gut (5–6 m in receptors, including ▶integrins and ▶dystroglycan,
humans), these structures, and additional microscopic ensure the link with the basement membrane and
foldings at the apical membrane of epithelial cells, the also act synergistically with growth factor receptors
microvilli, provide an overall absorptive surface of to initiate intracellular signaling and to participate in
about 300 m². The gut epithelium exhibits a morpho- the regulation of cell functions.
logical and functional regionalization along the prox- 3. The gut epithelium is composed of several func-
imo-distal axis, mainly obvious between the small tionally different cell types.
intestine and colon. The main cell type in the small intestine (about
90%) is a columnar absorptive cell responsible for
Characteristics the digestive functions, the enterocytes. Additional
cytotypes belong to the secretory cell lineages and
1. The gut epithelium develops in interaction with comprise (i) mucus-producing goblet cells, which
multiple partners. produce the protective mucus layer on top of the
The intestinal epithelium derives from the mid and epithelium, (ii) entero-endocrine cells composed of
posterior parts of the definitive endoderm (midgut and at least 15 different cell subtypes classified on the
hindgut), which itself originates from the ingression of basis of their hormonal content, which control
primitive streak cells at gastrulation. Specification of important physiological functions such as glycemia,
the presumptive intestinal endoderm is driven by exocrine pancreatic secretion and growth/repair of
interactions with lateral mesoderm. Morphogenesis of the gut epithelium and (iii) Paneth cells, which are
the gut generates the single layer columnar epithelium involved in the mucosal defense function and
laid on and separated from lamina propria myofibro- produce a variety of antibiotic and antimicrobial
blasts by a ▶basement membrane, which is a factors. The last cytotype is composed of M cells,
specialization of the ▶extracellular matrix. Intestinal which correspond to a specialization of absorptive
epithelial cells also interact with lymphocytes (either cells facing lymphoid nodules and participate in the
intra-epithelial lymphocytes or organized into sub- control of intestinal immunity and tolerance. Mucous
jacent nodules), and with neuronal cell processes. cells represent the major cell type in the colon.
On the luminal side, nutrients and the intestinal 4. The gut epithelium is continually and rapidly
microflora are able to signal and modulate the renewed throughout life.
epithelial cell behavior. The adult intestinal epithelium is compartmenta-
2. The gut epithelium is highly polarized. lized. Stem cells confined near the crypt bottom
Intestinal epithelial cells exhibit an asymmetrical generate proliferative transit amplifying cells, which
morphology shaped by a highly organized network cycle every 12 h, and migrate vertically towards the
of microtubules, intermediate filaments and actin crypt mouth. When leaving the crypts, epithelial
filaments connected to the plasma membrane. cells abruptly stop proliferating and become differ-
Apical-lateral ▶junctional complexes (tight and entiated. All the differentiated cell types – except the
Paneth cells, which migrate downwards to the crypt bottom – migrate to the villus tip in the small intestine or to the surface of the colon cuffs, and subsequently undergo apoptosis and exfoliation into the gut lumen. The overall life cycle of the gut epithelial cells is around 5 days in humans.
5. Multipotent stem cells are maintained in the adult gut epithelium.
The continuous renewal of the gut epithelium implies the presence of stem cells located in a specific niche near the crypt bottom surrounded by sub-epithelial myofibroblasts and specific extracellular molecules. Although intestinal stem cells have still not been isolated, several experimental approaches – transgenesis, mouse embryo aggregation chimeras, mutagenesis, regeneration after X-ray irradiation – have provided evidence that all the differentiated cell types of the gut epithelium derive from ▶multipotent stem cells (1, 2). Stem cells undergo asymmetrical division to stochastically produce one new self-maintaining stem cell and one daughter cell that cycles and fuels a population of potential clonogenic stem cells. These cells are further displaced into the compartment of transit amplifying cells that eventually go into the differentiation process. Potential clonogenic stem cells, unlike transit amplifying cells, can replace true stem cells if this population is altered. The process towards differentiation uses two minor pathways consisting of long-lived progenitors of absorptive and of mucous cells respectively and one major pathway in which short-lived progenitors can produce both absorptive and mucous short-lived progenitors. Interestingly, glucagon-like peptide-2 produced by one of the enteroendocrine cell types – known to prevent intestinal damage and to facilitate repair – signals through enteric neurons to produce a mediator that stimulates the production of long-lived progenitors of the absorptive lineage (2).
6. Malignant epithelial tumors develop in the colon and rectum.
Colorectal cancer is a major disease in terms of incidence and malignancy. It results from imbalanced cell proliferation, differentiation, migration and apoptosis in colonic crypts. The vast majority of tumors are of sporadic origin while a small proportion is familial. The major familial form, ▶Hereditary Non-Polyposis Colon Cancer (HNPCC), is characterized by microsatellite DNA instability resulting from germ line mutations in the MLH1 or MSH6 genes, which cause defects in the DNA mismatch repair system. The second familial form, ▶Familial Adenomatous Polyposis (FAP), is characterized by chromosomal instability and is linked to germ line mutations in the tumor suppressor gene ▶APC. Loss of function of APC leads to inappropriate activation of the Wnt/β-catenin signaling pathway (3) and to chromosomal instability linked to altered kinetochores. Rare cases of FAP without germ line mutation of APC are associated to biallelic germ line mutations in the base excision repair gene MYH. The sporadic form of colorectal cancer is mainly related to chromosomal instability and to gradual histological changes, the adenoma-carcinoma sequence, associated to the accumulation of somatic alterations in a number of tumor suppressor genes (APC, p53) and oncogenes (K-ras, Bcl2). Malignant tumors occur almost exclusively in the colon and rectum, while atypical tumors develop in the small intestine in the context of chronic inflammatory bowel disease. The primary incidence of tumors in the colon correlates with the colon specific expression of the anti-apoptotic gene Bcl2 at the stem cell positions in the crypt base.

Regulatory Mechanisms

1. Intestinal epithelial identity and homeostasis: involvement of the ▶Cdx genes.
▶Homeobox genes belong to a large family of transcription factors acting at multiple levels during embryonic development. Cdx1 and Cdx2 are two paralogue homeobox genes, which are expressed in the presumptive gut endoderm and in posterior organs in the embryos, and specifically in the gut epithelium throughout adult life. Cdx2 displays the homeotic function devoted to defining the intestinal identity (4). Indeed, ectopic expression of Cdx2 in the stomach epithelium – normally devoid of Cdx gene expression – converts gastric mucosa to an intestinal phenotype, whereas loss of expression in the gut endoderm leads to a gastric transdifferentiation. Cdx1 and Cdx2 are also modulators of cell renewal via distinct and complementary functions; Cdx1 stimulates cell proliferation, resistance to apoptosis and eventually cell differentiation, whereas Cdx2 reduces cell proliferation and stimulates differentiation. The Cdx1 and/or Cdx2 proteins act directly on a panel of target genes and cellular functions including regulators of cell cycle and apoptosis (p21WAF, Bcl2), transcription factors (KLF4), proteins involved in cell interactions (LI-cadherin), in calcium metabolism (vitamin D receptor, calbindin-D9k) and in glucose metabolism (glucagon, glucose-6-phosphatase), digestive enzymes (sucrase, lactase) and mucus production (Muc2). Interestingly, the expression profiles of Cdx1 and Cdx2 are altered in colorectal cancers and pro-oncogenic pathways have opposite effects on the two homeobox genes. For instance Cdx1 is upregulated by the Wnt/β-catenin pathway,
whereas Cdx2 is down-regulated by the ▶PI3K/Akt pathway. These observations suggest that Cdx1 and Cdx2 dysfunctions contribute to cancer progression. In accordance with this, the alteration of the Cdx2 status sensitizes the colon epithelium to carcinogenesis, linked to a lower capacity to switch on apoptosis. Thus, in addition to its homeotic function during embryonic development of the digestive tract, the Cdx2 homeobox gene is a gut-specific tumor suppressor gene in the adult colon.
2. The Wnt/β-catenin signaling pathway.
Cell growth and polarity are major attributes of the gut epithelium, and the Wnt/β-catenin pathway plays a pivotal role in this balance. β-catenin contributes to cell polarity by cross-linking the membrane E-cadherin to the actin cytoskeleton. In differentiated epithelial cells, an excess of β-catenin molecules not coupled to E-cadherin are loaded on a complex comprising APC, Axin and the CKI and GSK3β kinases that phosphorylate β-catenin and target it to the proteasome degradation system. During intestinal development as well as in the crypts, the Wnt/β-catenin signaling pathway is activated by secreted morphogens of the Wnt family that bind to Frizzled receptors and activate several downstream pathways, one of which leads to the inhibition of GSK3β activity. In this context, β-catenin escapes degradation and translocates into the nucleus to bind HMG-box transcription factors of the Tcf/Lef family. These factors play a major role in the maintenance and self-renewal of the stem cell stock, since Tcf4-deficient mice lack proliferative cells in the prospective intervillous regions; in contrast over-expression of Lef1 causes increased stem cell apoptosis. Major targets of the activated Wnt/β-catenin pathway are the proto-oncogene c-myc and cyclin D1, which subsequently down-regulates the cell cycle inhibitor p21WAF (5). Interestingly, a transcriptomic approach has identified a set of genes of the c-myc cascade in the stem cells/progenitors compartment, including regulators of c-myc gene transcription and protein stability and c-myc downstream targets (6). As mentioned above, a direct Wnt/β-catenin tissue-specific target is Cdx1, and an indirect target is Cdx2, through another HMG-box factor, SOX9. Finally, the Wnt/β-catenin pathway also controls crypt cell sorting via the regulation of combinations of ephrin receptors EphB2/EphB3 and their ligands EphrinB1/EphrinB2 (5). Thus, crypt formation, cell sorting and the mechanism of stem cell selection appear to depend on an adequate threshold of β-catenin-mediated signaling during normal intestinal homeostasis, whereas inappropriate activation of this pathway contributes to colorectal tumorigenesis. Cooperating with the Wnt/β-catenin pathway, BMP (Bone Morphogenetic Protein) signaling and Hh (Hedgehog) signaling also control the proliferation/differentiation equilibrium of the gut epithelium (7, 8, 9).
3. Molecular determinants of gut stem cells and progenitors.
Although gut stem cells and progenitors have been described near the crypt base, molecular markers remain elusive. Yet, a gene product, Msi1, has been reported in intestinal crypts, specifically in individual cells that display the theoretical location attributed to stem cells (1). Msi1 is an RNA-binding protein associated with asymmetrical division of neural progenitors. Msi1 controls the translation of several RNAs; in particular, it acts as a translational repressor of numb, an antagonist of Notch signal activation. The Notch pathway is involved in the maintenance of an undifferentiated state by a lateral inhibition mechanism by which a cell differentiating along a given pathway produces a signal that prevents neighboring cells from differentiation along the same pathway. Upon activation by their ligands, the intracellular domains of Notch receptors are released and translocated into the nucleus where they bind to CSL (CBF1/Suppressor of hairless/Lag1) DNA binding proteins. This results in the transcriptional activation of downstream targets encoding ▶bHLH transcription factors of the Hes family, among which is Hes1. These are negative regulators of differentiation, by repressing other bHLH genes that promote differentiation. Indeed, a series of gene invalidations led to the conclusion that cell fate commitment in the intestine depends essentially on a genetic cascade controlled by bHLH transcription factors. Firstly, Hes1, which is expressed in crypt cells like Math1 and Ngn3, maintains the precursor pool expansion and prevents premature endocrine and mucous cell differentiation. Secondly, Hes1 antagonizes Math1, which is required for the commitment of the three secretory cell lineages of the gut epithelium (goblet, endocrine and Paneth cells). This suggests that the choice between absorptive and secretory cell lineages is balanced by Hes1 and Math1 (2). Thirdly, within the population of Math1-expressing cells, Ngn3 specifies the endocrine progenitors (10), whereas NeuroD, which is expressed in the villus cells, is required for the differentiation of a subset of endocrine cells. Genetic modulations of the Notch pathway have confirmed its influence on the maintenance of undifferentiated, proliferative cells in the crypts and on intestinal cell lineage specification by acting on the balance between bHLH factors (11).

In conclusion, the renewal of the gut epithelium has long been recognized as a paradigm in cell biology as
regards well-ordered cell proliferation followed by differentiation from self-renewing stem cells. Recent investigations have provided new insights into the molecular mechanisms implicated in multiple aspects of this process, such as the determination of intestinal identity, the cell commitment into differentiated lineages. Emerging concepts propose integrative models involving the interplay of local and reciprocal stimulatory and inhibitory signals between epithelial cells and the underlying myofibroblasts, to fine-tune the homeostatic balance between stemness, commitment, proliferation and differentiation (7, 9). Furthermore, these results enlarge our knowledge of the molecular alterations at the basis of malignant transformation in ▶colorectal cancers. However, the exact nature of the gut stem cells and their relationship and regulation by neighboring elements of the stem cell niche remain to be elucidated. Understanding the biology of the gut stem/progenitor cells is a challenge for the future that should open new avenues in the field of cancer therapy, intestine regeneration and cellular therapy of type-1 diabetes.

References
1. Booth C, Potten CS (2000) Gut instincts: thoughts on intestinal epithelial stem cells. J Clin Invest 105:1493–1499
2. Brittan M, Wright NA (2002) Gastrointestinal stem cells. J Pathol 197:492–509
3. Giles RH, van Es JH, Clevers H (2003) Caught up in a Wnt storm: Wnt signaling in cancer. Biochim Biophys Acta 1653:1–24
4. Freund J-N, Domon-Dell C, Kedinger M et al (1998) The Cdx1 and Cdx2 homeobox genes in the intestine. Biochem Cell Biol 76:957–969
5. Van de Wetering M, Sancho E, Verweij C et al (2002) The beta-Catenin/TCF-4 complex imposes a crypt progenitor phenotype on colorectal cancer cells. Cell 111:241–250
6. Stappenbeck TS, Mills JC, Gordon JI (2003) Molecular features of adult mouse small intestinal epithelial progenitors. PNAS 100:1004–1009
7. He XC, Zhang J, Tong WG et al (2004) BMP signaling inhibits intestinal stem cell self renewal through suppression of Wnt-beta-catenin signaling. Nat Genet 36:1117–1121
8. Madison BB, Braunstein K, Kuizon E et al (2005) Epithelial hedgehog signals pattern the intestinal crypt-villus axis. Development 132:279–289
9. Radtke F, Clevers H (2005) Self-renewal and cancer of the gut: two sides of a coin. Science 307:1904–1909
10. Jenny M, Uhl C, Roche C, Duluc I et al (2002) Neurogenin3 is differentially required for endocrine cell fate specification in the intestinal and gastric epithelium. EMBO J 21:6338–6347
11. Van Es JH, van Gijn ME, Riccio O et al (2005) Notch/gamma-secretase inhibition turns proliferative cells in intestinal crypts and adenomas into goblet cells. Nature 435:959–963
