Gag
Definition
Gag is a retrovirus gene that encodes the retroviral internal structural proteins ▶MA, CA, and ▶NC, and some others.
▶Retroviruses

Gain-of-Function Mutations

GAL4–Expression System
Definition
The GAL4–expression system is a method for directed gene expression that can be used to misexpress genes in specific cell types, or tissues, at different times of development. This system relies on the generation of transgenic lines that carry “activator” or “effector” constructs. Activator lines express the yeast transcription factor, Gal4, under the control of a desired promoter, whereas effector lines contain DNA binding motifs for Gal4–(UAS) linked to the gene of interest.
▶Drosophila as a Model Organism for Functional Genomics

Gamma Rays
Definition
Gamma rays are an energetic form of electromagnetic radiation produced by radioactivity or other nuclear or subatomic processes such as electron-positron annihilation. Gamma rays are often defined to begin at an energy of 10 keV, although electromagnetic radiation from around 10 keV to several hundred keV is also referred to as hard X-rays. Gamma rays are a form of ionizing radiation; they are more penetrating than either alpha or beta radiation (neither of which is electromagnetic radiation), but less ionizing.
▶Molecular Imaging

Gammaretroviruses
Definition
Gammaretroviruses refers to the genus of simple retroviruses that includes murine and feline leukemia viruses. Many species contain oncogenes and cause leukemias and sarcomas.
▶Retroviruses

Gammaretroviruses IN (Integrase)
Definition
Gammaretroviruses IN (integrase) is the retrovirus virion enzyme, and is a product of the pol gene.
▶Retroviruses

Gamma-Secretase (Complex)
The γ-secretase (complex) is a multimeric complex that is composed of at least four different transmembrane proteins (presenilin 1 or presenilin 2 (▶PSEN1/PSEN2); (▶APH–1); nicastrin; PEN–2). The last step of ▶amyloid generation from amyloid precursor protein (▶APP) is performed by the γ-secretase complex. The γ-secretase is also important in other pathways, as for example in the cleavage of ErbB4, intracellular domains of Notch, and similar types of proteins.
▶Alzheimer’s Disease

Gap
Definition
A space introduced into an alignment to compensate for insertions or deletions in one sequence relative to another.
▶Protein Databases

Gap Junctions
W. HOWARD EVANS
Medical Biochemistry and Immunology, Cardiff University School of Medicine, Cardiff, Wales, UK
wmbwhe@cf.ac.uk

Definition
Gap junctions comprise minute regions of the cell’s plasma (surface) membrane containing arrays of closely packed membrane channels. These channels traversing two aligned membranes separated by a 2–3 nm intercellular gap provide a communication pathway that directly connects cell interiors (Fig. 1). Communication across gap junctions enables single cells to co-ordinate, integrate and summate their metabolic and electrical interactive activities but also results in some loss of independence. Co-operation and harmonisation of cellular activities in tissues and organs is vividly illustrated by the ordered contraction of heart muscle, made possible because the beating of individual cells is synchronised and summated to the organ level by interactions facilitated by gap junctions. Indeed, each cardiac myocyte can communicate with about a dozen or more surrounding partner cells. In the brain, gap junctions enable the synchronisation of the electrical coupling of neuronal cell networks. In non-excitable cells, small signalling molecules below 1200 daltons are exchanged across gap junctions. All animal cells communicate directly with each other across gap junctions, except striated muscle where the cells have fused, spermatozoa and non-nucleated erythrocytes and platelets (1).

Characteristics
Gap junctions are built from one of three biochemically different families of proteins.

Gap Junctions. Figure 1 A gap junction plaque showing diagrammatically the arrangement of paired connexon hemichannels in the membrane. These dock to generate a channel directly joining two cells. Connexons attach to the edges of plaques, but they may also function as hemichannels in their own right. In the top box, the oligomerisation of connexins into hexameric connexons and the various types of gap junctions formed are illustrated. In the lower box, the topography of a generalised connexin in the lipid bilayer is shown. Bold line indicates sequences with high homology.

▶Connexins
In vertebrates, gap junctions are constructed from ▶connexin protein units (Fig. 1). Over 20 different
connexins have been found in humans and rodents and various tissues synthesise different and often overlapping types of connexin (Table 1). They constitute a family of proteins displaying about 40% overall amino acid sequence identity. A widely adopted method of naming individual connexins uses the abbreviation Cx followed by the predicted molecular mass in kDa. Increasingly, a prefix may be added to indicate the species, e.g. h, human; m, mouse; zf, zebrafish. When the molecular masses of connexins are similar, a decimal point is added, e.g. mCx30.2, mCx30.3 and mCx31.1. However, although there is often little size variation between species, e.g. mCx36 and fish (skate) Cx35, there is sometimes greater disparity in the molecular sizes of functionally similar connexins, e.g. rat Cx46, bovine Cx44 and chicken Cx56, and clearly this nomenclature is not entirely satisfactory. Mouse and human connexins have been classified phylogenetically as follows: Group 1 or beta: Cx26, Cx30.3, Cx31, Cx31.1, Cx32; Group 2 or alpha: Cx33, Cx37, Cx38, Cx40, Cx42, Cx43, Cx50, Cx56; Group 3 or gamma: Cx45, Cx47; and an uncategorised group: Cx25, Cx29, Cx30.2, Cx36, Cx39, Cx40.1 and Cx58. Cx43 is by far the most widely distributed connexin (1). In heart ventricles, myocytes express Cx43 exclusively but myocytes in the atrium and Purkinje fibres also express Cx40. Endothelial cells lining the vascular wall express Cx37 in addition to Cx43, which is also expressed by the smooth muscle cells controlling the contraction of arteries. Cx36 is expressed by neurons in the retina, whereas in the brain astrocytes express mainly Cx43 and oligodendrocytes Cx32 and Cx47. Cells comprising the various layers in the skin express a range of connexins, especially Cx26, Cx30, Cx31, Cx32, Cx40, Cx43 and Cx45. In liver, hepatocytes express Cx26 and Cx32, with their relative abundance varying between species. Lens fibre cells express Cx46 and Cx50, but epithelial cells enveloping the lens capsule express Cx43. Up to 196 permutations of homologous and heterologous connexin based gap
junction interactions are possible when cells express two or more different types of connexins (Fig. 1). The relative amounts of each connexin in cells also vary during embryonic and tissue development.

Connexins span the membrane lipid bilayer four times and in an alpha helical conformation (Fig. 1). The protein’s amino terminus (a highly conserved sequence of 21 amino acids) and carboxyl terminus (a highly variable number of amino acids as described below) and a single intracellular loop face the inside of the cell (Fig. 1). Two highly conserved amino acid sequences forming a loop (twenty in the first and forty amino acids in the second) project in a beta sheet conformation from the cell surface into the extracellular space; these loops contain three cysteine residues that are invariable in all connexins discovered and they are linked to each other by intramolecular disulphide bonds. These extracellular loop interactions are crucial for they provide a scaffold that aligns and extends the transmembrane ▶hemichannel domains across the intercellular gap. The amino acid sequences in these two loops also set the rules governing the compatibility of interactions between various connexins. For example, cells with gap junction channels made from Cx32 and Cx26 communicate but cells making Cx32 do not communicate with those making Cx43. The types of connexin synthesised by cells thus dictate whether homo- or hetero-philic gap junction channels are generated, with different channel pore characteristics and molecular selectivities. The third transmembrane domain and parts of the first and second transmembrane domains contribute the wall of the overall channel. The greatest differences in amino acid sequences between connexins are found in the intracellular loop projecting into the cytoplasm and consisting of 30 amino acids in the smaller connexins, 50–55 amino acids in Cx33, Cx37, Cx40, Cx43, Cx46, Cx50 and Cx57, 80 amino acids in Cx45 and 100 amino acids in Cx36. The carboxyl terminal tail, often referred to as the regulatory domain, varies in length from 16 amino acids in Cx26 to 75 in Cx32, 156 in Cx43 and 275 in Cx57. Larger connexins are post-translationally modified by phosphorylation of several serine and threonine residues on the carboxyl tail. The functional consequences of phosphorylation of connexins remain to be defined as well as the key protein kinases and phosphatases involved in regulating gap junction channels. Connexins are not glycosylated, but are ubiquitinated and acylated.

The genes encoding mouse and human connexins are located on different chromosomes (Table 1). The general organisation of connexin genes is similar, and connexin phylogenetic trees indicate that connexins are likely to have arisen by gene duplication. Most connexin genes have a first exon containing 5′ untranslated sequences and a large second exon containing the complete coding region as well as untranslated sequences. In contrast, the Cx32 gene contains two alternative first exons and their expression is tissue specific. Connexin 36 contains two exons both with translated and untranslated sequences with the coding region interrupted by an intron. The Cx45 gene has three exons, with most transcripts containing only
exons 2 and 3. Cx40 also has three exons with the third containing the complete coding sequence. mRNA encoding connexins is subject to transcriptional control and tissue specific promoters (acting via multiple exons) account for hormonal and pharmacological regulation of expression. This is important, for example, in the contraction of uterine muscle during birth when gap junction numbers composed of Cx43 are regulated by oestrogens whereas gap junctions in the heart are not. Post-transcriptional regulation of mRNA levels is especially evident with some connexins, for example Cx32 and Cx26 in liver.

How are connexins organised into gap junctions? A hexagonal arrangement in which six connexins interact and surround a central pore of 2 nm was suggested by the regular packing of the presumed channel units in liver gap junctions stained with heavy metals as well as by atomic force microscopy. The most up to date structure is based on a three-dimensional electron crystallographic analysis of gap junctions prepared from cultured cells over-expressing recombinant Cx43. This model (2) shows the alignment of two composite hexameric connexin hemichannels at a resolution of 0.7 nm in the membrane plane and 2.1 nm in the vertical plane and it confirms independent biochemical and physical chemical studies showing the presence of 24 transmembrane alpha helices in each connexon hemichannel unit. The hemichannel unit has a dumbbell shape and is 7 nm wide at the cytoplasmic aspect, narrowing to 5 nm at the extracellular aspect. The aqueous channel narrows from 4 nm diameter at the cytoplasmic entry point to 1.5–2.5 nm depending on the calcium concentration at the extracellular region where it becomes continuous with the partner hemichannel of a neighbouring cell. The arrangement of the connexin subunits in the gap junction ensures that the intercellular channel is completely insulated from the extracellular ‘gap’.

▶Innexins and ▶Pannexins
Arthropod and vertebrate gap junctions differ in their overall thickness. Invertebrate gap junctions are constructed of a biochemically unrelated class of proteins called innexins. There is no amino acid sequence homology with connexins, but innexins adopt a similar topography in the membrane with four transmembrane domains and cytoplasmically oriented amino and carboxyl termini. Innexin transcripts expressed in Xenopus oocytes form functional gap junctions displaying typical channel gating characteristics. The Caenorhabditis elegans genome contains as many as 25 innexin genes, with single cells in this worm expressing more than one innexin transcript. Innexins are also present in Drosophila fruit flies where they feature in the development of the nervous system and intestine. Innexins have also been identified recently in Annelida. The biochemical characterisation of innexins is awaited.

Pannexins are electrical junctions in vertebrates that are related to innexins. Pannexin 1 and 2 genes are abundantly expressed in the central nervous system, especially the hippocampus, olfactory bulb and cerebellum; pannexin 1 is confined to white matter. As with innexins, expression of pannexin transcripts generates intercellular channels with similar electrical characteristics to vertebrate gap junctions constructed of connexins. Their biochemical characterisation is awaited.

Assembly and Breakdown of Gap Junctions
Connexins are rapidly degraded; the half-life in heart and other cells is around two to four hours, a turnover about 10–20× faster than that of most membrane proteins. Connexins are co-translationally inserted directly from ribosomes into the endoplasmic reticulum, threading four times into the membrane bilayer to achieve the typical connexin topography (Fig. 1). Connexins show a proclivity to oligomerise into hexameric hemichannels that exit the endoplasmic reticulum and enter the Golgi apparatus. The hemichannels are maintained in a closed configuration to limit continuity between cytoplasmic and lumenal environments since ionic gradients allow cell-signalling responses. The hemichannels are then trafficked in membrane vesicles to the cell’s plasma membrane and attach to the periphery of pre-existing gap junction plaques, a process that occurs simultaneously with their docking and alignment with hemichannels in the adjacent cell (Fig. 1).

Connexins can be tagged at the carboxyl terminus with auto-fluorescent proteins such as green fluorescent protein or short tetracysteine-containing amino acid motifs to which arsenic-containing chemicals that fluoresce at different wavelengths will bind. These approaches have elegantly demonstrated in living cells how connexin hemichannels are moved to the plasma membrane and accrete into gap junction plaques and how gap junction units are internalised (3). Connexins are transported on 0.5 μm vesicles to the plasma membrane guided by a microtubular scaffolding and are removed from the central area of gap junction plaques as larger vesicles that correspond to annular gap junctions observed in the electron microscope. Gap junction plaques are built from up to thousands of paired hemichannel units that cannot be peeled apart once formed. Therefore, plaques are internalised into partner cells as complete units and are ultimately degraded in phagosomes or lysosomes. In contrast, incorrectly folded or mutated connexins are transferred directly from the endoplasmic reticulum for degradation in proteasomes.

Gap junctions are frequently located next to other adhesive junctions and it is not surprising that their
assembly by cells is closely co-ordinated. Connexins interact via their carboxyl tails with zona occludens 1 and occludin, two proteins associated with tight ▶intercellular junctions, with tubulin, the major constituent of microtubules and with catenins, a further class of proteins associated with adhesive junctions. Specific connexins are also detected in ▶lipid raft membrane microdomains where they associate with caveolins.

Regulatory Mechanisms
Gap Junction Channels
Measurements of electrical currents across gap junctions in paired Xenopus oocytes synthesising various recombinant connexins have shown that opening and closing (gating) of gap junction channels is determined mainly by a voltage difference between the paired cells and features amino acid sequences in the first to second transmembrane regions. In larger connexins, a second mechanism is identified involving a “ball and chain” type interaction between amino acids in the intracellular loop and the carboxyl tail. This chemical gating is mainly a function of pH with channels closing as pH drops (4). Gap junctions constructed of different connexins vary in their gating characteristics. Calcium ions also regulate gap junctional communication. Elevation of calcium in cells generally closes gap junctions. Cell signalling responses such as an increase in cytoplasmic calcium induced by mechanical aggravation of cells or release of inositol phosphates are propagated across gap junctions to neighbouring cells thus generating a calcium wave. The biochemical nature of the signal transmitted is not known for sure but one suggestion is that intercellular transmission of calcium waves involves the passage of inositol triphosphate through the gap junction channel. Calcium waves are also propagated between cells across connexin hemichannels. Here, ATP is released across the hemichannel and then binds in a paracrine fashion to purinergic receptors on neighbouring cells. In contrast, progress in establishing the biochemical nature of molecules/ions transmitted across gap junctions has been slow, probably because the direct nature of the transfer has made their interception difficult. All that can be said is that such entities are likely to be 1.2 kDa or less, with an ionic radius below 1 nm. Gap junctions should not be regarded as nonselective pores joining cells, for the molecular selectivity for a range of permeants in the context of charge and size is a function of the connexin makeup of the channel. For example, gap junction channels distinguish between the passage of cAMP and cGMP.

Hemichannels
Unopposed connexin hemichannels in non-junctional regions of the plasma membrane (Fig. 1) were previously thought of as biogenetic precursors of gap junctions. However, evidence is building up that they are regulatable entities in their own right, with functions influenced for example by calcium levels inside and outside the cell. These unpaired connexin hemichannels were first observed in the horizontal cells of catfish retina and their operation has now been studied extensively (5). Their presence in mammalian cells was demonstrated on the basis of passage of small dyes across the channels or the detection of electrical currents crossing them. Cells tolerate small numbers of open connexin hemichannels on the plasma membrane but the presence of larger numbers is often a pathological consequence of a metabolic insult such as in ischaemia, when cells release ATP across the open hemichannels in cardiac myocytes or glutamate in astrocytes. Importantly, hemichannels provide a second mechanism for connexin-dependent intercellular propagation of calcium waves that complements the calcium signalling occurring directly across gap junctions.

Modifications in Disease
Changes in gap junctional communication and especially mutations in connexin genes have been shown to correlate with a number of human diseases. Mutations in Cx43 are extremely rare and diseases in tissues expressing this connexin mainly involve modifications in the abundance of functional gap junction channels and the remodelling of pre-existing gap junctions as seen in cardiac hypertrophy and infarction and in endothelial dysfunction in arteries. In contrast, about 200 Cx32 mutations have been detected in the X-linked form of ▶Charcot-Marie-Tooth Disease, a demyelinating syndrome that leads to degeneration of peripheral nerves. The channels constructed of Cx32 are located mainly in the paranodal loops and Schmidt-Lanterman incisures of myelinating Schwann cells and provide a direct radial diffusion pathway that is about 300-fold shorter in distance than a circumferential route. Many missense, frameshift, deletion and nonsense mutations lead to a loss of function, many caused by failure of Cx32 to oligomerise correctly into hemichannels and to be targeted in a precise manner to the plasma membrane. Sometimes, gap junctions are formed by these mutated connexins but their operation is faulty. Gap junctions constructed from Cx32 in the liver and pancreatic acinar cells (Table 1) are unaffected. Surprisingly, myelination by oligodendrocytes in the central nervous system is unaffected.

Mutations in Cx26 and Cx30 in the human inner ear are associated with congenital deafness, a disorder present in 1 in 1,000 births. Over 50 Cx26 mutations are known, with one common recessive mutation resulting in a severely truncated Cx26 protein. Many other site-specific mutations result in trafficking/▶channel
References
1. Evans WH, Martin PE (2002) Gap junctions: structure and function. Mol Membr Biol 19:121–136
2. Unger VM, Kumar NM, Gilula NB et al (1999) Three-dimensional structure of a recombinant gap junction membrane channel. Science 283:1176–1180
3. Gaietta G, Deerinck TJ, Adams SR et al (2002) Multicolor and electron microscopic imaging of connexin trafficking. Science 296:503–507
4. Harris AL (2001) Emerging issues of connexin channels: biophysics fills the gap. Q Rev Biophys 34:325–472
5. Goodenough DA, Paul DL (2003) Beyond the gap: functions of unpaired connexon channels. Nat Rev Mol Cell Biol 4:285–294
6. White TW, Paul DL (1999) Genetic diseases and gene knockouts reveal diverse connexin function. Annu Rev Physiol 61:283–310

GAPs
▶GTPase-Activating Proteins

Gastrulation
Definition
Gastrulation is a morphogenetic process in embryonic development, during which the three germ layers ectoderm, endoderm and mesoderm form. Mesoderm is formed from ectoderm by ▶epithelial-to-mesenchymal conversion. The primitive ectoderm divides into surface ectoderm and neuroectoderm.
▶Neural Development

Gatekeeper Genes
Definition
Gatekeeper genes are involved in the control of cell cycle progression, lifespan of a cell or cell death. They are often a target for mutations during cancer development. The ▶APC gene is the major gatekeeper of the colon.
▶Cell Cycle - Overview
▶DNA-Repair Mechanisms
▶Tumor Suppressor Genes
Gating
Definition
Gating refers to the conformational changes of ion channels during the opening and closing of the permeation pathway.
▶Ion Channels/Excitable Membranes

GBP/FRAT
Definition
GBP (GSK–3 binding protein), and its mammalian homologue FRAT, bind to the serine-threonine kinase glycogen synthase kinase 3 (GSK3), and inhibit its phosphorylation of non-primed GSK–3 substrates. It has tumor promoting activity in lymphocytes.
▶Wnt/Beta-Catenin Signaling Pathway

GDIs
▶Guanine Nucleotide Dissociation Inhibitors

GEF
▶Guanine Nucleotide Exchange Factor

Gelsolin
Definition
Gelsolin and gelsolin-like proteins are ubiquitous F-actin fragmenting proteins. They perform three major functions: (1) They sever actin filaments to smaller fragments; (2) They cap free barbed ends thus inhibiting elongation; and (3) They nucleate actin polymerisation by stabilising dimers and trimers. Many of these proteins are regulated by Ca2+ and the phosphoinositide PIP2.
▶Actin Cytoskeleton

Gemins
Definition
Gemins are proteins that associate with the ▶SMN (survival of motoneuron) complex in a stable and stoichiometric manner. As these proteins and SMN colocalize in nuclear structures called “gems”, they are referred to as “Gemins”.
▶Spinal Muscular Atrophy

GenBank
Definition
GenBank, the NIH genetic sequence database, is an annotated collection of all publicly available DNA sequences, located at http://www.ncbi.nlm.nih.gov. It is part of the International Nucleotide Sequence Database Collaboration, which is comprised of the DNA DataBank of Japan (DDBJ), the European Molecular Biology Laboratory (EMBL), and GenBank at NCBI.
▶Protein Databases
Gene
and intervening (intronic) sequences. Nomenclature: Human genes are abbreviated using italic capital letters and numbers without spaces, dashes etc. according to the approved gene symbols listed in the Human Gene Nomenclature Database (http://www.gene.ucl.ac.uk/cgi-bin/nomenclature/searchgenes.pl). Nomenclature for murine gene symbols follows the same rules, but uses lower case letters after the first letter. Example: PSEN1 = human gene for PS1; Psen1 = murine gene for PS1.

Gene Annotation in Plants
S. ROMBAUTS, Y. VAN DE PEER, PIERRE ROUZÉ
Department of Plant Systems Biology, University of Ghent, Ghent, Belgium
pierre.rouze@psb.ugent.be

Synonyms
Just as for gene and genome annotation in any other organism, gene annotation in plants makes use of algorithmic approaches based on statistics, artificial intelligence and machine learning, recently complemented with homology based approaches depending on the availability of sequence data, whether related genomes, complete cDNAs or expressed sequence tags (ESTs).

Definition
All prokaryotic and eukaryotic organisms encode their genetic information in their genome, built up from DNA. The process of genome annotation consists of finding and decoding the information encrypted on the DNA molecules into known conceptual objects related to biological entities and functions. In general, annotation is mostly focused on finding genes, defining their structure and assigning a function to the product and the process resulting from the expression of each gene. But annotation is not restricted to genes, as non-coding RNAs, transposable elements, promoters and enhancers also make up the genome and are essential to an understanding of the organization and the various functions encoded by or embedded in genomes.

Characteristics
Gene annotation in plants is in essence not different from gene annotation in human or mouse, except that each genome, although constituted by the same DNA, has its own style that needs to be captured by the models used by the different prediction programs. Anticipating the genome-specific characteristics, e.g. codon usage, gene density, length and composition of introns and intergenic sequences, as well as conservation of signals such as splice sites (Table 1), is important for building adequate algorithms and for their proper training.
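One of the genome-specific characteristics named above, codon usage, can be tabulated directly from a set of coding sequences. The following is a minimal sketch in Python, not the training procedure of any particular prediction program; the function name `codon_usage` and the toy sequences are hypothetical and chosen only for illustration.

```python
from collections import Counter


def codon_usage(coding_sequences):
    """Return relative codon frequencies over a list of coding sequences.

    Assumptions of this sketch: every sequence starts in frame 0; trailing
    bases that do not fill a complete codon are ignored, as are codons
    containing letters other than A, C, G or T.
    """
    counts = Counter()
    for seq in coding_sequences:
        seq = seq.upper()
        # Step through the sequence three bases at a time.
        for i in range(0, len(seq) - len(seq) % 3, 3):
            codon = seq[i:i + 3]
            if set(codon) <= set("ACGT"):
                counts[codon] += 1
    total = sum(counts.values())
    return {codon: n / total for codon, n in counts.items()} if total else {}


# Toy input (hypothetical sequences, not real genes).
usage = codon_usage(["ATGGCTGCTTAA", "ATGGCCTGA"])
for codon, freq in sorted(usage.items()):
    print(codon, round(freq, 3))
```

Frequencies computed this way on a curated gene set from one species could then be compared with those of another species to see how far their "styles" differ.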
HMM, hidden MM; MDD, maximal dependence decomposition; NN, neural networks
644 Gene Annotation in Plants
The effort undertaken by several laboratories world- other hand genomes are too distantly related, only a few
wide to sequence the Arabidopsis thaliana genome genes will keep significant similarity to be correctly
led, by the end of 2000, to the first full catalogue of modeled or even be found. For plants, genome
genes present in a plant. This work also revealed that sequences of one dicot, Arabidopsis, and one monocot,
plants have about the same number of protein gene loci rice are presently available. Unfortunately they
as vertebrate genomes (27,000 for Arabidopsis) but diverged some 200 million years ago and are therefore
also that about 5,000 genes remained unknown, too divergent for comparative annotation. Other
showing no homology to any known sequence in genomes that are currently being sequenced, poplar,
databases (1). These genes could only be predicted Medicago and Lotus for dicots, maize for monocots
using ab initio prediction programs and need further will soon fill this gap.
experiment to prove that they truly exist, which has Besides experimental approaches, functional annotation
since been done for a number of them. can only be achieved through comparative methods
Genome annotation can be subdivided in two steps; where the knowledge of genes from one organism can
structural annotation will provide gene structures be transposed to the genes and the genome of the
and functional annotation as an as accurate as organism concerned. To achieve this, homology
possible prediction of the function of each gene. To searches and alignment programs are the main algo-
acquire structural annotation two main approaches rithms used (Table 2). The quality of the databases
are combined, intrinsic and extrinsic methods. Intrinsic providing the data is of primary importance and is
(or ab initio) methods are all those methods that decipher currently the main limitation. The homology searches
genome content based solely on statistical/lexical are performed against protein, domain and motif
models built by using human-curated data sets of databases using BLAST or FastA or by using hidden
sequences from the organism under investigation. The Markov profiles that show a better sensitivity and
algorithms used for intrinsic approaches can be specificity. One point that needs to be stressed is the
subdivided into signal sensors or content sensors. Signal importance of consistency in genome annotation across
sensors are algorithms that focus on the retrieval and species. The gene ontology (GO) project is an attempt to
identification of functional sites such as e.g. splice achieve this goal, through a hierarchical description of
and translation initiation sites, transcription and poly- the genes in a genome according to the functions of their
adenylation signals and include methods such as products at the various biological levels, from the
position weight matrices, neural networks and support molecules to the biological processes and cellular
vector machines. Content sensors recognize regions components with which they are associated. The
along a sequence that have local characteristics differing classification and standardized terminology of GO
from the surrounding sequence, include mainly methods such as Markov models and all their variants and are broadly used to distinguish coding from non-coding regions.
The purpose of the multiplicity of methods used is to decompose the problem of gene annotation into key components and achieve the best results on each component, which can thereafter be reassembled into a whole gene structure (2).
Extrinsic or comparative approaches, on the contrary, rely on the availability of sequences from other genomes and proteins, where regions that show enough similarity between genomes are believed to have the same biological meaning, as a result of common ancestry. The advantage of comparative methods for gene prediction is that they reveal small and novel genes without ambiguities and, more importantly, enable the detection of non-coding features that can hardly be detected otherwise. One nevertheless needs to be careful in the choice of organisms used for comparative methods to achieve good results. Genomes that are too closely related might not reveal the information hidden in them, as regions larger than coding sequences will remain conserved, making the delineation of gene structures impossible. If on the
(gene ontology) has been initiated in the annotation of the D. melanogaster genome and it is hoped that GO will become a community-curated entity, providing a central frame and vocabulary for annotation.
Many gene prediction programs are publicly available; many of them are referred to on the web site maintained by W. Li (▶http://linkage.rockefeller.edu/wli/gene/programs.html). Several reviews are available on this topic too, among which that of Mathé et al. (2) is more specifically oriented towards plants. The large variety of gene prediction programs has the drawback that one does not necessarily know which program to use in which situation and which performs best for the organism of interest. The issues of specificity, or the ability to predict only real genes, and sensitivity, or the ability to predict all the genes present in a sequence, have been addressed by Burset and Guigo (3), Rogic et al. (4) and in the specific case of plants by Pavy et al. (5). In those publications it has been made clear that programs rarely performed well enough to be able to predict all the genes and that 30–40% of the genes were likely to be wrongly predicted even by the best intrinsic methods. In addition, it is clear that the more extrinsic data (of high quality) become available, the better the annotation will be.
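The two accuracy measures named above can be made concrete with a small calculation. The following is a minimal Python sketch, assuming that specificity is taken in the sense used in the text (the fraction of predicted genes that are real, i.e. TP/(TP+FP)) and sensitivity as the fraction of real genes that are predicted (TP/(TP+FN)). The function name and the counts are hypothetical and serve only as an illustration.

```python
def prediction_accuracy(true_positives, false_positives, false_negatives):
    """Return (sensitivity, specificity) for a set of gene predictions.

    sensitivity: fraction of real genes that were predicted
    specificity: fraction of predicted genes that are real
                 (the definition used in the text above)
    """
    predicted = true_positives + false_positives
    actual = true_positives + false_negatives
    sensitivity = true_positives / actual if actual else 0.0
    specificity = true_positives / predicted if predicted else 0.0
    return sensitivity, specificity


# Hypothetical counts: 100 real genes, 90 predictions, 63 of them correct.
sn, sp = prediction_accuracy(true_positives=63, false_positives=27,
                             false_negatives=37)
print(f"sensitivity = {sn:.2f}, specificity = {sp:.2f}")
```

With these invented counts, 37 real genes are missed and 27 predictions are wrong, which illustrates how a program can fall short on both measures at once.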
Gene Annotation in Plants 645
Gene Annotation in Plants. Table 3 Ab initio gene prediction programs (possibly with homology integration) (Continued)
Program (references; web site) | Organism | Gene elements | Gene model | Homology
Genie (44,96; ▶http://www.fruitfly.org/seq_tools/genie.html) | Drosophila, human, other | NN | GHMM, DP | Protein
GenLang (154; ▶http://www.cbil.upenn.edu/genlang/genlang_home.html) | Vertebrates, Drosophila, dicots | Grammar rules, WAM, hextuple frequencies | Chart parsing, DP | -
GenomeScan | Vertebrates | Genscan method, BLASTP or BLASTX | GHMM, DP | Protein
GENSCAN (30; ▶http://genes.mit.edu/GENSCAN.html) | Vertebrates, Arabidopsis, maize | WAM for acceptor; MDD for donor; 5th order MM (homogeneous for introns, three-periodic for exons) | GHMM, DP | Protein, GenomeScan (99)
GENVIEW2 (25; ▶http://l25.itba.mi.cnr.it/webgene/wwwgene.html) | Human, mouse, Diptera | Linear combination, dicodon statistic | DP | -
GlimmerM (33,34; ▶salzberg@cs.jhu.edu) | Small eukaryotes, Arabidopsis, rice | Three-periodic IMM for exons (order 0-8), IMM for introns, 2nd order MM for splice sites | DP | -
GRAIL/GAP3/GrailEXP (90,155; ▶http://compbio.ornl.gov/public/tools/) | Human, mouse, Arabidopsis, Drosophila | NN | DP | EST, cDNA
GRPL (97) | Human, Drosophila, Arabidopsis | Reference point logistic for splice sites, 5th order MM (homogeneous for introns, three-periodic for exons) | GHMM, DP | Protein
HMMgene (98; ▶http://www.cbs.dtu.dk/services/HMMgene/) | Vertebrates, C. elegans | Three-periodic 4th order MM for exons, 3rd order MM for introns | CHMM | -
MORGAN (48; ▶http://www.cs.jhu.edu/labs/compbio/morgan.html) | Vertebrates | Decision tree system | DP | -
MZEF (26; ▶http://argon.cshl.org/genefinder/) | Human, mouse, Arabidopsis, fission yeast | Quadratic discriminant analysis | No | -
SORFIND (24) | - | Matrix method for start and splice sites, hexamer usage (Fourier measure) | No | -
Twinscan (100) | Mouse, human | Genscan method; 5th order MM for UTR and intergenic, WAM for acceptor sites | GHMM | Genomic sequence
VEIL (47; ▶http://www.cs.jhu.edu/labs/compbio/veil.html) | Vertebrates | HMM | DP | -
Xpound (156; ▶http://bioweb.pasteur.fr/seqanal/interfaces/xpound-simple.html) | Human | Three-periodic 1st order MM for exons, 1st order MM for introns and intergenic | HMM | -
of plants to produce biopharmaceuticals is much more recent. Identification and mining of genes involved in the metabolism of such compounds in medicinal plants and in model systems is then an important issue. Gene annotation is the basic step for it, and its high quality can speed up and better focus the research on genes of potential pharmaceutical importance, especially the ones involved in secondary metabolism.
Besides being a source of pharmaceuticals, plants are also seen as a convenient and inexpensive way to produce proteins and other molecules of medicinal interest, a practice often referred to as “pharming”. Most genes can indeed be expressed in a wide range of organisms. Therefore expression systems need to be tested for efficiency, cost and the biological activity of the products, as the demand for high production levels at low cost is important to make modern medicine available for an ever expanding world population. Modified mammalian cells are in that respect valuable as far as the biological activity is concerned, but their use is limited because of expensive culturing and difficult scaling up. The advantage of microbial organisms is that, in addition to the easier modification of the organisms, larger quantities can be manufactured using industrial bioreactors. Their disadvantage is that proteins do not become correctly glycosylated for usage in humans and that some proteins lack the proper folding and disulfide bridges. Plants, on the other hand, have many potential advantages for the production of recombinant proteins and the engineering of pharmaceuticals. First, growing plants is more economical than industrial facilities with bioreactors. Second, starting from plant material is already documented. Third, purification is not required when the therapeutic product can be administered as food. Fourth, plants can be directed to target proteins into organs and intracellular compartments that confer better and more stable conservation (e.g. seeds). Fifth, the production levels that can be reached using modified plants approach industrial scales. Last, the risk of human health threats due to contamination is reduced to a minimum.
The first recombinant therapeutic proteins successfully produced in tobacco plants were different forms and parts of immunoglobulin (Ig), making plants virtually unlimited sources of inexpensive monoclonal antibodies. Ig produced in plants can effectively prevent infectious diseases and cancers in the mouse model or be used for in vivo tumor imaging. None are available commercially yet.
Glycosylation of proteins, though, remains a potential problem, as N-glycans in plants are structurally more diverse, with a majority of oligosaccharides having β-(1,2)-xylose and α-(1,3)-fucose linked to the Man3GlcNAc2 core. These are not found in mammalian N-glycans, while plant engineered proteins are lacking the sialic acid that represents 10% of the mouse sugar
650 Gene Chip Technology
Gene Chip Technology and Its Application to Molecular Medicine. Figure 1 Schematic representation of the
experimental strategy of cDNA and oligonucleotide arrays (modified according to Schulze A, Downward J, Nat Cell
Biol 2001; 3:E190-E195 (36)). cDNA arrays: For the array preparation, inserts from cDNA clones (libraries) are
amplified and PCR products are spotted onto glass slides or nylon membranes at specific positions using arraying
robots. Target preparation: RNA from the two tissues or cell populations under comparison is used to synthesize
cDNA in the presence of either radioactively labeled nucleotides or nucleotides labeled with two different fluorescent
dyes, Cy3 and Cy5, during cDNA synthesis. Samples labeled with two different fluorescent dyes are mixed and are
hybridized to the array, whereas samples labeled radioactively (or with one fluorescent dye) are hybridized to
separate arrays. Signal intensity ratios are obtained by comparing either two different signals (Cy5/Cy3) on one array
or by comparing signals of genes represented on two arrays (array 1/array 2). High-density oligonucleotide arrays:
For the array preparation sequences of 16–20 short oligonucleotides (typically 25mers) are chosen from the mRNA
reference sequence of each gene. Light-directed, in situ oligonucleotide synthesis is used to generate high-density
probe arrays, usually containing over 300,000 individual elements. Target preparation: polyA+ RNA is prepared from
different tissues and used to generate double-stranded cDNA carrying a transcriptional start site for T7 RNA
polymerase. During in vitro transcription, biotin-labeled nucleotides are incorporated into synthesized cRNA
molecules. Each cRNA sample hybridizes separately to the array. Target binding is detected by staining with a
fluorescent dye coupled to streptavidin. Signal intensities on different arrays are used to calculate relative mRNA
abundance for genes represented on the array.
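The signal intensity ratios mentioned in the legend (Cy5/Cy3 on one array) are commonly handled as log2 ratios. The following is a minimal Python sketch, assuming background-subtracted, strictly positive intensities and assuming that most spots are unchanged, so that the median log ratio can be shifted to zero (an expression ratio of one). The function name and the intensity values are hypothetical, and this global median centring is only one of several normalization schemes in use.

```python
import math
from statistics import median


def normalized_log_ratios(cy5, cy3):
    """Return median-centred log2(Cy5/Cy3) ratios for paired spot intensities.

    Both lists must hold background-subtracted, positive intensities,
    one value per spot, in the same spot order.
    """
    if len(cy5) != len(cy3):
        raise ValueError("channels must have the same number of spots")
    log_ratios = [math.log2(r / g) for r, g in zip(cy5, cy3)]
    # Assumption: the typical spot is not differentially expressed,
    # so the median log ratio is moved to 0 (ratio of one).
    shift = median(log_ratios)
    return [lr - shift for lr in log_ratios]


# Hypothetical intensities for five spots.
cy5 = [200.0, 400.0, 800.0, 1600.0, 400.0]
cy3 = [100.0, 200.0, 400.0, 200.0, 800.0]
ratios = normalized_log_ratios(cy5, cy3)
print(ratios)
```

In this invented example the Cy5 channel is twice as bright overall. After centring, the first three spots show no change, while the last two stand out as up- and down-regulated.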
other advantages include the high sensitivity, the economy of size (miniaturization) and the use of non-toxic chemicals.
Despite the enormous potential of the technology a number of issues attenuate the power of microarrays, e.g. the control for biological and environmental factors and fluctuations, the validation of data, the need for computational tools to analyze the vast amount of data and to enable comparisons between arrays and finally the high costs of commercial microarrays.
Conceptually different approaches to the development of microarray technology have resulted in the generation of two different array formats, oligonucleotide and cDNA (‘targets’) arrays (Fig. 1). The ‘target’
cDNAs (or oligonucleotides) are immobilized on nylon membranes or glass slides. cDNA arrays are generated by arraying PCR products of ▶cDNA libraries or clone collections usually onto glass or nylon substrates. cDNA arrays offer flexibility in the choice of arrayed elements and lower costs, particularly for the preparation of smaller, customized arrays for specific investigations of a small number of genes. In addition, arraying of unsequenced clones from cDNA libraries can be useful for gene discovery.
The advantage of the in situ synthesized, high-density oligonucleotide arrays (Affymetrix, www.affymetrix.com) is the high reproducibility of in situ synthesis on oligonucleotide chips, allowing an accurate comparison of signals generated by samples hybridized to separate arrays.

Microarray Fabrication
The concept of being able to characterize large numbers of clones by ▶hybridization analyses of high-density arrayed cDNA libraries was established more than a decade ago (1).
In general, arrays are described as macroarrays or microarrays, the difference being the size of the sample spots and size of the array. Macroarrays are usually printed on nylon membranes, contain sample spot sizes of about 250 microns or larger and can easily be imaged by existing gel and blot scanners. The sample spot sizes in microarrays are typically less than 200 microns in diameter and probes are attached onto glass-like substrates. Depending on the arrayed material, the most commonly used array platforms are cDNA arrays and oligonucleotide arrays (▶http://www.gene-chips.com/).
For the generation of a cDNA microarray (4), cDNA clones representing as many unique transcripts as possible are either selected within ▶EST data (▶http://www.ncbi.nlm.nih.gov/UniGene and ▶http://www.tigr.org/tdb/tgi/hgi) or tissue-specific cDNA libraries are constructed or ordered (www.rzpd.de). The cDNA clone inserts are PCR amplified from plasmid DNA or amplified from bacterial cultures. In high-throughput applications the amplification of clones in cultures stored in 384-well plates is more cost efficient and less labor intensive than amplification from plasmid DNA. PCR products can be purified to remove unincorporated nucleotides and primers, e.g. by filtration using silica systems. Amplified PCR products are spotted, usually in denaturing or high-salt buffer, onto glass slides or nylon membranes using robotic systems. Spots are typically 100–300 μm in size and are spaced about the same distance apart. Using this technique, arrays consisting of more than 30,000 cDNAs can be fitted onto the surface of a conventional microscope slide.
Commercially available cDNA arrays consisting of thousands of distinct sequence-verified genes are provided by companies, like Clontech, Agilent, Incyte Genomics, Invitrogen (Research Genetics) or non-profit, public, limited institutions like the German Resource Center (▶www.rzpd.de).
For oligonucleotide arrays, short oligonucleotides from 20–25mers (Affymetrix) up to 60mers (Agilent Technologies) are usually synthesized in situ, either by photolithography onto silicon wafers (high-density oligonucleotide arrays from Affymetrix) or spotted by ink-jet technology (e.g. Agilent Technologies). Alternatively, presynthesized oligonucleotides can be spotted onto glass slides or glass-like matrices. Oligonucleotide arrays have certain advantages over cDNA arrays, namely high reproducibility and the fact that the sequence information (usually EST sequences) alone is sufficient to generate the DNA to be arrayed. Furthermore the oligonucleotide arrays can be designed to allow both ▶SNP and alternative splicing analysis and do not require amplification and purification of cDNA fragments (5).
Since short oligonucleotides may result in less specific hybridization and reduced sensitivity, the arraying of presynthesized longer oligonucleotides (50–100mers) has recently been developed to counteract these disadvantages. However, the high costs of commercially available, in situ-synthesized oligonucleotide arrays and their accessories (spotters, scanners, analysis software) as well as the time-consuming design of oligonucleotide sets may limit their use for academic laboratories.
Affymetrix has pioneered the oligonucleotide array technology and generated a number of different commercially available arrays for various organisms (▶www.affymetrix.com).

The Microarray Experiment
Careful experimental design of the microarray will ensure the maximal potential gain in efficiency and is particularly important if the resulting experiment is to be maximally informative, given the effort and the resources.
There are many protocols and different types of systems available; the basic procedure for a large-scale measurement of gene expression involves the preparation of total RNA or mRNA from the biological sample(s) under investigation (e.g. ‘candidate’ organ) and the hybridization of copied ‘labeled’ RNA to the array (Fig. 1).
In most cases, the extracted mRNA is converted to cDNA (▶reverse transcription), labeled and hybridized to the DNA elements on the array surface. In some cases (e.g. hybridization of Affymetrix chips) the label is incorporated during in vitro transcription of the cDNA and the resulting ▶cRNA is hybridized. To ensure a high reproducibility, fluctuations in sample preparation and hybridization need to be reduced to a minimum. Major sources of
random fluctuations to be expected are in probe, target and array preparation, e.g. in mRNA preparation, reverse transcription, labeling, target volume, hybridization parameters, overshining effects, non-specific background, variations in pin geometry during spotting of cDNA, slide inhomogeneities and image analysis (6). Replicates of each experiment should be used in order to reduce variability and to differentiate between experimental variation and real expression differences. Suitable internal controls ensure quality control measurements for samples and array. After the hybridization process, intensity signals from the hybridized RNA samples are detected by phosphoimaging or fluorescence scanning and independent images are generated.

Analysis and Data Management
Microarray experiments generate large and complex data sets, e.g. lists of spot intensities and intensity ratios. Basically, the data obtained from microarray experiments provide information on the relative expression of genes corresponding to the mRNA sample of interest. Computational and statistical tools are required to analyze the large amount of data in order to address biological questions.
Once images of hybridized microarrays are processed, arrayed spots are identified, relative signal intensities for each spot are measured and background intensity is subtracted. Signal intensities are usually normalized to compensate for experimental variability and to ‘balance’ the signals from the two samples being compared (7). All normalization techniques assume that all or a subset of spots (e.g. genes) on the array have an average expression ratio equal to one. The normalization factor is then used to adjust the data (signal intensities) from the two samples and to ensure that the total quantity of RNA hybridized to the array is the same. Finally, mean spot or transcript intensities are calculated and ratios of intensities are used to account for relative expression differences. In a simple pairwise comparison of gene expression between two samples, the results can be shown in plots of the intensities or the log of the intensity ratios. Scatter plots are widely used to make the observed differential expression visible.
A variety of software tools utilizing different mathematical algorithms to perform microarray image analysis are available; a detailed discussion of the analytical tools is beyond the scope of this article (see reviews 8, 9).
The first information obtained after data extraction and analysis is the identification of those genes with significant differential expression in two samples or in a time series after a given treatment. To address the full potential of genome-scale experiments a ▶cluster analysis is performed to analyze the entire repertoire of transcripts. Basically, cluster analysis uses a standard statistical algorithm to arrange and organize genes according to similar patterns of gene expression (10).
The data management is of particular importance for further downstream analyses. Databases are an important resource for storing and retrieving the vast amount of data generated in a microarray experiment. A number of gene expression databases have been generated and are accessible to the public (e.g. Gene Expression Omnibus at the National Center for Biotechnology Information, www.ncbi.nlm.nih.gov/geo and ArrayExpress at the European Bioinformatics Institute, EMBL-EBI www.ebi.ac.uk/arrayexpress/).
Due to the experimental variation inherent in microarray experiments, the validation of the results by alternative techniques, such as quantitative real-time PCR or Northern blotting, is advisable.
The most challenging part is “making sense” of the complex data retrieved, including distinguishing whether the gene’s expression change is part of the etiology of the disease or part of the pathology of the disease. The first task is the identification of the relevant gene(s), the prediction of protein function and the identification of the genetic variation that modulates gene expression, including the number of loci involved, the effect of each single locus and the interaction between loci.

Clinical Applications
Microarrays certainly have multiple applications, many of which will develop and evolve over time. Although the first application of microarrays was in monitoring gene expression, the strategy of using arrayed biomolecules to examine a biological sample is generally applicable, e.g. for mutation screening, ▶polymorphism analysis, mapping and other applications. An increasing number of human diseases result from alterations in DNA sequence and/or altered gene expression patterns. Therefore, information about up- and down-regulation of multiple genes is important for identification of disease genes, understanding of gene function and for potential therapeutic and/or diagnostic applications. The first clinical application may be the use of microarrays for the molecular classification of cancer (see disease diagnosis).
Although gene expression analysis is a powerful approach to identify characteristics of disease states or signaling pathways, it should be noted that gene expression levels often represent complex, quantitative phenotypes, influenced by environmental and genetic factors, and the regulation of mRNA levels is only one aspect of biological control. Protein levels are also controlled at several post-transcriptional steps and protein activity is controlled by post-translational modification. A complete picture may be obtained by studying the global level of cellular proteins by proteomics (e.g. protein microarrays).
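The cluster analysis described under Analysis and Data Management can be illustrated with a small example. The following is a minimal Python sketch of one widely used approach, average-linkage hierarchical clustering with 1 minus the Pearson correlation as the distance between expression profiles. The article does not name a specific algorithm, so this choice, the function names and the gene profiles are assumptions made for illustration only.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equally long, non-constant profiles."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def cluster_genes(profiles, n_clusters):
    """Merge genes bottom-up until n_clusters groups remain.

    profiles: dict mapping gene name -> list of expression values.
    Distance between two clusters is the average of 1 - r over all
    pairs of their members (average linkage).
    """
    clusters = [[gene] for gene in profiles]

    def linkage(a, b):
        return mean(1 - pearson(profiles[g], profiles[h])
                    for g in a for h in b)

    while len(clusters) > n_clusters:
        # Find the closest pair of clusters and merge it.
        i, j = min(
            ((i, j) for i in range(len(clusters))
             for j in range(i + 1, len(clusters))),
            key=lambda p: linkage(clusters[p[0]], clusters[p[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return sorted(sorted(c) for c in clusters)


# Hypothetical profiles over four conditions: two rising, two falling genes.
profiles = {
    "geneA": [1, 2, 3, 4],
    "geneB": [2, 4, 6, 8.5],
    "geneC": [4, 3, 2, 1],
    "geneD": [8, 6, 5, 2],
}
groups = cluster_genes(profiles, n_clusters=2)
print(groups)
```

Correlation-based distances group genes by the shape of their expression pattern rather than by absolute level, which is why geneA and geneB end up together despite their different magnitudes.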
although other factors might also be important. Repeated tandem duplication without purging of duplicated genes results in ▶gene clusters, that is a group of related genes situated next to each other in the genome. Such gene clusters can be evolutionarily ancient (for instance the ▶Hox gene cluster), but are more commonly of relatively recent origin, for example the cluster of zinc finger genes on chromosome 19q12.
Genes can also duplicate in other ways. Single genes can be duplicated elsewhere in the genome by several potential mechanisms, including via piggybacking on adjacent active retrotransposons or insertion of a reverse transcribed mRNA into the genome by a viral reverse transcriptase. Several mechanisms allow multiple genes to be duplicated simultaneously. Sections of DNA containing multiple genes can duplicate either in tandem or elsewhere in the genome (for example by transposition). Evidence suggests that recently duplicated segments form an estimated 5% of the human genome, suggesting this is a frequent and ongoing process (1). These are also known as segmental or block duplications. Duplication of an entire chromosome with all its incumbent genes can also occur. Although this is strongly selected against in humans and typically unviable (an exception is trisomy of chromosome 21), some other lineages appear to be able to tolerate such duplications more easily and aberrant chromosome numbers are especially common in flowering plants. Finally, there is evidence that whole genome duplication has occurred in several lineages, including in yeast, plants and salmonid fish (2). There is also evidence that genome duplication occurred early in vertebrate evolution and therefore that all vertebrates including humans are ancestrally polyploid. Duplication of an entire genome includes duplication of all its incumbent genes.

The Fate of Duplicated Genes
When examining the genomes of living organisms, we infer the presence of previous gene duplications by the existence of genes with homologous sequences. By their nature, these represent gene duplications that have become fixed within a population such that all members of a species (or higher taxonomic division) possess both copies. The route from the initial gene duplication to ▶fixation or loss is affected by several factors. Some gene duplications confer an obvious ▶selective advantage and spread rapidly through a population, for example the tandem amplification of esterase genes in mosquitoes can confer a degree of resistance to specific pesticides. Conversely, fixation of a duplicated gene within a population is by no means certain. Many gene products are required at a specific dosage and duplication disrupts this balance. This is likely to reduce the fitness of the individual carrying the gene duplication, leading to it being rapidly purged from the population. For many genes, however, it is likely that duplication has little or no effect on fitness and is effectively neutral. A common way to view the fate of duplicated genes in this context was as a race between the accumulation of mutations that eventually silence one gene (turning it into a ▶pseudogene which would accumulate more mutations over evolutionary time and eventually become unrecognisable) and the acquisition of divergent functions by the two duplicates and hence the necessity for an organism to maintain both. Empirical data does not support this model, however, suggesting a high rate of retention of duplicate genes (3).
More recently, a modified form of this model has been proposed. Many genes, particularly in multicellular organisms, are multifunctional, either at the biochemical level or in terms of regulated expression in different organs or cell types. This creates the possibility of duplicated genes diverging by subfunctionalisation, which implies that duplicate genes diverge by maintaining separate functions that were all initially maintained by the single gene ancestor. An organism must maintain both copies in its genome or lose one of the functions, providing a selective reason for gene duplicates to be retained in a population. Genes whose expression in multiple tissues is regulated by separate ▶enhancers may be particularly prone to subfunctionalisation, as mutations in different enhancers of two duplicate genes would necessitate the retention of both, even if the encoded amino acid sequences were identical. This provides a potential explanation for the apparently high retention rate of duplicate genes, at least in multicellular organisms (4).
A corollary of duplicate gene divergence is redundancy and compensation. Mutational analysis of families of homologous genes often reveals a degree of redundancy, such that the phenotype of a ▶double mutant is more severe than the sum of the phenotypes of each single mutant. This implies the genes are partially redundant and that one gene can in part compensate for the lack of the other, due to co-expression in the same tissue and a similar biochemical function.

Evolutionary Implications of Gene Duplication
Gene duplication has the potential to provide a lineage with ‘new’ genetic material, in that one copy of the gene is, in principle, free to evolve new functions, while the other maintains existing functions. This concept has led to the suggestion that gene duplication and entire genome duplication may play an important role in evolutionary innovation. However direct experimental support for this suggestion is lacking and some evidence suggests that, contrary to the above, both duplicate copies experience purifying ▶selection following duplication (5). Nevertheless, it is commonplace to find gene family members that have undoubtedly evolved by duplication playing different roles in
the development, biochemistry or physiology of an organism. This implies that some degree of evolutionary innovation frequently follows gene duplication. A second possible evolutionary consequence of gene duplication is the disruption of successful interbreeding between populations in which different genes have duplicated. This suggests gene duplication might be a powerful force behind the evolution of reproductively isolated populations, and therefore of new species (6).

Clinical Relevance
The phenotypic effect of chromosomal aberrations involving duplication of segments of DNA or the possession of extra chromosomes is due to the imbalance of gene products deriving from the duplicated genes. ▶Down syndrome (trisomy of chromosome 21) is probably the best-known case involving an entire chromosome, as, unlike most trisomies in humans, embryos carrying an extra chromosome 21 are viable. There are numerous other syndromes involving the duplication of specific chromosomal regions (7). Incomplete gene duplications may also result in the fusion of two genes at the boundary between the original and duplicated DNA. This can result in novel gene products with deleterious properties.

References
1. Eichler EE (2001) Recent duplication, domain accretion and the dynamic mutation of the human genome. Trends Genet 17:661–669
2. Skrabanek L, Wolfe KH (1998) Eukaryotic genome duplication – where’s the evidence? Curr Opin Genet Dev 8:694–700
3. Hughes MK, Hughes AL (1993) Evolution of duplicate genes in a tetraploid animal, Xenopus laevis. Mol Biol Evol 10:1360–1369
4. Lynch M, Force A (2000) The probability of duplicate gene preservation by subfunctionalisation. Genetics 154:459–473
5. Wagner A (2002) Selection and gene duplication: a view from the genome. Genome Biol 3:1012.1–1012.3
6. Lynch M, Conery JS (2000) The evolutionary fate and consequences of duplicate genes. Science 290:1151–1155
7. Inoue K, Lupski JR (2002) Molecular mechanisms for genomic disorders. Annu Rev Genom Hum Genet 3:199–242

Gene Expression

Definition
Gene expression describes the process by which a gene’s coded information is converted into the structures present and operating in the cell. Expressed genes include those that are transcribed into mRNA and then translated into protein, and those that are transcribed into RNA but not translated into protein (e.g. transfer and ribosomal RNAs).
▶Microarrays in Colorectal Cancer

Gene Expression Data Analysis: Classification

Definition
For classification, algorithms such as weighted-voting, k-nearest-neighbor classifiers, support vector machines and artificial neural networks can be applied to the set of genes selected using supervised analysis of gene expression to build models capable of predicting the class of a particular sample. To test the robustness of classification, these methods are often coupled with a leave-one-out cross-validation analysis, in which one of the samples from the original ‘training’ set is withheld and a class prediction is made on the withheld sample. For complete validation, gene lists should be tested on a second ‘test’ set of samples that were not used to derive the discriminatory gene list.
▶DNA Chips
▶Gene Chip Technology and Its Application in Molecular Medicine
▶Microarrays in Colorectal Cancer

Gene Expression Data Analysis: Supervised Analysis

Definition
Using this method, one searches for genes whose expression patterns correlate with an external parameter. The most commonly used ‘supervising’ parameters are clinical features such as survival, presence of metastases and response to therapy. Many statistical metrics have been used successfully in ‘supervised’ analyses, including the standard t-test, permutation-based tests, and signal-to-noise ratios.
▶Microarrays in Colorectal Cancer
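The supervised gene selection and the leave-one-out cross-validation described in the two entries above can be combined in a short example. The following is a minimal Python sketch, assuming two classes, a signal-to-noise ratio of the form (mean_a - mean_b) / (sd_a + sd_b), which is one common definition and is not specified in the entries, and a 1-nearest-neighbor classifier. All function names and expression values are hypothetical. Genes are reselected inside every fold so that the withheld sample never influences the gene list.

```python
from statistics import mean, pstdev


def signal_to_noise(values_a, values_b):
    """(mean_a - mean_b) / (sd_a + sd_b) for one gene in two classes."""
    spread = pstdev(values_a) + pstdev(values_b)
    return (mean(values_a) - mean(values_b)) / spread if spread else 0.0


def select_genes(samples, labels, n_genes):
    """Indices of the n_genes with the largest absolute signal-to-noise."""
    first, second = sorted(set(labels))  # assumes exactly two classes
    a = [s for s, lab in zip(samples, labels) if lab == first]
    b = [s for s, lab in zip(samples, labels) if lab == second]
    scores = [
        abs(signal_to_noise([s[g] for s in a], [s[g] for s in b]))
        for g in range(len(samples[0]))
    ]
    return sorted(range(len(scores)), key=lambda g: -scores[g])[:n_genes]


def nearest_neighbour(train, train_labels, query, genes):
    """Label of the training sample closest to query on the chosen genes."""
    def distance(sample):
        return sum((sample[g] - query[g]) ** 2 for g in genes)
    best = min(range(len(train)), key=lambda i: distance(train[i]))
    return train_labels[best]


def leave_one_out_accuracy(samples, labels, n_genes):
    """Withhold each sample once, retrain on the rest, predict it."""
    correct = 0
    for i in range(len(samples)):
        train = samples[:i] + samples[i + 1:]
        train_labels = labels[:i] + labels[i + 1:]
        genes = select_genes(train, train_labels, n_genes)
        predicted = nearest_neighbour(train, train_labels, samples[i], genes)
        correct += predicted == labels[i]
    return correct / len(samples)


# Hypothetical data: rows are samples, columns are three genes.
samples = [
    [5.0, 1.0, 3.0], [5.5, 1.2, 2.0], [6.0, 0.9, 3.5],
    [1.0, 1.1, 3.1], [1.5, 1.0, 2.2], [0.8, 1.3, 3.4],
]
labels = ["tumor", "tumor", "tumor", "normal", "normal", "normal"]
accuracy = leave_one_out_accuracy(samples, labels, n_genes=1)
print(accuracy)
```

As the Classification entry notes, cross-validation on the training set is not a substitute for testing the final gene list on an independent set of samples.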
658 Gene Expression Data Analysis: Unsupervised Analysis
Definition
Gene expression data matrix refers to a table where each row represents a gene, each column represents a particular sample, or a particular experimental condition, and each position contains a number or a set of numbers characterising the expression level of the particular gene under the particular experimental condition.
▶Microarray Data Analysis

Gene Gun

Gene Therapy

▶Clinical Gene Transfer

Gene-Based Therapies

Definition
Gene-based therapies involve the transfer of genetic material into a host with the hope of ameliorating or curing a disease.
▶Limb Girdle Muscular Dystrophies

Gene Silencing

Definition
Gene silencing refers to repression of genes by the formation of a specialized chromatin structure (heterochromatin). Silenced genomic regions carry specific histone modification patterns (hypoacetylation, H3 K9 methylation in some organisms) and are bound by heterochromatic proteins.
▶Chromatin Acetylation

Gene-Gene Interaction

Definition
Gene-gene interaction describes an interplay between variations in two different genes that influences susceptibility.
▶Atopy Genetics

Definition
Genetic algorithm is a statistical/mathematical term/method that is aimed at finding the optimal solution to a question. The process mirrors natural selection: (1) selection of the strongest individuals/solutions, (2) production of new organisms/new solutions, by mixing the previously selected elements, and (3) mutations, which are accidental changes in the organisms/solutions. The process is reiterated until no more improvement can be made. The last solution is taken as the final one.
▶EST Mining for Expression Analysis
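The three steps of the genetic algorithm entry (selection, mixing, mutation, repeated until no further improvement) can be sketched as follows. This is a minimal Python illustration under stated assumptions: solutions are encoded as bit strings, the fitter half of the population is kept as parents, mixing is single-point crossover, and the loop stops after a fixed number of generations without improvement. The function name, the parameter values and the toy fitness function (the number of ones in the string) are hypothetical.

```python
import random


def genetic_algorithm(fitness, n_bits=20, pop_size=30, mutation_rate=0.02,
                      patience=25, seed=0):
    """Return (best solution, best fitness per generation)."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_bits)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)
    history = [fitness(best)]
    stalled = 0
    while stalled < patience:
        # (1) Selection: keep the fitter half of the population.
        parents = sorted(population, key=fitness, reverse=True)[:pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            # (2) Mixing: single-point crossover of two selected parents.
            mother, father = rng.sample(parents, 2)
            cut = rng.randrange(1, n_bits)
            child = mother[:cut] + father[cut:]
            # (3) Mutation: flip each bit with a small probability.
            child = [1 - bit if rng.random() < mutation_rate else bit
                     for bit in child]
            children.append(child)
        population = parents + children
        candidate = max(population, key=fitness)
        if fitness(candidate) > fitness(best):
            best, stalled = candidate, 0
        else:
            stalled += 1  # stop once no improvement is seen for a while
        history.append(fitness(best))
    return best, history


# Toy problem: maximise the number of ones in a 20-bit string.
best, history = genetic_algorithm(fitness=sum)
print(sum(best), "ones out of", len(best))
```

Because the selected parents are carried over unchanged, the best fitness never decreases from one generation to the next. The stopping rule is a practical stand-in for "no more improvement can be made" and does not guarantee that the optimum has been reached.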
660 Genetic Background
several confounding factors. Instead of reiterating his the genetic background will not allow one to conclude
arguments, the reader’s attention is now drawn to what with certainty that a particular phenotypical change
is proposed to be the main issue, the necessity of a observed in the null mutant was due to the null
“systemic approach” view in gene targeting (5) (also mutation or to the genetic background. This issue is
see ▶Phenomics). especially problematic if the genetic background of the
null mutant animals is different from that of their wild
The Need for a Systemic View type control counterparts, which is a typical problem in
The principal problem in genomics is a systemic one a large number of gene targeting studies.
that concerns biological organization and the functional
units of this organization. From a geneticist’s viewpoint Null Mutant Mice of Gene Targeting Studies
the units of biological organization are the genes and Are Often the F2 Offspring of Two Mouse Strains
their function is to encode particular proteins. How- The genetics of the breeding strategy usually employed
ever, when it comes to the question of phenotypical to generate null mutant mice is explained in Fig. 1. The
effects, genes may not be the units and the definition figure also depicts why the hybrid genetic background
of their function may be complicated. Clusters of genes leads to complications.
in the genetic background defined by higher organizational
level phenomena, including developmental, The Flanking Allele Problem
physiological or even behavioral, may represent the The F2 population is a segregating population in which
functionally relevant unit. Disruption of a single gene mice have recombinant genotypes derived from the two
may alter a biochemical cascade within the functional parental mouse strains (Fig. 1). The difficulties arising
gene cluster. Expression levels of the genes belonging from this are threefold. First, the recombination pattern,
to a functional cluster may change in concert. i.e. which locus contains strain 129 and which B6
Investigation of such changes may reveal the biological alleles and whether in a homozygous or heterozygous
organization of the organism. The boundaries of form, may be different between littermates. Thus not
putative gene clusters may not be sharp. Some genes even wild type littermates of their mutant counterparts
may belong more, others less, to a specific functional represent an appropriate control, since their alleles
gene group. This also implies that the gene group could be different from those of the mutants not only at
organization may not be orthogonal, i.e. some genes the locus of the gene of interest but also at other loci.
may belong to more than one functional group. This may lead to false positive results. Second, due to
Functional groups may be hierarchically organized. A the genetic variation resulting from the hybrid
smaller number of genes may define subgroups that segregating background, detecting significant effects
may make up groups that in turn may be organized into of the mutated gene may be difficult, which leads to
super-groups, etc. Disrupting single genes will perturb false negative results. These two problems can be
the organism and will force it to respond in a way alleviated by measuring larger numbers of animals, i.e.
inherent in its biological organization. The phenotypi- by increasing the power of statistical comparisons and
cal changes one observes are the reflection of this decreasing the possibility of sampling error. Increasing
organization. Instead of looking for the function of the sample size, however, will not solve the third
single genes, it is proposed that investigators should problem, which is associated with genetic linkage. If
take a systemic organizational view into consideration, the targeted mutagenesis is conducted in ES cells from
an approach conceptually similar to metabolic control strain 129, the chromosome with the targeted locus will
analysis employed in biochemistry. carry alleles of genes of 129-type. The probability of
In summary, the effects of genetic manipulation must genetic recombination is generally inversely related to
be investigated at all practically feasible levels of the distance between the loci of the genes (linkage).
biological organization, including gene expression Thus, the 129-type alleles of the genes whose loci are
patterns, protein-protein interactions and a broad close to the locus of the mutated gene will remain
spectrum of phenotypical traits that may be affected together with the mutated allele of the gene of interest
by the direct and indirect effects of the mutation. (Fig. 1). In other words, any time the mutation is
detected in a mouse, e.g. by ▶Southern blotting or
Polymorphism in the Genetic Background May PCR, that particular animal will also carry the linked
Make the Results of Gene Targeting Studies 129-type genes with high probability. Conversely, a
Difficult to Interpret non-mutant control animal will not carry these
Assume that knock out of gene α leads to differential 129-type alleles and will have B6 alleles with high
expression of alleles b vs. B of gene β, and a regulatory probability if the 129-ES cell chimera was crossed to
change of gene β leads to different phenotypical effects B6. In effect, the mutation can be seen as a ▶marker for
depending on which allele (b or B) is present in the α the 129-type genes linked to the locus of the targeted
null mutant mouse. Consequently, ▶polymorphism in gene. Consequently, any phenotypical differences
Characteristics
References Main Molecular Processes Involved in Decoding
1. Chen C, Kano M, Abeliovich A et al (1995) Impaired The coding rules are the outcome of extremely
motor coordination correlates with persistent multiple sophisticated molecular processes. Basically, the transla-
climbing fiber innervation in PKC gamma mutant mice. tion system converts genetic information contained in
Cell 83:1233–1242
genes into functional proteins. The information is always
2. Crusio WE (1996) Gene-targeting studies: new methods,
old problems. Trends Neurosci 19:186–187 processed in the form of an mRNA. In Eukaryotes, the
3. Gerlai R (2001) Gene targeting: Technical confounds and latter can be the outcome of a complex maturation process
potential solutions for the behavioral neuroscientist. from its primary transcript(s), which can involve several
Behav Brain Res 125:13–21 genes. In Eubacteria and Archaebacteria, it is generally
4. Gerlai R (1996) Gene targeting studies of mammalian translated during or immediately after its ▶transcription
behavior: Is it the mutation or the background genotype? from a gene without any rearrangement. It may also come
Trends Neurosci 19:177–181
5. Gerlai R (1996) Gene targeting in Neuroscience: The from the genome of an invasive entity (virus).
systemic approach. Trends Neurosci 19:188–189 A start codon upstream of the mRNA signals the place
6. Lathe R (1996) Mice, gene targeting and behaviour: more of the beginning of translation. This codon is almost
than just genetic background. Trends Neurosci 19:183–186 always AUG, but is also sometimes GUG (very rarely
664 Genetic Code
Genetic Code. Figure 2 The main molecular processes involved in the assemblage of free amino acids into proteins
according to the rules of the genetic code. Three sets of molecules participate in the translation of the mRNAs: the
encoded amino acids (20 different kinds), the aminoacyl-tRNA synthetases (aaRSs) (at least one for each encoded
amino acid, not all shown here) and the tRNAs (at least 22 but often more than 30 different kinds, not all shown here).
The example shows the events leading to the translation of a CUU codon of an mRNA. First, a leuRS binds a leucine
and subsequently catalyses its activation through the hydrolysis of a molecule of ATP, resulting in an
aminoacyl-adenylate (leu-AMP) and PPi [reaction (1)]. As soon as the leuRS recognizes a cognate tRNA (in this case, a tRNA
with an anticodon GAG), the aminoacylation reaction occurs and the tRNA is thus charged with the amino acid [reaction
(2)]. The AMP molecule is then released. After the [EF-Tu + GTP] complex has bound the amino acid at the 3′ end of
the tRNA (not shown here), the latter can participate in translation, occurring on a ribosome. In accordance with the
wobble rules (Table 1), it can successfully bind the CUU codon, thereby enabling the formation of the peptide bond
between the incoming leucine and the nascent protein [reaction (3)]. The previous tRNA (with the anticodon AAG) then
leaves the ribosome. Note that the codons are read from 5′ to 3′ (thus from right to left in the figure) and the anticodons from
3′ to 5′. The arrow above the mRNA shows the starting place of translation, while the cutting of the strand into codons is
indicated by small dots. For clarity, details on structures and events are omitted at the level of the ribosome.
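The codon-anticodon pairing described in the caption above can be checked with a small Python sketch. Since Table 1 of the entry is not reproduced here, the sketch assumes Crick's classical wobble rules (the 5′ base of the anticodon may pair with more than one base at the third codon position); the names `WATSON_CRICK`, `WOBBLE` and `can_read` are hypothetical and chosen for illustration only.

```python
# Standard Watson-Crick pairs in RNA.
WATSON_CRICK = {"A": "U", "U": "A", "G": "C", "C": "G"}

# Assumed classical wobble rules: anticodon 5' base -> codon 3' bases it can read.
WOBBLE = {"C": "G", "A": "U", "U": "AG", "G": "CU", "I": "UCA"}

def can_read(anticodon, codon):
    """Return True if the anticodon can decode the codon (both written 5'->3')."""
    a1, a2, a3 = anticodon
    c1, c2, c3 = codon
    # Pairing is antiparallel: anticodon position 3 faces codon position 1.
    return (
        WATSON_CRICK.get(a3) == c1
        and WATSON_CRICK.get(a2) == c2
        and c3 in WOBBLE.get(a1, "")
    )

# The tRNA with anticodon GAG from the caption reads CUU through a G-U wobble pair.
print(can_read("GAG", "CUU"))
```

Under these assumed rules the same anticodon also reads CUC, but not CUA or CUG, which is why more than one leucine tRNA is needed to cover the four-codon CUN family.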
codon at the point of being translated. At this stage, place. Before that event however, almost all remaining
most of the non-cognate tRNAs are rejected. However, near-cognate tRNAs are rejected (proofreading) (5).
in addition to cognate tRNAs, some near-cognate
tRNAs (whose anticodons can form partial base-pairs) Caveat
may also cross this first stage, associated with the The standard rules of the genetic code can be altered
hydrolysis of the GTP. This hydrolysis results in a during translation events that have been referred to as
▶conformational change of EF-Tu, which subse- “recoding” (1). These alterations are specific to
quently leaves the 3′ end of the tRNA. The amino acid individual mRNAs, and involve three possible kinds
is then free to move to the ▶peptidyl-transferase center of event, ▶programmed frame shifting, ▶translational
of the ribosome where peptide bond formation can take bypassing and ▶redefinition of codons. Albeit not
have been discovered in different taxa. In addition, the mitochondrial systems that also have UGA coding for
organelles (mitochondria and chloroplasts) have their trp. These changes imply that the wobble rules can be
own translation systems with specific variants of the altered within the codon families concerned.
canonical genetic code (Fig. 4). Furthermore, it has been observed that all codons that
An examination of all the variants discovered so far have been reassigned in the nuclear systems have also
indicates that some codons are more prone to been subject to reassignment in the mitochondria, while
reassignment than others (6). The stop codons no codon has been reassigned in the nuclear systems
constitute a typical example; the two-fold degenerate only (6).
family UAR is frequently reassigned to gln in the Some codon families seem to be excluded from any
nuclear system, while UGA almost always codes for trp reassignment procedure. This is especially the case for
in mitochondria. This particular versatility may be all codons with G in first position, corresponding to the
explained by the rarity of these codons, which implies amino acids val, ala, gly, asp and glu, as compared with
that their reassignments might not cause catastrophic the codons with U, C or A in this same position, for
disturbance. Another interesting change concerns the which at least four changes have been reported (Fig. 4).
AUA codon, which often codes for met in the Moreover, all these amino acids are only present in this
mitochondria. As a result, the degeneracy symmetry part of the table in the canonical genetic code. This
pointed out in Fig. 3 is entirely valid for the particularity shows that the genetic code has a core.
Autoantibodies to five aaRSs have been described, and alternatives for dealing with the risk of occurrence; (4)
each is associated with a syndrome of inflammatory choose the course of action which seems to them
myopathy with interstitial lung disease and arthritis (8). appropriate in view of their risk, their family goals and
At the level of the ribosome, antibiotics such as their ethical and religious standards, and to act in accordance
paromomycin and streptomycin can significantly affect with that decision; and (5) to make the best possible
the fidelity of translation. Paromomycin stabilizes the adjustment to the disorder in an affected family
tRNAs irrespective of whether the codon-anticodon member and/or the risk of recurrence of that disorder.”
pair is cognate or near-cognate and hence increases ▶Fragile X Syndrome
amino acid misincorporation (5). ▶Peutz-Jeghers Syndrome
▶Nucleotide Biosynthesis
▶Translational Control in Eukaryotes
Synonyms
Linkage analysis; positional cloning; gene mapping
Genetic Epidemiology. Table 1 A list of human disease genes characterized through genetic epidemiological
research employing a positional cloning approach
approach taken to detect the genetic variants respon- therefore has to be transformed into a linear “▶genetic
sible for these diseases is generally known as distance” d(θ) by some ▶mapping function (the most
“positional cloning”. Instead of relying upon prior widely used being d(θ) = −½ ln(1 − 2θ), proposed by
knowledge about the disease-causing biochemical British geneticist J.B.S. Haldane in 1919). Genetic
defect(s), positional cloning utilizes the segregation distance is measured in units of Morgan (M) in order
pattern of genetic markers (e.g. SNPs, microsatellites, to honour T.H. Morgan, the American Nobel
RFLPs) in affected families to localize the genes prize-winning biologist who first discovered the role
involved in a given phenotypic trait (“linkage analy- of chromosomes in heredity. One centi-Morgan (cM)
sis”). The more often the trait and a particular marker roughly corresponds to one expected recombination per
allele are co-inherited by family members, the stronger 100 meioses.
the evidence that a gene in the vicinity of the marker
influences the trait, i.e. that marker and disease gene are
linked. Parametric Linkage Analysis
Linkage analysis has two major objectives, namely
Characteristics (i) to clarify with some statistical confidence whether
Gene Mapping and Meiotic ▶Recombination θ = ½ or θ < ½, and (ii) to estimate θ in the latter case.
Formally, linkage analysis involves the assessment of Both goals are easily achieved in laboratory animals
the recombination fraction θ between two genetic loci where controlled breeding can be performed such that,
like, in the context of genetic epidemiology, a marker after a few generations, recombinants and non-
and an unknown disease gene. Parameter θ equals the recombinants can simply be counted. In humans as
rate at which children receive from a given parent either well as in animals with longer generation times
the grand-maternal allele at one locus and the grand- however, linkage analysis has to fall back upon family
paternal allele at the other or vice versa. Assuming data. How such data can be used to draw statistical
Mendelian inheritance, θ = ½ for a pair of genes located inferences about linkage depends upon their complex-
on different chromosomes. For two loci residing on the ity, i.e. on how much prior knowledge is available
same chromosome, meiotic recombination is only about the genetic, environmental and stochastic nature
possible via “▶crossing-over”, signifying the breakage of the phenotypes of interest.
and re-union of homologous, non-sister chromatids For the simple genotype-phenotype relationships
during prophase of meiotic division I (Fig. 1). encountered with most monogenic disorders, family
Indeed, it can be shown mathematically that θ equals data can be analysed by explicitly modelling the co-
exactly half the probability of at least one crossing-over inheritance of the disease and marker in a family, based
occurring between the two loci in question, provided upon the underlying genotype frequencies and ▶pene-
that some critical assumptions about the randomness of trances (“parametric linkage analysis”). In such cases,
crossing-overs are correct. the likelihood L of a given recombination fraction θ0
In any case, one corollary of the above is that θ between disease gene and marker is a computable
represents an increasing function of the physical function of the phenotypic data D observed in the
distance between two loci and therefore provides a family. This leads to the definition of z(θ0) = log10
key parameter for gene mapping. Unfortunately, θ is {L(θ = θ0 | D) / L(θ = ½ | D)}.
not an additive measure of distance since it can never Quantity z, termed the “▶lod score” and introduced by
exceed ½. In order to facilitate gene mapping, θ N.E. Morton in 1955, is used as a sequential statistic to
Genetic Epidemiology 675
Genetic Epidemiology. Figure 1 Process of crossing-over during germ cell development. The two nearly
duplicated chromosomes align during prophase of meiotic division I, where an overlap and breakage
of their constituent non-sister chromatids may occur (red: maternal chromosome, blue: paternal chromosome).
Re-annealing and re-synthesis by the cellular repair mechanisms leaves two chromatids with genetic material
flanking the site of crossing-over that is not of the same parental origin (i.e. the resulting chromosomes would
represent recombinants with respect to the two loci shown).
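The relationship between the recombination fraction θ and additive genetic distance can be made concrete with a short Python sketch of Haldane's mapping function, d(θ) = −½ ln(1 − 2θ), and its inverse. The function names are hypothetical, and the example assumes independent crossing-overs (no interference), which is the assumption under which Haldane's function holds.

```python
import math

def haldane_distance(theta):
    """Genetic distance in Morgans for a recombination fraction 0 <= theta < 0.5."""
    if not 0.0 <= theta < 0.5:
        raise ValueError("theta must lie in [0, 0.5)")
    return -0.5 * math.log(1.0 - 2.0 * theta)

def haldane_theta(d):
    """Inverse mapping: recombination fraction for a distance d in Morgans."""
    return 0.5 * (1.0 - math.exp(-2.0 * d))

# Three loci in the order A - B - C (illustrative values).
theta_ab, theta_bc = 0.10, 0.20
# Without interference, recombination fractions do not simply add up ...
theta_ac = theta_ab + theta_bc - 2.0 * theta_ab * theta_bc
# ... but the corresponding genetic distances do.
d_sum = haldane_distance(theta_ab) + haldane_distance(theta_bc)
print(theta_ac, d_sum, haldane_distance(theta_ac))
```

The last two printed values agree, illustrating why distances rather than recombination fractions are used to build genetic maps; for small θ the distance is close to θ itself, so 1 cM corresponds to roughly 1% recombination.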
test whether θ < ½, and to quantify the evidence in
favour of or against θ0. When z(θ0) > 3, then linkage
between disease gene and marker locus is regarded as
being proven and θ is estimated by that recombination
fraction that yields the highest lod score. This
procedure is exemplified in Fig. 2 for a large family
affected by an autosomal dominant disorder. The
results of a linkage analysis are usually presented in
the form of lod score tables or graphs (Fig. 2b) where,
ceteris paribus, studies of independent (i.e. unrelated)
families can be aggregated by summation of the
family-wise lod scores. If a disease gene is to be integrated into
a pre-existing map of linked markers, then this is most
efficiently performed by parametric multi-locus linkage
analysis, which has been shown to be up to twice as
accurate as the pair-wise approach, measured in terms
of the variance of the ensuing recombination fraction
estimates.
Non-Parametric Linkage Analysis of Complex Diseases
Increasingly, the major challenge to genetic epidemiology
is being posed by so-called “complex” diseases,
which comprise conditions such as diabetes, heart
disease, cancer and psychiatric illness. When compared
to monogenic disorders, this category of disease is
characterized by
• a substantially higher population frequency,
• the involvement of multiple genes, most probably
interacting with one another,
• relatively minor effects exerted by individual
variants and
• an important modifying role of environmental
factors.
Since no prior information is usually available as to
how genetic variation at a given locus modifies the risk
for a complex disease (i.e. the genetic model of the
disease is unknown), gene mapping for complex
diseases has to adopt robust, albeit less powerful,
“non-parametric” or “model-free” linkage analysis
such as, for example, the study of pairs of relatives.
The idea underlying this approach, which is both
simple and intuitively appealing, goes back to a 1935
paper by British medical geneticist L.S. Penrose. The
number of sib-pairs, out of a total of n independent
pairs, who share k parental alleles of an autosomal
locus identical by descent (“ibd”) follows a multinomial
distribution with parameters zk, k = 0, 1, or 2
(Fig. 3). Under the null hypothesis of no etiological
connection between marker and disease, the inheritance
of a marker can be assumed to follow Mendelian rules
and to be independent of the disease status of the
siblings. This implies that, irrespective of whether the
sibs are concordant or discordant, z0 = ¼, z1 = ½ and
z2 = ¼. Any test for a deviation from these proportions
represents a test for linkage between the marker
and a putative disease gene. In its original form, the
affected sib-pair test required that the ibd status at each
marker be determined unequivocally for all sib-pairs.
Genetic Epidemiology. Figure 2 Linkage analysis of a family affected by an autosomal dominant disease.
(a) shows the pedigree with patients marked by black symbols. Genotypes observed for a biallelic marker locus
with alleles “A” and “B” are displayed alongside each individual. (b) is a graphical display of the lod scores as
calculated from the family data, assuming full penetrance and lack of de novo mutation. The lod score at θ0=0.00
equals minus infinity owing to a recombination that has occurred during the paternal meiosis leading to an
affected girl in the most recent generation (marked by *).
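The behaviour of the lod score described in this caption, including the value of minus infinity at θ0 = 0 once a recombinant has been observed, can be reproduced with a short Python sketch. It makes a simplifying assumption that is not taken from the entry: all meioses are fully informative and phase-known, so that the likelihood reduces to θ^k (1 − θ)^(n−k) for k recombinants among n meioses. The counts used are hypothetical and are not the data of Fig. 2.

```python
import math

def lod_score(theta, recombinants, meioses):
    """Two-point lod score z(theta) for phase-known, fully informative meioses."""
    k, n = recombinants, meioses
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie in [0, 0.5]")
    if theta == 0.0:
        # A single observed recombinant rules out theta = 0 entirely.
        return -math.inf if k > 0 else n * math.log10(2)
    log_l = k * math.log10(theta) + (n - k) * math.log10(1.0 - theta)
    return log_l - n * math.log10(0.5)  # compare against free recombination

def max_lod(recombinants, meioses, steps=50):
    """Grid search for the recombination fraction with the highest lod score."""
    grid = [0.5 * i / steps for i in range(steps + 1)]
    best = max(grid, key=lambda t: lod_score(t, recombinants, meioses))
    return best, lod_score(best, recombinants, meioses)

# Hypothetical family data: 1 recombinant among 20 informative meioses.
theta_hat, z_max = max_lod(1, 20)
print(theta_hat, round(z_max, 2))
```

With these illustrative counts the maximum lies at θ = 0.05 and exceeds the conventional threshold of 3. Lod scores from independent families evaluated at the same θ can simply be added before the maximum is taken.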
Genetic Epidemiology. Figure 3 Level k of autosomal identical-by-descent (ibd) allele sharing between two sibs.
For each value of k (i.e. k=0,1, or 2), shared alleles are marked in red for the second sib.
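A test for deviation of the observed ibd sharing counts from the Mendelian proportions ¼, ½ and ¼ can be sketched as follows. The Pearson goodness-of-fit statistic is used here as one simple choice and is an assumption of this example, not necessarily the test statistic of the original affected sib-pair method; the function name and the counts are hypothetical. For 2 degrees of freedom the upper tail of the χ² distribution has the closed form exp(−χ²/2), so no statistics library is needed.

```python
import math

def ibd_goodness_of_fit(n0, n1, n2):
    """Pearson chi-square test of sib-pair ibd counts against 1/4, 1/2, 1/4."""
    n = n0 + n1 + n2
    if n == 0:
        raise ValueError("at least one sib-pair is required")
    observed = (n0, n1, n2)
    expected = (n / 4, n / 2, n / 4)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    p_value = math.exp(-chi2 / 2)  # exact upper tail for 2 degrees of freedom
    return chi2, p_value

# Hypothetical counts of affected sib-pairs sharing 0, 1 and 2 alleles ibd.
chi2, p = ibd_goodness_of_fit(10, 50, 40)
print(chi2, p)
```

Excess sharing of two alleles, as in this made-up example, is the pattern expected when the marker is linked to a disease gene; counts in exact Mendelian proportions give a statistic of zero.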
Genetic Epidemiology. Table 2 Haplotype frequencies of two biallelic loci
Marker 2     Marker 1                  Total
             allele 1A    allele 1B
allele 2A    a            b            a+b
allele 2B    c            d            c+d
Total        a+c          b+d          1
Linkage disequilibrium (LD) is usually measured by D=ad-bc.
However, since D depends upon the marginal sums (i.e. the
allele frequencies), LD is often quantified by D'=D/Dmax instead.
Here, Dmax denotes the maximum absolute value of D that is
possible for the same allele frequencies. If D>0, then Dmax=min
{(a+c)(c+d),(a+b)(b+d)}; if D<0, then Dmax=min{(a+b)(a+c),
(b+d)(c+d)}.
linked to a gene of strong effect, the phenotypic
covariance among relatives should be positively related
to their degree of marker ibd sharing, and this
relationship translates into larger estimates of the
corresponding variance components.
▶Linkage Disequilibrium
Intervals of 1 Mb are generally regarded as the limit of
mapping resolution that can be achieved using
(family-based) linkage analysis, and more precise mapping of
disease genes can only be expected from
population-based association studies, exploiting linkage
disequilibrium (LD). The most general definition of LD is the
condition that the alleles of linked loci do not occur
statistically independently on the chromosomes observed
in a population. In the simplest case of two biallelic
loci, haplotype frequencies can be arranged in a
two-by-two table with cell probabilities a,b,c and d
(Table 2), and LD is meaningfully quantified by the
that the observed level of marker ibd sharing between cross product D = ad − bc. When a new allele first
relatives is compared to their phenotypic similarity. If a arises in a population by either mutation or migration, it
marker is linked to a gene that influences the trait, then occurs as a single copy that resides on a certain
sibs with similar phenotypes, for example, will tend to haplotype background, together with certain alleles of
share more than ½ of their marker alleles ibd whereas other loci. Only in later generations will the allele
dissimilar sibs will not. A formal test for this effect was become more frequent through either selection or
proposed by American statisticians J.K. Haseman and genetic drift or both. In any case, chromosomes
R.C. Elston in 1972. For each sib-pair, the squared carrying the new allele will recombine with chromo-
difference Y of their phenotypic values is calculated, somes carrying other haplotypes so that the strong
and the number π of their ibd marker alleles determined original LD will erode with time. This loss of LD will
(or estimated). A linear regression analysis of Y on π be slower for closely linked loci and, under some
then reveals (i) whether the two variables are correlated simplifying assumptions about mutation rates, migra-
and (ii) whether a significant relationship, if found, tion and mating patterns, D can indeed be shown to
is a biologically plausible indication of linkage. The decrease by a factor of 1 − θ in each generation.
Haseman-Elston approach has since been expanded Therefore, strong LD is an indication of close linkage
and refined, for example by considering the mean- and assessment of LD between a marker and putative
corrected product of the sib-specific phenotypes disease gene can be regarded as linkage analysis in a
instead of Y, so as to increase informativity about super-pedigree tying all analysed individuals together.
linkage. An alternative method aims at decomposing In principle, any kind of genetic marker can be
the variance of the phenotype into components that are employed in disease association studies provided that
due to genes which are either linked to the marker, or (i) the marker mutation rate is low and (ii) the density of
not (“variance component analysis”). For a marker markers chosen for analysis is high enough to ensure
Genetic Epidemiology. Figure 4 Transmission disequilibrium test (TDT) of disease-association for biallelic marker
genes. The sampling units of the TDT are nuclear families comprising both parents and an affected child (“trios”).
Only transmissions from heterozygous parents to their children are evaluated (cells x and y in the table shown). The
TDT statistic equals McNemar’s (x−y)²/(x+y), which follows a χ² distribution with 1 degree of freedom under the null
hypothesis of no association.
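The TDT statistic given in this caption is easy to compute directly. In the Python sketch below, x and y are taken to be the numbers of heterozygous parents who did and did not transmit a particular marker allele to their affected child; the function name and the counts are hypothetical. The p-value uses the identity that, for 1 degree of freedom, the upper tail of the χ² distribution equals erfc(√(χ²/2)).

```python
import math

def tdt(x, y):
    """Transmission disequilibrium test from transmission counts x and y."""
    if x + y == 0:
        raise ValueError("no informative (heterozygous) parents")
    chi2 = (x - y) ** 2 / (x + y)          # McNemar statistic
    p_value = math.erfc(math.sqrt(chi2 / 2))  # chi-square upper tail, 1 d.f.
    return chi2, p_value

# Hypothetical trio data: 60 transmissions versus 40 non-transmissions.
chi2, p = tdt(60, 40)
print(chi2, round(p, 4))
```

Only heterozygous parents contribute to x and y, which is why homozygous parents can be dropped from the data without affecting the result.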
sufficiently strong LD with disease gene(s). Empirical estimated from the respective frequencies in patients
data and theoretical considerations suggest that a and unrelated healthy controls. However, concerns
sensible marker density should be no lower than have arisen over the potentially confounding effects of
approximately 1 in 50,000 nucleotides. Ideally, asso- ethnic, social or geographical population stratification
ciation markers should be chosen from within genes that would generate systematic differences between the
that represent biologically plausible candidates for an genetic characteristics of the two samples, unrelated to
involvement in the disease of interest. The chance of disease. To solve this problem, family-based associa-
detecting association would then be increased further if tion designs have been proposed of which the
marker alleles were themselves of functional signifi- transmission disequilibrium test (TDT) is the most
cance by altering, for example, the protein product or a widely used. A TDT is basically a McNemar test for
regulatory sequence. preferential transmission of particular marker alleles
from heterozygous parents to their affected offspring
Family-based Association Studies (Fig. 4). Any deviation of the transmission to non-
The simplest form of a population-based association transmission ratio from the expected 1:1 is indicative
study is that invoking a case-control design. As in of both linkage between marker and disease gene
classical epidemiology, relative risks or odds ratios of in the presence of LD and of LD in the presence of
particular marker genotypes or haplotypes can be linkage.
Genetic Heterogeneity 679
Since chromosomes of close relatives act as internal is a constantly evolving scientific discipline so that
controls in the TDT and similar tests, it has become improved power may also arise from the development
almost paradigmatic for the genetic epidemiology of and application of new analytical tools that take genetic
complex disease that family-based association studies and etiological heterogeneity into account (e.g. multi-
are superior to case-control designs. However, family- locus statistics, time series analysis).
based designs also have disadvantages that might not Complex human diseases are typically common and
always be fully outweighed by their apparent robust- have a substantial economic impact upon national health
ness. For example, gene-environment interactions systems. The resulting public interest renders disorders
cannot be analysed in family-based studies since no such as cancer, heart disease and diabetes particularly
genuine controls are available for comparison to attractive for genetic research and a large number of
patients. Furthermore, parental genotypes are required studies into these diseases are often being performed in
for the TDT and these may be difficult to obtain for parallel. On the other hand, for the reasons mentioned
late-onset disorders. The use of other family members above, the prior probability of successfully mapping and
as surrogates to try to reconstruct parental genotypes characterizing genes for complex diseases is compara-
with some certainty has been suggested in such tively low and usually unknown. Even with high
instances. However, such methods are usually costly, significance levels imposed, most positive gene
inaccurate and inefficient. On the other hand, possible mapping results may therefore be wrong. This implies that
confounding of data by population stratification can genetic epidemiological research will almost inevitably
be avoided in case-control studies through careful continue to be driven towards the generation of false
matching by ethnic and geographic origin. Further- positive results. Claims as to the elucidation of a disease
more, if a sufficiently large set of genetic markers is predisposition should therefore always be received with
available that is not itself tested for association, then some caution and judged as preliminary until confirmed
these markers can be used to estimate the level of by controlled replication, meta-analysis or independent
population stratification and to correct the employed laboratory experiments.
test statistic accordingly. Finally, although population
stratification may represent a theoretical possibility,
empirical evidence for its practical importance as a
confounding factor in genetic epidemiology is still
lacking.
Clinical Relevance
The reasons for the apparent lack of success that plagues
genetic studies of complex disease are manifold and the
most critical issue is probably the reduction in power
caused by genetic heterogeneity. Genetic heterogeneity
can occur at two levels, either within genes (“allelic
heterogeneity”) or between genes (“locus heterogeneity”).
In order to increase the power of genetic studies of
complex diseases, the major goal in their planning and
performance is thus to control for genetic heterogeneity
at all levels. First and foremost, this requires a careful Genetic Hearing Disorder
choice of the population under study. Ideally for a
disease association to be detectable, all copies of the
predisposing allele should be ibd in patients. The best Definition
populations to analyse for LD are therefore those that Genetic hearing disorders mainly comprise
have been small and isolated for most of their history or non-X-linked genetic hearing loss, which is only observed in
that have undergone recent expansion from a small homozygous individuals having inherited the mutated
number of founders. Not surprisingly, genetic epide- gene from each parent.
miology has been particularly successful in Finland and ▶Microvilli
some inward breeding communities in the USA (e.g.
Amish, Hutterites, Ashkenazim). In addition to popula-
tion genetic issues, an appropriate definition of
phenotypes, a breakdown by sub-phenotypes and the Genetic Heterogeneity
use of sensible covariates to define etiologically homo-
geneous sub-populations can help to reduce genetic
heterogeneity further. Finally, genetic epidemiology ▶Heterogeneity/Heterogenous
Genetic Immunization

Definition
Genetic immunization is a technique to induce specific immune responses by injecting antigen-encoding expression plasmid DNA.
▶DNA-based Vaccination

Genetic Interactions

Definition
Genetic interactions describe interactions between two or more mutations that result in a phenotype.
▶Cell Polarity

Genetic Polymorphism

Definition
Genetic polymorphism is the presence of multiple inherited forms of a gene, each with an allele frequency of at least 1% within the population.
▶Pharmacogenomics

Genetic Predisposition to Multiple Sclerosis

Alastair Compston
University of Cambridge Neurology Unit, Cambridge, UK
alastair.compston@medschl.cam.ac.uk
▶linkage disequilibrium present in the population under study. In founder populations, polymorphisms that increase susceptibility to disease are necessarily located within a large group of linked genes. This block is subject to recombination during subsequent meioses and is gradually whittled down until there is no residual linkage disequilibrium. It seems that genetic factors determine susceptibility and, to some extent, also shape the clinical course. The choice of markers has been driven either by a priori guesses on the nature of susceptibility (▶candidate genes) or systematic screening of the genome. Candidates selected because they map within linked regions combine both strategies. Markers are analyzed individually (single point analysis) or corrected for information available from their neighbours (multipoint analysis). The cumulative probability that a given marker or region of interest is linked or associated with multiple sclerosis can be formally tested by ▶meta-analysis of available studies. The lesson learned thus far is that no one gene makes a major contribution to susceptibility although collectively they determine a relative risk (for siblings) of around 20.

Candidate Genes in Multiple Sclerosis
Much effort has gone into the assessment of candidate susceptibility genes chosen on the basis of prevailing ideas concerning the pathogenesis of multiple sclerosis. Population studies comparing unrelated cases and controls show an association between the class II ▶major histocompatibility complex alleles DR15 and DQ6 and their underlying genotypes (DRB1*1501, DRB5*0101 and DQA1*0102, DQB1*0602). This is seen in almost all populations (Caucasian, Oriental, Arab, Hispanic, Finnish, Russian and Jewish) although the strength of the association differs. Even those ethnic groups in which the frequency of multiple sclerosis is low, or the phenotype distinct from that usually observed in northern Europe, are now acknowledged to be primarily DR15 associated, with one exception. In Sardinians, the association is with DR4 (DRB1*0405-DQA1*0301-DQB1*0302). In some other Mediterranean populations (Canaries and Turkey), the association is with DR2 (DRB1*1501, DQA1*0102, DQB1*0602) and DR4 (DRB1*04, DQA1*03, DQB1*0302). Most investigators assume that DR (or DQ) is itself the susceptibility gene encoded at 6p21, based on the genetics and on its obvious candidature through its role in restricting the immune response.
Outside the major histocompatibility complex, many candidates have been screened. The case made on the basis of ideas concerning the pathogenesis of multiple sclerosis is much strengthened by prior knowledge that the candidate gene also maps to a region already implicated by linkage studies (positional candidates). The range of candidates now studied includes adhesion molecules, immune receptors, accessory molecules, cytokines, chemokines and their receptors or antagonists, structural genes of oligodendrocytes or myelin, and molecules regulating cell death and survival. In a small and regionally restricted population of Finns, multiple sclerosis is associated with and linked to the gene for myelin basic protein, encoded on chromosome 18. The effect can be traced to a subset of families with common ancestry and does not hold up in the larger cohort. It is the nature of screening so many potential effects that a proportion will appear to be associated or linked purely by chance. Equally, it would be difficult to show unambiguously in a single study that one or more genes exerting a small biological effect are making a contribution to susceptibility. That said, some plausible associations or linkages have been provisionally reported, although none of these has yet stood up to repeated replication. These mainly involve factors that appear to increase susceptibility to multiple sclerosis, but there is provisional evidence for primary effects on resistance and effects on severity or clinical features of the disease.
DR15 is associated with younger age at diagnosis and female gender but does not distinguish features relating to disease course, outcome, specific clinical features or paraclinical investigations. This suggests that DR15 exerts its effect on susceptibility rather than modifying the course of multiple sclerosis. Loci apparently associated with disease protection are FAS-670, IL-12p40, FcR and MCP-3. Genes that may influence the course or phenotype of multiple sclerosis include CTLA4, IL-1Ra/IL-1B, IL-2, CCR5, oestrogen receptor, CNTF and Apo-E and mutations of mitochondrial DNA.

Linkage Genome Screens
The dividend from attempting to fast-track the solution to susceptibility in multiple sclerosis by the candidate gene approach has been small, but the problem is also not solved by the nine whole genome linkage analyses using variable numbers of families from the United States, Canada, United Kingdom, Finland, Sardinia, Italy, Turkey, Scandinavia and Australia. These screens have involved between 21 and 225 families, each typed for 257–443 microsatellite markers chosen to provide an average spacing of around 10 centiMorgans. Although several new genomic regions of interest were revealed, many are false positives. These whole genome screens have been used to explore regions of interest in more detail, hoping to consolidate their status based on mapping but without picking out positional candidates. Linkage on chromosome 17q is supported by additional positional screens from Denmark, Canada, and Finland. There is collateral support for the involvement of chromosomes 5p, 7p and 12q based on direct evidence and synteny with genes determining susceptibility to experimental forms of demyelination.
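The link between the number of markers typed and their average spacing can be checked with a short calculation. The sketch below is illustrative only: it assumes a sex-averaged human genetic map of roughly 3,600 cM, a figure that is not given in the text, and the function name is hypothetical.

```python
# Assumed sex-averaged length of the human genetic map, in centiMorgans.
# This value is an illustrative assumption, not a figure from the text.
ASSUMED_GENOME_LENGTH_CM = 3600.0


def average_spacing_cm(n_markers, genome_length_cm=ASSUMED_GENOME_LENGTH_CM):
    """Average distance in cM between markers spread evenly over the map."""
    if n_markers <= 0:
        raise ValueError("n_markers must be positive")
    return genome_length_cm / n_markers


# The marker counts quoted for the linkage screens (257 to 443 markers).
for n in (257, 443):
    print(f"{n} markers: about {average_spacing_cm(n):.1f} cM between markers")
```

Under this assumption the quoted marker counts bracket the stated average spacing of around 10 cM, which would correspond to about 360 evenly spaced markers.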
Meta-analysis has been deployed in the expectation that this will reduce the evidence for false positive peaks and strengthen the candidature of those which are genuine, providing the best guide to shared regions of interest as the map is serially updated. This was last completed in 2005.

Whole Genome Association Screening
Until recently, whole genome linkage disequilibrium mapping was considered impractical and dependent on chance co-localisation of susceptibility genes and markers applied randomly and at low density. This situation changed with the increased availability of widely distributed microsatellite markers and is set to improve further with the identification and mapping of ▶single nucleotide polymorphisms. A first pass at screening the genome for association was completed in 2003 based on a 0.5 cM map of microsatellite markers and using DNA pools derived from cases with multiple sclerosis and unrelated controls. Individual results provided provisional evidence for associations based on linkage disequilibrium outside the major histocompatibility complex on 6p21. The number of microsatellite markers used necessarily made this a low-density screen, especially since the number of informative markers was less than the full set of 6000 used in this Genetic Analysis of Multiple sclerosis in EuropeanS (GAMES). With considerable variation depending on the stochastic nature of linkage disequilibrium on individual chromosomes in European populations, it may only have covered 10% of the genome in detail and another 20% in part, leaving much yet to be explored. Perhaps its main value lies in the exclusion of many microsatellite markers lying in blocks of linkage disequilibrium of varying size rather than in the provisional positive associations. New screens based on single nucleotide markers present on individual chips, and at a much higher density, are now in progress.

Future Strategies for Identifying Susceptibility Genes
Once regions of interest are mapped, the next aim is to move from whole genome screening to the identification of functional polymorphisms which condition one component or another of the disease process and determine variations in the clinical course and features. How to reach that position is less clear and several parallel strategies have been suggested. One is to add incrementally to the number of available families until thresholds for linkage are reached for the identification of secure loci using statistical criteria for genome-wide significance. An alternative is to accept that the combination of linkage and association now available is sufficient to concentrate the search for positional candidates within regions of interest. Each provisional site already offers several interesting possibilities, although the number of genes encoding components of the nervous, immune and signalling systems is such as to make practically any region suggestive with respect to sensible candidates. Rapid progress is being made in characterising the whole genome for the size, distribution and diversity of blocks containing a restricted number of haplotypes. If the preliminary evidence holds up, it will be possible to tag the common variants within each block in populations (such as Europeans) retaining significant linkage disequilibrium and screen individuals for the susceptibility haplotypes with relative economy.

Clinical Relevance
▶Concordance within families can be used to gauge the influence of genetic factors in determining the clinical phenotype of multiple sclerosis. Time to reach the later stages of disability does not differ between familial and sporadic cases. Conjugal pairs show no evidence for clinical concordance, clustering at year of onset or distortion of the expected pattern of age at onset in the second affected spouse. The most recent assessment of concordance in co-affected siblings and parent-child pairs supports a role for genetic factors in determining age at onset and progression, either from onset or after a phase of relapsing remitting disease, but not the initial presentation or disability. Concordant parent-child pairs show no distortion in the random distribution of male-female pairings and neither sex nor line of inheritance influences disability, age at onset or course. In this situation, disability is highest in the male offspring of affected fathers, who more commonly follow a primary progressive course.
The risk of ▶autoimmunity is increased in the relatives of probands with multiple sclerosis. Three surveys, together involving around 4000 relatives of 1000 probands, have shown recurrence of multiple sclerosis in 15%, with another autoimmune disease (Graves’ disease, rheumatoid arthritis and diabetes) in about 5% of pedigrees. Several other disorders have been considered more frequent than expected in patients with multiple sclerosis. None of these is entirely secure but there may be co-morbidity between neurofibromatosis 1 and primary progressive multiple sclerosis.
A minority of patients who meet clinical criteria for the diagnosis of multiple sclerosis and in whom there are associated magnetic resonance imaging abnormalities and cerebrospinal fluid oligoclonal bands have an illness in which there is disproportionate involvement of the anterior visual pathway. These are commonly women with male relatives already known to be affected by ▶Leber’s hereditary optic neuropathy and they have pathological mutations of mitochondrial
Genetic Screening in Populations. Figure 1 The pedigree shows how two individuals with FH (probands) are
traced to a common ancestor (upwards tracing arrows). The oldest individual alive in each family lineage was
identified as key individual and contacted for genetic testing (downward tracing arrows). Offspring of the key
individuals positive for the mutation were recruited for testing. If positive, their offspring were also called in and so on.
Relatives of key individuals negative for the mutation were not recruited. + means a deceased individual and black
filling an affected individual.
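The downward tracing described in this caption can be read as a traversal of the pedigree that only continues below relatives who test positive. The following is a minimal sketch under stated assumptions: the `Person` class, its fields and the example pedigree are hypothetical, and the test result is modelled as a simple boolean rather than a real genetic assay.

```python
from dataclasses import dataclass, field


@dataclass
class Person:
    name: str
    carrier: bool  # hypothetical outcome of the genetic test
    children: list = field(default_factory=list)


def cascade_screen(key_individual):
    """Return the names of everyone recruited for testing.

    Offspring are recruited only when their parent tests positive,
    so the branches below a negative relative are never contacted.
    """
    tested = []
    queue = [key_individual]
    while queue:
        person = queue.pop(0)
        tested.append(person.name)
        if person.carrier:
            queue.extend(person.children)
    return tested


# Hypothetical pedigree: B tests negative, so B1 is never recruited.
key = Person("key", True, [
    Person("A", True, [Person("A1", False), Person("A2", True)]),
    Person("B", False, [Person("B1", True)]),
])
print(cascade_screen(key))
```

The sketch starts from a single key individual; in the scheme shown in the figure, one such traversal would be run for each family lineage traced from the common ancestor.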
Genome Engineering

▶Cre/loxP Strategies

Genome Instability

Definition
Genome instability describes processes in cells that accumulate mutations with high frequency. These mutations include point mutations, insertions, deletions and translocations.
▶Chromosomal Instability Syndromes
▶DNA-Repair Mechanisms

Genome Screen

Definition
Genome screen describes the testing of a population group to identify a subset of individuals at high risk for having or transmitting a specific genetic disorder, by using several hundred markers selected from the whole genome to identify chromosomal regions that are co-inherited (linked) with a specific disease.
▶Atopy Genetics
▶COPD and Asthma Genetics
▶Genetic Screening in Populations

Genome-Wide Analysis

Definition
A genome-wide analysis is the systematic investigation of all regions of the genome to determine those polymorphisms more often associated with a disease.
▶Common Diseases, Genetics
Genomic Analysis of Single Disseminated Cancer Cells

Gene expression analysis of single disseminated cancer cells
With the completion of the human genome project and the introduction of technologies such as DNA microarrays and laser microdissection, many fields in biology and medicine await the application of comprehensive gene expression analyses of specific cell types isolated from defined tissues. For the amplification of single cell mRNA, the first protocols were introduced in the late 1980s and early 1990s. To date, the protocols are based on one of two principal approaches, linear amplification by T7 RNA polymerase or PCR-based amplification.
As a general rule, PCR-based methods are easier to handle and less time consuming, while there are concerns about the quantitative reliability of measurements obtained after exponential amplification. The linear amplification achieved by T7 RNA polymerase, also referred to as the Eberwine protocol, has the advantage that a failure to amplify a given transcript will not be propagated exponentially. Here, mRNA is reverse transcribed using a primer containing the promoter of the T7 RNA polymerase. After ▶cDNA synthesis, in-vitro transcription is performed and the procedure is repeated once or twice. These few cycles are thought to change the original template ratios only marginally, if at all. On the other hand, several groups have observed that the relative abundance of transcripts is also preserved by PCR-based methods, provided that the correct conditions are applied.
Our preferred method belongs to the approaches using PCR. The protocol uses a single primer that binds to two binding sites artificially introduced to all mRNA sequences. First, a poly-C flanking region is incorporated during cDNA synthesis and after reverse transcription a poly-G tail is added. Four aspects seem to be particularly important. Firstly, single cell mRNA is bound to a solid phase, which allows buffers to be changed and thereby ensures optimal conditions for each enzymatic reaction. Secondly, random primers for cDNA synthesis reduce the length of the primary transcript and allow for subsequent amplification within the optimal range for PCR. Thirdly, a poly-G tail provides a much better primer binding site than a poly-A or poly-T tail. Fourthly, introducing a poly-C flank on one side of the template and a poly-G tail on the other makes all sequences equally G/C-rich at their primer-binding site. Adequate conditions for a single poly-C PCR primer, i.e. high annealing temperature and the addition of denaturing agents such as formamide, enable highly specific and unbiased amplification of such sequences.
With this amplification method in hand, gene expression profiling of single cells has become possible and first interesting results have been obtained. For example, we found that the minor histocompatibility antigen HA-1 is aberrantly expressed on single disseminated cancer cells. This finding makes it reasonable to apply allogeneic bone marrow transplantation as immunotherapy in a HA-1 mismatch situation. We recently adapted the protocol for high-density oligonucleotide microarrays. Thus, screening of all expressed human genes may reveal new target structures on single disseminated cancer cells, the precursor cells of lethal metastasis.

Clinical Relevance
Clinically manifest metastatic disease can rarely be cured. One likely reason is that the cancer cells have genomically and phenotypically progressed so far that they are highly resistant to current ways of inducing apoptosis by any type of treatment; another is that the tumour burden is just too large for complete tumour cell eradication at tolerable drug doses. Therefore, systemic therapies are added to loco-regional treatment (e.g. surgery or irradiation therapy) before metastasis becomes manifest. Such therapies target the relatively few tumour cells that spread throughout the body, have therefore been called “adjuvant” and are currently at the centre of clinical efforts. The underlying rationale is to destroy the tumour seed in good time, when the tumour load is low and the cells are still vulnerable. However, adjuvant chemotherapies, which are currently the best characterised and most effective, have not fulfilled the hopes so far. Although several therapy regimens significantly improve the overall and the disease-free survival of the patients, the absolute benefit is rather low, being in the range of a few percent and improving survival time for the individual patient by a few months. It is increasingly recognized that one reason for the failure of adjuvant therapies is the almost complete lack of knowledge about the target cells, the disseminated cancer cells. Disseminated cancer cells are often genomically and phenotypically very different from their matched primary tumours. Therefore, therapies that are based on mechanisms active in primary tumours do not necessarily exert an effect on disseminated cancer cells. Rather, direct analysis of single disseminated cancer cells promises to uncover novel molecular targets for effective adjuvant therapies.

References
1. Braun S et al (2000) Cytokeratin-positive cells in the bone marrow and survival of patients with stage I, II, or III breast cancer. N Engl J Med 342:525–533
2. Cole BF, Gelber RD, Gelber S et al (2001) Polychemotherapy for early breast cancer: an overview of the randomised clinical trials with quality-adjusted survival analysis. Lancet 358:277–286
3. Iscove NN et al (2002) Representation is faithfully preserved in global cDNA amplified exponentially from sub-picogram quantities of mRNA. Nat Biotechnol 20:940–943
4. Klein CA et al (1999) Comparative genomic hybridization, loss of heterozygosity, and DNA sequence analysis of single cells. Proc Natl Acad Sci USA 96:4494–4499
5. Klein CA et al (2002) Combined transcriptome and genome analysis of single micrometastatic cells. Nat Biotechnol 20:387–392
6. Klein CA (2003) The systemic progression of human cancer: A focus on the individual disseminated cancer cell – the unit of selection. Adv Cancer Res 89:35–67
7. Telenius H et al (1992) Degenerate oligonucleotide-primed PCR: general amplification of target DNA by a single degenerate primer. Genomics 13:718–725
8. Zhang L et al (1992) Whole genome amplification from a single cell: implications for genetic analysis. Proc Natl Acad Sci USA 89:5847–5851

Genomic Clone

Definition
Genomic clone denotes a fragment of cloned DNA originating from the genome of the organism of interest rather than from a reverse-transcript of an RNA.
▶YAC and PAC Maps

Genomic Control

Definition
Genomic control describes a method to control for population stratification in association studies. The degree of genotype-phenotype association for a large number of neutral polymorphisms is measured and used to correct associations with candidate causal polymorphisms.
▶COPD and Asthma Genetics

Genomic Imprinting

allows the cells to differentiate the alleles of paternal and maternal origin of a gene without changing their DNA sequence. The phenomenon of imprinting was explicitly recognized at the beginning of the 1980s on the basis of two types of observations. First, pronuclear transplantation studies on mouse zygotes demonstrated that monoparental conceptuses were not viable, suggesting that biparental contribution is necessary for mammalian development (1, 2). Second, systematic genetic studies of mice with chromosomal translocation showed that some chromosomal regions must be inherited from both parents for normal development (3). These pioneer studies led to the construction of a low-resolution chromosomal imprinting map of the mouse genome. It was postulated that the requirement for both parental genomes to be present in the same zygote was a consequence of differential epigenetic marks on a fraction of the paternal and maternal alleles. The parental origin-specific imprints on the two alleles of the same gene lead to their differential expression. Typically, one parental allele is silenced and only the other remains functional. It is precisely this property that has been used to identify imprinted genes. A gene is considered imprinted if it is expressed monoallelically and its allelic expression depends on its parental origin.
However, the real situation is more complicated. The analyses of the known imprinted genes have demonstrated that most of them are expressed biallelically in some tissues, at least at some developmental stages. As a consequence, we face a paradoxical situation. It is quite difficult to prove that a gene is not imprinted, unless its allelic expression is examined in all tissues at all stages of the life. The ambiguity of the definition might be the reason for the difficulty estimating the number of imprinted genes in the genome. On the basis of the relatively low frequency of mutations with parental origin-dependent phenotype, the number of imprinted genes was initially estimated as not more than 300. However, a systematic transcriptome analysis suggested that there might be more than 2,000 genes with differential parental origin dependent expression in the mouse.
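The operational definition given above (monoallelic expression that depends on parental origin) can be illustrated with allele-specific read counts from reciprocal crosses. This is a minimal sketch under stated assumptions: the two strains, the read counts, the function names and the 0.9 threshold are all hypothetical choices for illustration, not a published calling procedure.

```python
def maternal_fraction(maternal_reads, paternal_reads):
    """Fraction of allele-specific reads that come from the maternal allele."""
    total = maternal_reads + paternal_reads
    if total == 0:
        raise ValueError("no allele-specific reads")
    return maternal_reads / total


def classify(cross_ab, cross_ba, threshold=0.9):
    """Classify one gene in one tissue from two reciprocal crosses.

    Each argument is a (maternal_reads, paternal_reads) pair.
    cross_ab: strain A mother x strain B father; cross_ba: the reciprocal.
    The threshold is an arbitrary illustrative cut-off.
    """
    f1 = maternal_fraction(*cross_ab)
    f2 = maternal_fraction(*cross_ba)
    low = 1 - threshold
    if f1 >= threshold and f2 >= threshold:
        return "maternally expressed"
    if f1 <= low and f2 <= low:
        return "paternally expressed"
    if (f1 >= threshold and f2 <= low) or (f1 <= low and f2 >= threshold):
        # The same strain's allele dominates whichever parent carries it.
        return "strain-dependent bias, not imprinting"
    return "biallelic in this sample"


print(classify((98, 2), (95, 5)))
print(classify((97, 3), (2, 98)))
```

A "biallelic in this sample" result says nothing about other tissues or stages, which is the difficulty the text raises: absence of imprinting cannot be established from a single sample.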
The first two imprinted genes, Igf2r and Igf2, were identified in 1991 on the basis of the parental origin-dependent phenotypes of heterozygous mutants. More than 70 genes have been identified in the mouse and human genome so far. (We quote here two comprehensive databases of imprinted genes that can be found on the web: ▶http://www.mgu.har.mrc.ac.uk/research/imprinting/imprinting2.html and ▶http://cancer.otago.ac.nz/IGC/Web/home.html). Detailed analyses of several of these genes made it possible to determine general characteristics of imprinted genes in addition to their monoallelic expression:
1. Imprinted genes are frequently associated with regulatory regions that carry differential ▶DNA methylation on the two parental alleles. Several imprinted genes were identified on the basis of a systematic search for ▶differentially methylated regions (DMRs) in the genome. In general, DNA methylation upstream of genes, especially in the promoter region, is associated with attenuation of the expression.
2. It is impossible to classify the imprinted genes on the basis of the products they encode. Peptide hormones, growth factors, transcription factors, metabolic enzymes, cell surface receptors and many other proteins, but also several non-coding RNAs can be found among the products of imprinted genes.
3. Another characteristic feature of imprinted genes discovered so far is their non-random distribution in the genome. They are frequently clustered in well-defined genomic regions of up to several hundred kilobases. These clusters usually contain imprinted genes with either maternal or paternal monoallelic expression, but also genes that have been found to be biallelically expressed in all tissues analyzed so far.
4. Interestingly, many antisense transcripts are also detected in the imprinted genomic regions.
5. An important feature of the clusters is their ▶asynchronous replication, which, as far as we know, they all share, suggesting that this phenomenon is one of the useful criteria that help determine imprinted genes (4). The paternal and maternal copies of the whole clusters replicate differentially during the mitotic cell cycle, including imprinted genes, intergenic sequences and the genes that are expressed biallelically. This characteristic is independent of the expression state of the genes in the cluster and is detected in all cell types. Together, these observations suggest that imprinting is regulated at the level of the cluster, in the context of chromatin structure, as described below.
6. Imprinted chromosomal regions display strikingly different recombination frequencies during male and female meiosis. However, many non-imprinted regions also show recombination with different frequencies during male and female meiosis.

Molecular Mechanisms
As indicated above, DNA methylation is a part of the mechanisms that differentiate the parental alleles of imprinted genes. Methylation of cytosines in CG dinucleotides (CpG methylation) is a well-known ▶epigenetic modification of the DNA that regulates chromatin structures such as ▶heterochromatin in concert with covalent ▶histone modifications. The DMRs are usually relatively short, CG-rich DNA segments located at a distance from genes, but sometimes located in the promoter region or in the coding or intronic sequences. The best characterized DMRs in the human and mouse genome include those located in the regions of the Igf2r, H19, Igf2, Snrpn, U2af1-rs1, Gnas and Gtl2 genes. The functional importance of these elements for the establishment and maintenance of the methylation imprint has been demonstrated by extensive targeted mutagenesis studies. Their deletions frequently perturb the function of the whole imprinted domain. These elements are usually called imprinting centers for their central role in the imprinting of a whole region. They most probably act as a structural organizer affecting gene expression over the whole imprinting cluster. This action is presumably mediated by recruiting various proteins, for instance the CTCF protein, that play a role in making up highly-ordered chromatin structure.
Studies of chromatin structure around imprinted genes revealed differences in nucleosome positioning, histone acetylation or nuclease sensitivity between the parental alleles of imprinted genes. In general, methylated sequences are associated with hypoacetylated histones, whereas the unmethylated sequences are associated with hyperacetylated histones. The differences observed between the two alleles of an imprinted gene in the same tissue are similar to those typically observed between active and inactive copies of the same gene in different tissues. Naturally occurring or experimentally induced mutations that alter the enzymatic mechanisms responsible for epigenetic modifications such as DNA methylation and histone modifications frequently disturb the normal imprinting process and modify the allelic expression of imprinted genes. In addition to the epigenetic modifications, the observation of several ▶antisense RNA transcriptions in imprinted regions suggests that non-coding RNAs might also be involved in the maintenance of the characteristic chromatin structure and monoallelic expression of imprinted genes. A role of non-coding RNAs in this process has been suggested by analogy with the function of the Xist RNA in the inactivation of one of the X-chromosomes in females, although the role remains unknown.
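The description of DMRs as short, CG-rich segments can be made concrete with two simple sequence statistics. The sketch below computes GC content and the CpG observed/expected ratio, a widely used measure of CpG richness that is brought in here as an illustration and is not taken from this entry. It describes sequence composition only and says nothing about whether the cytosines are methylated. The function name and example sequences are hypothetical.

```python
def cpg_stats(seq):
    """Return (GC content, CpG observed/expected ratio) for a DNA sequence.

    observed/expected = (number of CpG * length) / (number of C * number of G)
    """
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        raise ValueError("empty sequence")
    c = seq.count("C")
    g = seq.count("G")
    cpg = seq.count("CG")  # CpG dinucleotides read 5' to 3'
    gc_content = (c + g) / n
    obs_exp = (cpg * n) / (c * g) if c and g else 0.0
    return gc_content, obs_exp


# Hypothetical toy sequences: one CG-rich, one AT-rich.
print(cpg_stats("CGCGCGTACGCG"))
print(cpg_stats("ATATTTAACATG"))
```

In practice such statistics would be computed in sliding windows along a region, and allele-specific methylation would then have to be measured experimentally to decide whether a CG-rich segment is actually a DMR.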
Genomic Imprinting. Figure 1 The schematic representation of the cycle of acquisition/erasure of genomic
imprinting in the germ line. Note that, in the germ cell lineage, the epigenetic marks should be erased and then
re-established according to the sex of the individual.
Establishment of the Imprint
Each individual inherits a paternal and a maternal copy of every gene in the genome. However, both alleles are transmitted to the offspring either as paternal or as maternal copies depending on the individual’s sex. Therefore, the parental imprint of a gene or a gene cluster has to be erased in the germ line of the individual and re-established according to its sex in mature gametes (Fig. 1). In order to follow this process, changes in the CpG methylation pattern of DMRs were extensively studied at various imprinted loci. In general, the differences in CpG methylation pattern are erased from the parental alleles in early ▶primordial germ cell (PGC) differentiation. The methylation pattern typical for the paternal or maternal alleles is established in meiotic cells. At the moment of fertilization, many DMRs are already differentially methylated and conserve their allele-specific methylation profiles at all subsequent stages of development while the bulk of the genome undergoes important methylation changes. However, some DMRs acquire their methylation profiles gradually during the somatic cell division, suggesting that ▶CpG methylation is not the only molecular mechanism that plays a role in marking the two parental alleles.
Other characteristics of the imprinted genomic regions follow different kinetics during development. For example, asynchronous replication of the two parental copies is maintained during the proliferation of PGCs, when all methylation differences are already erased, suggesting that differences between the parental copies persist even in the absence of methylation.

Clinical Relevance
The biological significance of the functional nonequivalence of the parental genomes is not yet known. Many hypotheses have been proposed to explain why imprinting has evolved in mammals. The most popular hypothesis is the so-called parental conflict model. According to this model, imprinting has evolved in mammals because of the conflicting evolutionary interests of the paternal and maternal genomes over the allocation of parental resources. This hypothesis is based on the assumption that fetal imprinted genes
Genomic Information and Cancer 693
regulate resource transfer from the mother to the fetus. Therefore, parents are able to modulate the use of resources by transmitting epigenetically modified versions of imprinted genes to their offspring. Since the fetuses develop within the maternal uterus, the paternal investment in the offspring is obviously much lower than the maternal investment. This asymmetry leads to an asymmetry of the imprints on the resource usage-regulating genes.
Another possible explanation is that maintaining the differential chromosomal structure of the imprinted regions could be important for the coordinated replication of the genome and the correct segregation of the chromosomes during mitosis (5). The monoallelic expression of imprinted genes might be a byproduct of this process. Indeed, some experimental observations indicate that the parental copies of imprinted regions interact with each other during the somatic cell cycle in a way that is reminiscent of some trans-sensing phenomena observed in Drosophila or plants.
Whatever the biological function of parental imprinting, perturbations of the process lead to severe hereditary disorders that develop because of mutations in the active allele of imprinted genes, the normal but silenced allele being unable to compensate for the mutated copy (6). Biallelic expression of usually monoallelically expressed imprinted genes has also been implicated in various cancers. For example, Prader-Willi syndrome patients often display hypotonia, hyperphagia, obesity, hypogonadism and developmental delay. Angelman syndrome patients frequently show ataxia, tremulousness, sleep disorders, seizures and hyperactivity. Both syndromes may also show mental retardation and map to the imprinted gene cluster in human chromosome 15q11-13. Beckwith-Wiedemann syndrome maps to 11p15 and is characterized by general overgrowth with symptoms such as hemihypertrophy, macroglossia and visceromegaly.
▶G-Proteins and G-Protein Mutations in Human Diseases
▶Microdeletion Syndromes
▶Prader Willi and Angelman Syndromes

References
1. McGrath J, Solter D (1984) Completion of mouse embryogenesis requires both the maternal and paternal genomes. Cell 37:179–183
2. Surani MA, Barton SC, Norris ML (1984) Development of reconstituted mouse eggs suggests imprinting of the genome during gametogenesis. Nature 308:548–550
3. Cattanach BM, Kirk M (1985) Differential activity of maternally and paternally derived chromosome regions in mice. Nature 315:496–498
4. Kitsberg D, Selig S, Brandeis M et al (1993) Allele-specific replication timing of imprinted gene regions. Nature 364:459–463
5. Paldi A (2003) Genomic imprinting: Could the chromatin structure be the driving force? Curr Top Dev Biol 53:115–137
6. Lalande M (1997) Parental imprinting and human disease. Annu Rev Genet 30:173–195

Genomic Information and Cancer
TRAVIS DUNCKLEY, KEITH D. COON, DIETRICH A. STEPHAN
The Translational Genomics Research Institute, Phoenix, AZ, USA
dstephan@tgen.org

Synonyms
Genomics

Definition
Genomics represents the systematic study of the entire genetic complement (DNA and RNA) of an individual or population of individuals. Since uncontrolled growth of cancerous cells results from inherited or somatically acquired mutations scattered throughout the complement of our chromosomes and often affects mRNA expression, the holistic tools of genomics are particularly relevant to understanding the molecular mechanisms underlying this set of diseases. As such, genomics has played a major role in advancing tumor subclassification, diagnosis and individualized treatment of patients.

Characteristics
Cancer results from the accumulation of genetic damage superimposed upon inherited predisposition. This damage manifests itself in the form of DNA mutations, chromosomal aberrations or epigenetic alterations in the chromatin structure. Elucidating the specific genetic events involved in the pathogenesis of different cancer types will be critical for the development of effective diagnostics and treatments. Initially, genetic studies in cancer focused primarily on heritable rare and highly penetrant alleles of cancer predisposition genes. Examples include the ▶tumor suppressors Rb1 and p53. However, heritable predisposition alleles account for a small percentage of cancer causing events. Multiple combinations of weak genetic variants may have a much larger impact on the development of cancer in the general population, the majority of whom do not have inherited alleles of the known highly penetrant cancer predisposing genes. Thus, cancer can be viewed as a multigenic disease. The challenge of
cancer genomics is to identify the multiple genetic variants that are involved in the development of cancer and to determine their effects on the molecular pathways of the premalignant cell type.
Identifying the genetic determinants of a disease as complex and multigenic as cancer necessitates the development and use of high-throughput methods of genomic analysis. Toward that end, high-throughput genomic DNA scanning technologies (sequencing, LOH, CGH and FISH) as well as microarray-based technologies have been critical. Microarray technology involves the fixation of DNA molecules to a slide or wafer. These DNA molecules can be placed on the microarray slides at very high densities, allowing for high-throughput genome-wide analyses. Various microarray technologies exist, including ▶single nucleotide polymorphism (SNP) arrays for linkage and LOH studies, ▶comparative genomic hybridization (CGH) arrays, DNA sequencing arrays and, perhaps most widely recognized, gene expression microarrays.

Single Nucleotide Polymorphism Genotyping for Linkage and LOH Studies
Traditionally, linkage analyses and ▶loss of heterozygosity (LOH) have been done on a genome-wide scale using ▶microsatellite markers at a density of 10 cM (roughly 10 Mb) intervals. This is a tedious methodology. The resolution is appropriate for linkage studies but inadequate for LOH in the majority of cases. New information from the human genome project (HGP) has resulted in a high resolution SNP map of the human genome, as well as new technologies for rapidly genotyping these SNPs. Single nucleotide polymorphisms represent DNA base pair variations within a population of individuals. They can either have no effect on gene expression or may have subtle effects that, when combined, may lead to disease phenotypes. An SNP occurs, on average, once in every 1300 base pairs of the human genome and they account for the majority of genetic variability between individuals within a population. Although SNPs are biallelic and thus individually less informative than microsatellites, their density of one every 30 kb (on the new Affymetrix 100k SNP array) allows larger haplotype block content to be inferred. Thus SNP arrays are likely to be equally informative when multiple adjacent SNPs are considered together, and they can also identify smaller hemizygous deletions. SNP array technology has great potential for discovering multigenic contributions to cancer development. For example, by comparing the SNP profiles from a group of individuals that have a certain cancer type to the SNP profiles of unaffected individuals, one can identify a set of SNPs that are uniquely associated with that form of cancer (1). Since the sequence information for the SNPs is known, one can rapidly move towards identifying the relevant genes. Importantly, this technology is generally applicable to the study of any disease with a genetic basis.

DNA Sequencing
DNA sequencing represents the ultimate in high-throughput cancer analysis. Once we fulfill the mandate of the HGP for rapid, whole-genome sequencing at minimal cost, we will revolutionize the ability to understand and diagnose cancer. Early attempts include massively parallel DNA arrays for hybridization-based sequencing. These arrays, designed by Perlegen Sciences, Inc. (CA, USA), consist of overlapping oligonucleotide probes that span the entire genome. This makes possible direct and rapid sequencing of the entire genome. This has clear advantages, particularly when an unidentified disease gene has been mapped previously using standard positional cloning strategies.

Comparative Genomic Hybridization
Comparative genomic hybridization (CGH) allows one to visualize gross chromosomal losses or gains by a modification of traditional karyotyping. The entire genomic complement of a normal individual is labeled and compared to a genomic sampling from a tumor that has been labeled with a different fluorophore. While revolutionary, the typical resolution of traditional CGH is 10 Mb. Array CGH provides the advantage of higher resolution compared to traditional CGH methods (up to 0.5 Mb) and is generally useful for identifying large deletions, insertions, amplification events or overall changes in ploidy. Importantly, array CGH can be used as an adjunct to expression microarrays (see below) since the molecular events detected by CGH will have effects on gene expression levels.

Expression Microarrays
Gene expression microarrays are used to rapidly assess the gene expression profile of different cell types. For genetic mutations to affect cell proliferation, and hence the development of cancer, they must alter the function of at least some of the signaling pathways inside the affected cell. These effects can be seen as altered mRNA expression profiles (even members of phosphorylation cascades can be dysregulated) in malignant cell types and can be identified using expression microarrays. There are two main variations of expression arrays currently in use, cDNA and oligonucleotide microarrays. Oligonucleotide arrays provide the benefit of greater specificity since the probes used are of shorter sequence (25–70 nucleotides) than those used for cDNA arrays (200–2000 nucleotides). cDNA arrays have greater sensitivity but cannot, for example, discriminate between splice variants. For each type of array, sequences of DNA that are homologous to different genes of interest are attached to a glass slide at different locations. Each probe (or probe set for
oligonucleotide arrays) on a slide corresponds to a single gene and each slide can hold thousands of probes. One then hybridizes labeled cRNA from the cell type of interest to the microarray, rapidly generating an mRNA expression profile for thousands of genes.

Integration of LCM into ▶Expression Profiling
One important limitation of expression profiling has been that the tissues used often contain multiple cell types in addition to the diseased, or cancerous, cells. Additionally, cancers are almost always mosaic with respect to acquired somatic changes. Thus, they are heterogeneous with respect to the clinical and histopathological trait under study. This adds unwanted expression signatures to the global expression profile obtained and may generate misleading results. ▶Laser capture microdissection (LCM) is increasingly being used to overcome this limitation. LCM uses an infrared laser to select only cell types of interest from a thin section of tissue sample. In the study of cancer, this allows the analysis of a nearly homogeneous population of malignant cells, generating a cleaner expression profile that is more indicative of the cancerous state. Because the volume of cells harvested using this technique is low, RNA amplification techniques must be used to generate enough RNA for expression profiling.

Clinical Relevance
Clearly, identification of the underlying DNA or RNA defects leading to cancer development and progression will lead to a greater understanding of tumorigenesis and will translate directly into drug design. SNP analyses provide the potential for early and reasonably noninvasive cancer diagnosis by detecting heritable cancer-specific mutations in peripheral tissues, such as blood. As a result, treatments may be commenced earlier than was previously possible (2). Additionally, SNP analysis could become a routine screening procedure to identify individuals with SNP haplotypes that place them at risk for developing specific forms of cancer. This information could be used to direct at-risk individuals to appropriate prevention strategies. As the technology matures, diagnosis and screening using this methodology may become economically feasible for general practice. In addition to providing a diagnostic method, SNP analyses will be useful for identifying causative mutations and affected molecular pathways in various cancers. Following validation of these pathways, new targets for therapy will emerge.
Gene expression profiling has numerous important clinical applications. First, expression profiling already has been used to predict the prognosis of disease course in ▶breast cancer (3). In breast cancer susceptibility screening, genetic testing to identify individuals carrying known predisposition alleles has been used to indicate when more extreme prevention strategies, such as surgical resection of at-risk tissue, are needed. However, this approach suffers from the limitation that not all individuals carrying susceptibility genes will develop cancer. Further subclassifying a person’s cancer risk based on additional genetic determinants will restrict such surgical prevention strategies to only a subset of individuals with the highest likelihood of developing particularly aggressive forms of disease. The above example with breast cancer illustrates the expectation that, as more tumors are profiled and subcategorized, expression profiling could become a general tool for predicting the course of cancer progression and for guiding prevention strategies in unaffected individuals. In the future, subcategorizing tumors based on their expression profiles may aid in patient-specific therapies that are designed to be most effective in the clinic on a real-time basis for the treatment of particular forms of cancer.
Microarrays are also important for identifying dysregulated genes and signaling pathways that are involved in tumor development and progression (4). Identification of these genes and pathways will have important implications for the development of novel anticancer therapies since they provide novel targets for treatment. As expression profiling in the field of cancer biology moves forward, a critical goal will be to translate the vast amounts of biological data into meaningful clinical advances. This will require large collaborative efforts pooling the combined knowledge and expertise of different institutions to accomplish successfully all of the required goals from tumor sample acquisition, to genomic analysis, to target identification and validation, to drug design and discovery.

Summary
Analysis of SNPs and gene expression profiling are valuable methods for identifying the genetic determinants of cancer. However, to realize the value of these techniques fully, it will be critical to translate the findings into practical applications that can benefit individuals who are suffering from cancer and those who are at risk of developing particular forms of cancer. Knowledge of an individual’s innate susceptibility to various cancer types can be used to guide the course of prevention strategies that focus, for example, on lifestyle changes such as diet and exercise. In addition, SNP and expression profiles can be used both to diagnose cancers accurately and to subcategorize tumor types based on the severity of the malignant phenotype. This information could then be used to target the most aggressive therapies to patients with the most severe or invasive forms of disease. Ultimately, the information gleaned from these powerful genomics techniques will be used to identify novel targets for therapeutic intervention with the eventual endpoint of preventing tumor growth and metastasis.
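
The comparison of SNP profiles between affected and unaffected individuals, described above under SNP genotyping, can be illustrated with a minimal sketch. It is not the method of any study cited in this entry. It assumes a simple allelic test in which, for each biallelic SNP, the two allele counts in cases are compared with those in controls using a Pearson chi-square statistic on the 2x2 table. The function names, SNP identifiers and counts are hypothetical.

```python
from typing import Dict, List, Tuple

AlleleCounts = Tuple[int, int]  # (count of allele A, count of allele B)


def allelic_chi_square(cases: AlleleCounts, controls: AlleleCounts) -> float:
    """Pearson chi-square statistic (1 degree of freedom) for a 2x2 allele table."""
    a, b = cases
    c, d = controls
    n = a + b + c + d
    row_cases, row_controls = a + b, c + d
    col_a, col_b = a + c, b + d
    # A monomorphic SNP or an empty group carries no association signal.
    if 0 in (row_cases, row_controls, col_a, col_b):
        return 0.0
    return n * (a * d - b * c) ** 2 / (row_cases * row_controls * col_a * col_b)


def rank_snps(
    counts: Dict[str, Tuple[AlleleCounts, AlleleCounts]]
) -> List[Tuple[str, float]]:
    """Return (SNP id, statistic) pairs, strongest association first."""
    scored = [
        (snp, allelic_chi_square(cases, controls))
        for snp, (cases, controls) in counts.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical allele counts: (cases, controls) for each SNP.
example = {
    "snp_1": ((60, 40), (40, 60)),  # allele A enriched in cases
    "snp_2": ((50, 50), (50, 50)),  # identical frequencies in both groups
}
ranking = rank_snps(example)
for snp, statistic in ranking:
    print(snp, round(statistic, 2))
```

With one degree of freedom, a statistic above 3.84 corresponds to p < 0.05 for a single SNP. A genome-wide scan tests many thousands of SNPs, so a real analysis would need a much stricter threshold or an explicit multiple-testing correction.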
References
1. Hoque MO, Lee CC, Cairns P et al (2003) Genome-wide genetic characterization of bladder cancer: a comparison of high-density single-nucleotide polymorphism arrays and PCR-based microsatellite analysis. Cancer Res 63:2216–2222
2. Sidransky D (2002) Emerging molecular markers of cancer. Nat Rev Cancer 2:210–219
3. van ’t Veer LJ, Dai H, van de Vijver MJ et al (2002) Expression profiling predicts outcome in breast cancer. Breast Cancer Res 5:57–58
4. MacDonald TJ, Brown KM, LaFleur B et al (2001) Expression profiling of medulloblastoma: PDGFRA and the RAS/MAPK pathway as therapeutic targets for metastatic disease. Nat Genet 29:143–152

Genomic Instability
Definition
Genomic instability describes a phenotypic feature of the cell, in which the genetic material mutates at a faster rate than normal as a consequence of a deficiency in proteins that function in ▶DNA repair, cell cycle checkpoints, chromosome structure maintenance, and chromosome segregation, etc.
▶Bloom Syndrome
▶DNA Helicases
▶DNA Repair Mechanisms

Genomics
Definition
The mapping, sequencing, and analysis of an organism’s genome.
▶Protein Databases

Genomics
▶Genomic Information and Cancer
▶Functional Genomics, the Systematic Analysis of the Function of All Genes and Gene Products in Parallel

Genotoxin
Definition
A genotoxin is a chemical, or another agent, which damages cellular DNA resulting in mutations and/or cancer.
▶Chromosomal Instability Syndromes

Genotype
Definition
Genotype refers to the genetic constitution of an organism or cell; be it the alleles at a given locus, or those of several loci. For an autosome, the genotype for a specific chromosomal location would be 2 alleles.
▶COPD and Asthma Genetics
▶Diabetes Mellitus, Genetics
▶Familial Dilated Cardiomyopathy
▶Large-Scale ENU Mutagenesis in Mice
▶Schizophrenia Genetics

Genotype-Driven Approach
Definition
Genotype-driven approach describes a plan of action based on the hypothesis that a specific gene is responsible for a specific function. To this end, the specific gene is mutated and the resulting phenotype (appearance) of the organism provides information about the function of the gene.
▶Mouse Genomics

Genotype-Phenotype Correlations
Definition
Genotype-phenotype correlations describe the relationship between genotype (polymorphisms, sequence, variants, and mutations) and phenotype (their clinical expression).
▶Heritable Skin Disorders
GFAP 697
Definition
Geranylgeranyl is a 20 carbon unit made up of four isoprene (dimethyl allyl) units, and in the form of the pyrophosphate, is a precursor molecule in cholesterol biosynthesis.
▶Protein Prenylation
▶Tangier Disease

Germ Cells
Definition
Germ cells are pre-meiotic or post-meiotic sperm cells and egg cells.
▶Mutagenesis Approaches in the Zebrafish

Germinal Vesicle
Definition
Germinal vesicle is the meiotic prophase nucleus of an amphibian oocyte.
▶Xenopus as a Model Organism for Functional Genomics

Germline Mutation
Definition
Germline mutation denotes a mutation that affects the complete organism including the germ cells, and thus is passed on to the progeny of the affected individual.
▶Microarrays in Pancreatic Cancer

Germline Transmission
Definition
Germline transmission refers to a process where the ES derived cells of a chimera contribute to the reproductive cells of a mammal (germ cells) and are genetically passed to its offspring.
▶Large-Scale Homologous Recombination Approaches in Mice

GFACT Expression screening
▶Genome Functionalization by Arrayed cDNA Transduction (GFACT) Expression screening
populations of ▶nestin-expressing neural stem cells. It has thus become an ambiguous marker.
▶Neural Stem Cells

GFFKR Domain

GFP
▶Green Fluorescent Protein

GGA Proteins
Definition
GGA proteins (Golgi-associated, γ-adaptin homologous, ARF-interacting proteins) constitute a conserved multidomain protein family involved in traffic between the Golgi complex and endosomes. They are recruited to membranes by GTP-bound ▶ARF. They can interact with trafficking motifs present on certain cargo, e.g. the mannose–6–phosphate receptor, and also interact with ▶clathrin, making them functionally analogous to ▶adaptor complexes.
▶Vesicular Traffic

Giga-Seal
Definition
Giga-seal denotes the tight connection between the tip of a patch clamp pipette and the cell membrane during the patch clamp analysis. The giga-seal is characterized by a large electrical resistance that reaches values in the giga-ohm range.
▶Patch Clamping

GLI
Definition
GLI comprise a family of zinc finger transcription factors involved in both developmental regulation and human diseases. Zinc-finger transcription factors of the GLI family play critical roles in the mediation and interpretation of Hedgehog signals. The Drosophila homologue is Cubitus interruptus (Ci).
▶Hedgehog Signalling
▶Wnt/Beta-Catenin Signaling

Glial Cells
Definition
Glial cells are the non-neuronal cells of the nervous system. Glial cells do not carry nerve impulses (action potentials) but do have essential supportive functions, including physical support, provision with nutrients and trophic factors (astrocytes), insulation of axons (oligodendrocytes and Schwann cells), and phagocytic functions (astrocytes and microglia). During development, radial glial cells provide a scaffold for neuronal migration, and they function as neuronal progenitors.
▶Glial Cells and Myelination
Glial Cells and Myelination. Figure 1 Cross section of a myelinated axon at the electron microscopic level
(upper left), the ultrastructure of compact myelin (lower left), and schematic depictions of structural proteins in
myelin. The condensed cytoplasmic membrane surfaces form the electron-dense major dense line, extracellular
membrane adhesion forms the intraperiod line. The membrane itself is electron-lucent. Membrane proteins
associated with human myelin diseases are depicted in red. PLP/DM20, proteolipid protein; MBP, myelin basic
protein; P0, protein zero; PMP22, peripheral myelin protein of 22 kD; Cx32, connexin of 32 kD; MAG,
myelin-associated glycoprotein; CNP, cyclic nucleotide phosphodiesterase.
are enriched in the cell membrane is not understood. Myelin is particularly rich in cholesterol. Galactosylcerebroside (GalC) and its sulfated form (sulfatide) are nearly myelin-specific. Absence of GalC and sulfatide in mutant mice lacking UDP-galactose:ceramide galactosyl-transferase leads to progressive demyelination and early death. Thus, both lipids are essential for normal myelination, although glucosylcerebroside (an alternative product) may compensate for some functions of GalC. Patients with the Smith-Lemli-Opitz syndrome (McKusick #270400), a genetic disorder of cholesterol biogenesis, also have myelin abnormalities. The critical requirement for cholesterol in myelin assembly has been shown with conditional mouse mutants deficient in squalene synthase, a critical and specific enzyme of cholesterol synthesis.

Proteolipid Protein (PLP, McKusick *300401)
In the CNS, the most abundant protein of compact myelin is a hydrophobic integral membrane protein (proteolipid) with four transmembrane domains and its smaller splice isoform (DM20). PLP/DM20 may form homo-oligomers and associate with alpha(v)-integrin, but the function of these interactions is speculative. The tight association of PLP/DM20 with cholesterol may be required for membrane ▶raft formation and normal membrane trafficking in oligodendrocytes. The ultrastructure of CNS myelin lacking PLP/DM20 in ▶knockout mice suggests that the extracellular portion of PLP acts as a strut, organizing the extracellular apposition of myelin layers at the IPL, but myelination is possible in the absence of PLP. For comparison, point mutations in this gene (or PLP gene duplications) cause severe ▶dysmyelination in Pelizaeus-Merzbacher disease (PMD) and in rodent PMD models. This is due to ER retention of the mutant protein, unfolded protein response and oligodendroglial apoptosis.

Myelin Protein Zero (MPZ, P0, McKusick *159440)
Myelin protein zero is the most abundant protein of compact PNS myelin, expressed exclusively by Schwann cells. With a single transmembrane domain and an extracellular Ig-like domain, P0 is a member of the Ig-superfamily of cell adhesion proteins. The crystal structure of the Ig-like domain, when combined with the analysis of P0-deficient mice, indicates that homo-tetrameric P0 engages in homophilic interactions with the opposing membrane layer, mediating membrane adhesion and formation of the IPL. Additionally, positive charges in the cytoplasmic domain contribute to the establishment of the MDL by direct interaction with negatively charged head groups of membrane phospholipids. Mutations of the human P0 gene cause a peripheral neuropathy (CMT1B).

Myelin Basic Protein (MBP, McKusick *159430)
Myelin basic protein refers to a group of related cellular proteins associated with both CNS and PNS myelin. The MBP gene encodes at least 5 splice isoforms, ranging from 14 to 21 kD in size. Positively charged amino acids interact with negatively charged head groups of membrane phospholipids, causing MBP to mediate and stabilize myelin compaction at the MDL. The MBP gene is partially deleted in the natural mouse mutant shiverer, which presents with a severe demyelinated phenotype. The overall lack of myelin assembly is not yet fully explained. Shiverer mice provided the first opportunity to analyze the consequences of a missing myelin protein before transgenic knockout techniques became available. An involvement of the
Global Genome Repair 701
MBP gene in a human leukodystrophy has not yet been demonstrated.

Peripheral Myelin Protein 22K (PMP22, McKusick *601097)
Peripheral myelin protein 22K is a glycosylated integral membrane protein of PNS myelin. By topology and hydrophobicity PMP22 is related to the proteolipids of CNS myelin. PMP22 interacts with myelin protein P0 in the myelin membrane and may stabilize the myelin architecture. Experiments carried out in vitro suggest that PMP22 also regulates Schwann cell proliferation and apoptosis, but its in vivo function is poorly understood. The PMP22 gene has captured interest because a gene duplication in humans underlies the most frequent peripheral neuropathy (CMT1A).

Myelin-associated Glycoprotein (MAG, McKusick *159460)
Myelin-associated glycoprotein is a member of the Ig-superfamily of cell adhesion proteins, with a single trans-membrane domain and 5 extracellular Ig-like domains. Its localization at the innermost (adaxonal) membrane of both CNS and PNS myelin suggested that MAG is engaged in adhesion and signaling events between glial cell and axon. The diameter of PNS axons is reduced in mice lacking MAG, suggesting that MAG-mediated myelin-to-axon signaling regulates the phosphorylation status of the axonal cytoskeleton. Axonal binding partners of MAG include the NoGo receptor (NoGoR / reticulon 4 receptor, McKusick *605566), and sialic acid residues of sialo-glycoproteins or sialo-glycolipids (gangliosides).

myelin that stabilizes myelin through intramembranous tight junctions. The finding that OSP interacts with tetraspanin-3/OAP-1 suggests that other tight junction proteins have yet to be identified in myelin.

Nodal and Paranodal Specializations
For ▶saltatory nerve conduction, axonal voltage gated sodium (Na+) channels must be clustered at the node of Ranvier, separated from fast potassium (K+) channels that assemble beneath the myelin sheath at the juxtaparanode. K+ channels are associated with Caspr2, a member of the neurexin family of ▶adhesion molecules, probably via a PDZ domain protein adapter. Caspr2 in turn is associated with TAG-1, a GPI-anchored cell adhesion molecule of the Ig-like superfamily on the glial adaxonal membrane. Knockout experiments in mice demonstrated that axonal Caspr2 is required to maintain K+ channel clusters at the juxtaparanode. More devastatingly, in the absence of TAG-1, axonal Caspr2 and K+ channels are unclustered at the juxtaparanode. Na+ channel distribution is unaffected by deletion of juxtaparanodal proteins.
The assembly sites of axonal Na+ and K+ channels are divided by the paranode, a region devoid of ion channels. The paranodal axon is tightly attached to the glial paranodal loops by a septate-like junctional structure that seals against ion flux and separates Na+ from K+ channels. The paranodal axon is molecularly defined by a complex of the Ig-like GPI-anchored cell adhesion molecule F3/contactin associated with the neurexin Caspr1. This complex interacts with neurofascin155 on the paranodal loop. Disruption of individual components in knockout mice impedes the septate-like junction.
DNA, with the exception of actively transcribed genes, and removes damage that could block DNA replication.
▶Nucleotide Excision Repair

Glomerular Filtrate
Definition
Urine formation in the kidney begins when the fluid portion of the blood leaves the glomerulus and enters the glomerular capsule as glomerular filtrate. Glomerular filtrate consists of water and small size components of blood, separated from blood cells. The glomerular filtrate flows into the tubules, where further water is extracted from the filtrate, and minerals and other body chemicals are absorbed from or secreted into the filtrate.
▶Diabetes Insipidus, a Water Homeostasis Disease

Glomerulonephritis
Definition
Glomerulonephritis (GN) refers to inflammation of the capillary loops of the glomeruli.
▶Morbus Wegener
▶SLE Pathogenesis Genetic Dissection

Glomerulus
Definition
Glomerulus is the network of blood capillaries in the cup-like end (Bowman’s capsule) of the nephron. It is where waste products are filtered from the blood into the kidney tubule.
▶Kidney

Glucocorticoid/Mineralocorticoid Receptors
Definition
Glucocorticoid receptors (GR) and mineralocorticoid receptors (MR) are members of a nuclear receptor superfamily that mediate an organism’s response to glucocorticoids or mineralocorticoids, respectively, by changing the transcription rates of glucocorticoid- or mineralocorticoid-responsive genes.
▶Steroid Hormone Receptor Defects, Molecular Basis

Glucocorticoid/Mineralocorticoid Resistance
Definition
Glucocorticoid/mineralocorticoid resistance are pathologic conditions that demonstrate several manifestations caused by partial insensitivity of tissues to glucocorticoid or mineralocorticoid hormones. These are frequently due to inactivating mutations in the glucocorticoid or mineralocorticoid receptors.
▶Steroid Hormone Receptor Defects, Molecular Basis

Glucocorticoids
Definition
Glucocorticoids are steroid hormones that are synthesised from cholesterol by cytochrome P450 dependent steroid hydroxylase, mainly in the adrenal cortex.
▶Mendelian Forms of Human Hypertension and Mechanisms of Disease

Glutamate
Definition
L-Glutamate is an excitatory amino acid neurotransmitter. It influences almost all neurons in the brain. Glutamatergic neurotransmission has been associated functionally with a number of physiological and pathophysiological processes related to neuronal plasticity and memory.
▶Addiction, Molecular Biology
Definition
GPx defines a family of homologous proteins that is characterized by a catalytic triad composed of a (seleno) cysteine, a glutamine and a tryptophan. These enzymes reduce H2O2 and other
▶Free Radicals

Glycan
Definition
Glycan is a general term for a polymer of monosaccharide units joined by glycosidic bonds. It may or may not have other components.
▶Biochemical Engineering of Glycoproteins
▶Glycosylation of Proteins

Glycine
Definition
Glycine is an amino acid that is derived from dietary sources, but is also generated endogenously from glyoxylate. Glycine serves as an important inhibitory neurotransmitter, predominantly in the spinal cord, brain stem and retina.
▶Peroxisomal Disorders

Definition
Glycoconjugate refers to a compound that is composed of an oligosaccharide which is linked to a protein or lipid.
▶Biochemical Engineering of Glycoproteins

Glycoform
Definition
Glycoform describes various forms of a particular species of glycoproteins that differ in the structures and/or types of glycans.
▶Glycosylation of Proteins

Glycohemoglobin
Definition
Glycohemoglobin stands for glycosylated hemoglobin. The ratio of glycohemoglobin to total hemoglobin is indicative of a person’s average blood glucose level over the last months.
▶Affinity Chromatography and In Vitro Binding (Beads)
Definition
Glycolysis is the metabolic pathway that occurs in the cytoplasm of cells, and by which glucose is broken down to pyruvic acid.
▶Limb Girdle Muscular Dystrophies

Glycoprotein
Definition
Glycoprotein defines a protein with one or more carbohydrate moieties that are covalently bound to it.
▶Affinity Chromatography and In Vitro Binding (Beads)
▶Biochemical Engineering of Glycoproteins
▶Glycosylation of Proteins
▶Protein Databases

Definition
Glycosidic linkage describes the linkage of a monosaccharide to another residue via the anomeric hydroxyl group. The linkage generally results from the reaction of a hemiacetal with an alcohol (e.g. a hydroxyl group on another monosaccharide or amino acid) to form an acetal.
▶Glycosylation of Proteins

Glycosylase
Definition
Glycosylase is an enzyme that catalyzes the cleavage of an N-C1' glycosylic bond, which links a DNA base to the deoxyribosephosphate backbone of DNA.
▶Base Excision Repair

Glycosylation of Proteins
contributes more than 90% of the total weight. Another type of glycoprotein is the glycosylphosphatidylinositol-anchored or ▶GPI-anchored glycoprotein. These contain a C-terminal amino acid that is linked to ethanolamine, which is linked to the glycans of the GPI anchor. GPI-anchored glycoproteins, which may also contain covalently attached N- and/or O-glycans at other residues within the polypeptide, are usually anchored to plasma membranes of cells by insertion of the acyl chains of the GPI moiety into the membrane outer leaflet. Thus, glycoproteins are found in many different sizes, ranging from several thousand Daltons to millions of Daltons.
The presence of carbohydrate on protein is a type of ▶post-translational modification, which along with phosphorylation constitutes one of the most common types of such modifications. The majority of the nearly 30,000 proteins expressed in human cells are glycoproteins (1). The carbohydrates of glycoproteins are added enzymatically to specific sites on a protein. This contrasts with what is seen for ▶glycated proteins, in which carbohydrate addition occurs through the chemical or non-enzymatic addition of a free monosaccharide, usually glucose or galactose, to amino groups in proteins through formation of a Schiff base and rearrangement to a stable oxoamine adduct known as an Amadori product. This process is termed glycation and is often seen in patients with diabetes and children with galactosemia.
Animals, plants, fungi, Protoctista, archaea and bacteria synthesize glycoproteins and many animal viruses contain glycoproteins. In animals, most of the proteins that are on cell surfaces and those that are secreted by cells are glycosylated. Glycoproteins in membranes occur as integral or intrinsic membrane glycoproteins and may contain one or more transmembrane domains. GPI-anchored glycoproteins are also considered to be integral membrane proteins. Glycoproteins can also occur as extrinsic membrane glycoproteins that associate with the membrane through other mechanisms. Many proteins within the cytosol of eukaryotes are also glycoproteins. In animal cells two of the major classes of intracellular glycoproteins are the storage polysaccharides and those that contain O-linked GlcNAc or O-GlcNAc (termed O-GlcNAcylated) modifications.
The attached glycans in different glycoprotein species often exhibit tremendous diversity in size and structure. Within a single glycoprotein species these structural differences are often denoted by the term microheterogeneity and the varied forms of a single glycoprotein species are termed ▶glycoforms. Different glycoproteins from the same cell may be glycosylated very differently, depending on the primary, secondary, tertiary and/or quaternary structure, association with other proteins and subcellular localization. However, glycosylation differences are greater among glycoproteins from different cell types within an organism; the differences are often even greater in glycoproteins between different organisms. Identifying the structures of the glycans made by different organisms is known as the field of glycomics and identifying sites of glycan attachment in glycoproteins is known as the field of ▶glycoproteomics.

Characteristics
Glycans on glycoproteins can vary in the types of sugars that are attached, the ▶anomeric configuration and structure of the attached sugars, the numbers of attachments and the sites of attachments (2). The linkage of one monosaccharide to another is typically via a ▶glycosidic linkage, characterized by the acetal structure. The part of the glycan linked to protein is termed the reducing end and the opposite, unattached end(s) of the glycan is termed the non-reducing end or terminal region. Although there are many types of sugar-protein linkages, which are also glycosides, most sugar-protein linkages are of two types: N-glycosides, in which the amide of Asn (and in some organisms Arg) forms the linkage group (-C-N-C-), and O-glycosides, in which the hydroxy group of hydroxyamino acids, such as Ser, Thr, Tyr, hydroxylysine (Hyl) and hydroxyproline (Hyp), forms the linkage group (-C-O-C-). C-glycosides are an exception to this generalization; here a -C-C-C- bond links the sugar to an amino acid, as seen in Man-C-Trp. Acetal or glycosidic linkages between sugars are stable to alkali. By contrast, the linkage of sugar to protein via Asn, Ser or Thr residues is labile to relatively mild alkali. The sugar-protein linkages via Tyr, Hyl and Hyp are resistant to alkali. The glycosidic linkages within a glycan and all glycosidic linkages of sugars to proteins can be hydrolyzed by treatment with strong acid. An exception to this generalization is the C-glycoside linkage, which cannot be hydrolyzed by treatment with either alkali or acid.
Table 1 lists some of the common sugar-protein linkages found in glycoproteins from animals, plants, fungi, archaea and bacteria. Many dozens of linkages are now known, but mammalian cells appear to generate about a dozen or so. In most cases the monosaccharide residues shown in Table 1 are extended by the addition of other sugar residues. For examples see the composite animal cell glycoproteins shown in Fig. 1. The typical mammalian glycoprotein may contain N- and/or O-glycans of the types shown. One exception to the generalization that glycans are extended by addition of other sugars is found in eukaryotic cytosolic proteins that contain O-linked GlcNAc (GlcNAcβ-O-Ser/Thr), where this residue is not further modified. It is also one of the few examples where the sugar addition is reversible, i.e. the GlcNAc residue is removed and added back multiple times on a mature protein. This probably serves an important
*Abbreviations: GlcNAc, N-acetylglucosamine; Asn, Asparagine; GalNAc, N-acetylgalactosamine; Ser, Serine; Thr, Threonine; Man, Mannose; Xyl, Xylose; Fuc, Fucose; Glc, Glucose; Tyr, Tyrosine; Gal, Galactose; Hyl, Hydroxylysine; Trp, Tryptophan; Hyp, Hydroxyproline; Ara, Arabinose; Arg, Arginine; Rha, Rhamnose; Pro, Proline; Cys, Cysteine; Gly, Glycine; Xaa, any amino acid, except as indicated
regulatory function for cytosolic glycoproteins, akin to the action of protein phosphorylation, which is also reversible. (The other example of reversible glycosylation is found in glucosylation of N-glycans as part of the ▶quality control system for glycoprotein folding, as discussed below.)
Glycosylation of Proteins. Figure 1 Examples of different types of protein glycosylation. Shown are examples
of composite membrane glycoproteins in animals that may contain one or more O- or N-glycans and a GPI-anchor
or a transmembrane domain. Glycoproteins in the cytosol may also contain O-GlcNAc residues. The key
for the symbols and abbreviations of monosaccharides is indicated and used in other figures.
N-glycans in higher animals typically contain 7–20 monosaccharide residues, whereas O-glycans typically have 2–10 residues. However, in yeast, many N-glycans contain mannan, a polysaccharide of mannose, which can contain hundreds of mannose residues. In proteoglycans the attached glycosaminoglycans can be hundreds of residues in length. Some examples of common types of N- and O-glycans in mammals are shown in Fig. 2.
Mammalian glycoproteins are largely composed of the ten sugars or building blocks shown, which are the ▶hexoses galactose (Gal), glucose (Glc) and mannose (Man), the ▶deoxyhexose fucose (Fuc), the ▶hexosamines N-acetylglucosamine (GlcNAc) and N-acetylgalactosamine (GalNAc), the ▶uronic acids glucuronic acid (GlcA) and iduronic acid (IdoA), the pentose xylose (Xyl) and the 9-carbon carboxylated amino sugar ▶sialic acid (Sia) and its multiple derivatives. In humans Sia occurs primarily as N-acetylneuraminic acid (NeuAc). Other organisms use many of these same monosaccharide residues, but also use novel ones not found in animals. For example, both Gal and GlcNAc are commonly found in glycoproteins from all the known kingdoms. Rhamnose (Rha) and arabinose (Ara) are found in plant, but not animal, glycoproteins, whereas sialic acids are found in animal, but not plant, glycoproteins. Some bacteria synthesize sialic acid; however, it has not yet been identified on bacterial glycoproteins. Some glycoproteins from nematodes contain tyvelose (3,6-dideoxy-D-arabino-hexose; Tyv), which is not found in vertebrate glycoproteins. Some of these monosaccharide residues may themselves be further modified before or after incorporation into the glycan moiety of the glycoprotein to provide even more diversity of structure, as seen for NeuAc, which may be O- and/or N-acetylated at various positions, GlcA and IdoA, which can be N-sulfated and O-sulfated at various positions, and GlcNAc and Gal, which may be O-sulfated. Thus, the number of possible glycan structures is astronomically large and the upper limit of the number of structures is not known for any organism.
It is important to note that polypeptides are typified by a linear structure in which two amino acids (L-amino acids in animals) are linked by a peptide bond. By contrast, glycans in glycoproteins are usually branched structures, where the monosaccharides are linked to each other in multiple ways, such as in different anomeric configurations (α versus β) and to various positions on a residue (position C-2, C-3, C-6, etc.). In addition, the sugars in glycoproteins can be in either pyranose (6-membered ring) or furanose
Glycosylation of Proteins. Figure 2 Examples of different types of N- and O-glycans. Animal cell N-glycans shown
on the left side have a common pentasaccharide core (shown in the boxed structure), which is composed of a
trimannosyl sequence linked to a chitobiosyl disaccharide, which is in turn linked to an Asn residue through N-
glycosidic linkage. The N-glycans are generally classified as high mannose-, hybrid- or complex-type as shown. The
complex-type N-glycans may have multiple branches or antennae, described as mono-, bi-, tri-, or tetra-antennary,
etc. Animal cell mucin-type O-glycans shown on the right side have a common structure of GalNAc linked to either
Ser or Thr. This GalNAc residue may be modified in various ways to generate a variety of core structures (shown in
the boxed structures).
(5-membered ring) structures and the residues may be either D- or L-enantiomers (mirror images). Considering all these possibilities, it is easy to see that two identical amino acids linked together in a protein give a single dipeptide structure, whereas two identical hexoses may be linked together to give 64 possible isomeric disaccharide structures. If two different hexoses are linked together it is possible to obtain 128 different isomers (3).
The N-glycans, also called Asn- or ▶N-linked oligosaccharides, in animals, plants, fungi and protista contain the common trimannosyl core structure that is linked via a chitobiosyl core (-GlcNAc-GlcNAc-) to Asn, as highlighted in Fig. 2. There are various types of N-glycans in animal cells, distinguished by the outer or terminal sugar structure, as seen for ▶high mannose-type, hybrid-type and ▶complex-type sequences. The O-glycans in mucins of animal cells, which are also called Ser/Thr- or O-linked oligosaccharides, are characterized by the linkage to Ser/Thr residues via GalNAc, as shown in Fig. 2. The GalNAc residue may be modified in different ways by linkage to other sugars to give various core structures. Altogether there are at least 8 different core structures in mucin-type O-glycans of animals. Some of the more common ones are highlighted in Fig. 2 as cores 1–4.
The attachment of N-glycans to Asn residues of secreted and membrane-bound glycoproteins occurs within a ▶consensus sequence or sequon –Asn-X-Ser/Thr- (or Cys) (Table 2), although not all Asn residues within the sequon of such glycoproteins are always used. Asn residues outside this sequon are not N-glycosylated. In addition, cytoplasmic proteins with the N-glycosylation sequon are not N-glycosylated, because the pathway of N-glycosylation occurs within the lumen of the ▶endoplasmic reticulum (ER), as discussed below. Although the N-glycosylation sequon is the best-known glycosylation sequon, a few other sugar-amino acid linkages also occur in definable sequences of proteins, as seen for addition of O-Glc, O-Fuc, C-Man and O-Gal (collagen) (Table 2). For most other attachments of sugars to proteins, however, there are no clearly predictable consensus sequences, although the probability of attachment of some sugars to protein, such as GalNAc or GlcNAc to Ser/Thr residues in mucins and in cytosolic glycoproteins, appears to be enhanced by clusters of Ser and/or Thr residues and nearby amino acids (e.g. Pro) (Table 2). Some mathematical algorithms have been developed based on this information to predict sites of addition of GalNAc to Ser/Thr residues in animal mucins.
Glycosylation of Proteins. Table 2 Some Protein Consensus Sequences for Sugar Addition*

Sugar-Protein Linkage    Consensus Sequence
GlcNAcβ-N-Asn            -Asn-Xaa-Ser/Thr- (where Xaa ≠ Pro)#
                         -Asn-Xaa-Cys- (where Xaa ≠ Pro) rare (animals, plants, fungi, Protoctista)
Fucα-O-Ser/Thr           -Cys-Xaa-Xaa-Gly-Gly-Ser/Thr-Cys- (animals)
Glcβ-O-Ser/Thr           -Cys-Xaa-Ser-Xaa-Pro-Cys- (animals)
Galβ-Hyl                 -Gly-Xaa-Hyl-Gly- (animal collagen)
Man-C-Trp                -Trp-Xaa-Xaa-Trp- (animals)
Xylβ-O-Ser               -Ser-Gly- (or Ala) (indefinite) (animals)
GalNAcα-O-Ser/Thr        clustered Ser/Thr near Pro (indefinite) (animals)
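
As an illustration of how the N-glycosylation sequon in Table 2 can be applied, the following minimal Python sketch scans a one-letter amino acid sequence for -Asn-Xaa-Ser/Thr- (Xaa ≠ Pro) and, optionally, the rare -Asn-Xaa-Cys- variant. The function name and the example peptide are hypothetical and are not taken from this entry. A match marks only a potential site, since not all Asn residues within a sequon are used.

```python
import re

def find_n_glycosylation_sequons(sequence, include_cys=False):
    """Return 1-based positions of Asn in N-X-S/T sequons (X is not Pro)."""
    third = "STC" if include_cys else "ST"
    # A lookahead is used so that overlapping sequons (e.g. "NNSS") are all found.
    pattern = re.compile(r"N(?=[^P][%s])" % third)
    return [m.start() + 1 for m in pattern.finditer(sequence.upper())]

# Hypothetical peptide, for illustration only.
peptide = "MNGTAANPSLLNACKNNSSW"
print(find_n_glycosylation_sequons(peptide))                    # [2, 16, 17]
print(find_n_glycosylation_sequons(peptide, include_cys=True))  # [2, 12, 16, 17]
```

Asn7 is skipped because it is followed by Pro, and Asn12 is reported only when the Cys variant is allowed.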
Glycosylation of Proteins. Figure 4 Biosynthesis of N-glycans in glycoproteins in higher animals. The pathway
shown occurs in the lumenal regions of the secretory organelles the ER and the Golgi apparatus. The latter is
generally partitioned into separate regions known as cis, medial, and trans. Lysosomal acid hydrolases may acquire
Man-6-P residues on their N-glycans, which are recognized by the Man-6-P receptors, which can deliver the
lysosomal enzymes to late endosomes. Following the completion of N-glycan structures, glycoproteins may stay in
the secretory organelles and endosomes as either soluble or membrane-bound glycoproteins, be secreted into the
extracellular space or be incorporated into plasma membrane.
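
The trimming and extension steps that this entry describes for the pathway in Fig. 4 can be followed as simple bookkeeping of monosaccharide composition. The Python sketch below is only an illustration under stated assumptions: the order of steps follows the text of this entry (ER α-mannosidase, α-mannosidase I, GNT-I, α-mannosidase II, N-acetylglucosaminyltransferase II), whereas the data layout and the function name are hypothetical, and branch-specific linkages, the Man-6-P pathway and incompletely processed glycans are ignored.

```python
# Each step: (enzyme, sugar, change in residue count), following the text.
PROCESSING_STEPS = [
    ("ER alpha-mannosidase", "Man", -1),   # Man9 -> Man8
    ("alpha-mannosidase I", "Man", -3),    # Man8 -> Man5
    ("GNT-I", "GlcNAc", +1),               # GlcNAc added from UDPGlcNAc
    ("alpha-mannosidase II", "Man", -2),   # Man5 -> Man3
    ("GNT-II", "GlcNAc", +1),              # second antennal GlcNAc
]

def process_n_glycan(start):
    """Apply the steps in order; return a list of (enzyme, composition)."""
    composition = dict(start)
    history = []
    for enzyme, sugar, delta in PROCESSING_STEPS:
        composition[sugar] = composition.get(sugar, 0) + delta
        if composition[sugar] < 0:
            raise ValueError(f"{enzyme} would remove more {sugar} than present")
        history.append((enzyme, dict(composition)))
    return history

# Man9GlcNAc2-Asn on a folded glycoprotein (Glc residues already removed).
man9 = {"Man": 9, "GlcNAc": 2}
for enzyme, comp in process_n_glycan(man9):
    print(f"{enzyme:22s} Man{comp['Man']} GlcNAc{comp['GlcNAc']}")
```

The final composition, three Man and four GlcNAc residues, corresponds to the biantennary GlcNAc2Man3GlcNAc2-Asn structure named in the text.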
the undesirable protein oligomerization that can occur in the concentrated protein environment of the ER. Glycoproteins that are partly or improperly folded are transported back into the cytoplasm by a process termed retrograde transport, which appears to occur through translocon Sec61p. UGGT binds its acceptor substrate Man9GlcNAc2-Asn with high affinity only on partly unfolded proteins, and thus has the potential for dual recognition of protein and glycan determinants. The Glc1Man9GlcNAc2-Asn generated by this enzyme is recognized by two different ER lumenal molecular chaperones, ▶calnexin and ▶calreticulin. Calnexin and calreticulin are examples of animal lectins, which are generally defined as non-immune proteins that recognize and bind to specific glycan structures without catalyzing a chemical modification. Calnexin/calreticulin bind in a reversible manner and their binding is probably associated with binding of other chaperones that recognize specific peptide features of the protein. Once released from calnexin/calreticulin, the Glc1Man9GlcNAc2-Asn is subject to action of α-glucosidase II, resulting in reformation of Man9GlcNAc2-Asn. If a glycoprotein is still not properly folded, the UGGT adds back Glc to regenerate Glc1Man9GlcNAc2-Asn. UGGT participates in quality control of N-glycan biosynthesis and protein folding through this cycle of
glucosylation/deglucosylation, which is repeated until a glycoprotein assumes a conformation that blocks its interaction with UGGT. Blocking the action of α-glucosidase with castanospermine or other glucosidase inhibitors can result in glycoprotein accumulation in the ER, due to inefficient protein folding. This single Glc residue serves as a type of ER-retention signal, preventing glycoprotein exit from the ER.
Following the formation of Man9GlcNAc2-Asn on a folded glycoprotein, the ER α-mannosidase removes one of the mannose residues of Man9GlcNAc2-Asn to form Man8GlcNAc2-Asn (Fig. 4). Interestingly, some cells contain an endomannosidase that can remove Glc-Man disaccharide from Glc1Man9GlcNAc2-Asn in an α-glucosidase II-independent pathway to form an alternative Man8GlcNAc2-Asn. The endomannosidase can also act on Glc1-3Man9GlcNAc2-Asn derivatives. Following formation of Man8GlcNAc2-Asn on ER glycoproteins, they usually exit to the Golgi apparatus by vesicular transport involving COP-coated vesicles. The Golgi apparatus is recognized as a multi-compartment organelle, with cis, medial and trans compartments and a terminal compartment called the trans-Golgi network (TGN). The Golgi apparatus is usually positioned in the cell so that the cis-Golgi is nearest or proximal to the ER and the trans-Golgi is away from or distal to the ER.
Upon reaching the cis-Golgi the Man8GlcNAc2-Asn in glycoproteins is subjected to further processing by α-mannosidase I (Fig. 4). This enzyme removes 3 additional α-linked mannose residues from Man8GlcNAc2-Asn to generate Man5GlcNAc2-Asn. Not all the high-mannose N-glycans, however, are susceptible to α-mannosidase I. The Man8GlcNAc2-Asn in some glycoproteins is not very accessible to α-mannosidase I, leading to formation of mature glycoproteins having Man8GlcNAc2-Asn or partly processed forms such as Man7-, Man6-, or Man5GlcNAc2-Asn. The action of α-mannosidase I can be inhibited by several drugs, including the plant alkaloid kifunensine and the mannose derivative 1-deoxymannojirimycin.
Acid hydrolases that are destined to enter lysosomes are subject to the action of an alternative pathway in which they acquire phosphorylated mannose residues (Man-6-P) that are recognized by the mannose-6-phosphate (Man-6-P) receptors (Fig. 4). These receptors help to deliver lysosomal enzymes to endosomes, from which the ▶lysosomal acid hydrolases can enter mature lysosomes and the Man-6-P receptors recycle to the Golgi for additional rounds of delivery (6). The high mannose-type N-glycans of lysosomal acid hydrolases in the early Golgi apparatus are recognized by the UDPGlcNAc:lysosomal enzyme phosphotransferase. Many of the lysosomal acid hydrolases have unique 3-dimensional structures that generate a surface patch, which is a basic region that includes Lys residues. The phosphotransferase recognizes the signal patch and adds GlcNAc-1-P from the donor UDPGlcNAc to nearby Man residues on the high mannose-type N-glycans to generate the phosphodiester GlcNAc-1-P-6-Man5-8GlcNAc2-Asn. Following formation of GlcNAc-1-P-6-Man5-8GlcNAc2-Asn on lysosomal enzymes, they are subjected to the action of the α-N-acetylglucosamine-1-P phosphodiesterase (“uncovering” enzyme or UCE), which removes the α-linked GlcNAc residue, resulting in formation of the phosphomonoester structure P-6-Man5-8GlcNAc2-Asn. Thus, lysosomal enzymes acquire one or more Man-6-phosphate (Man-6-P) phosphomonoester residues on high mannose-type N-glycans. The presence of Man-6-P blocks action of α-mannosidases on the specifically phosphorylated Man residues. Thus, as described below, phosphorylated glycans cannot be converted to complex-type N-glycans, although they can be converted to Man-6-P-containing hybrid-type N-glycans.
It is important to note that sugar nucleotides, which are important donors for glycosyltransferases in the lumen of the ER and Golgi apparatus, are synthesized in the cytosol. The sugar nucleotides are imported into the lumen of the ER and Golgi apparatus by specific transporters. Most of these transporters function as antiporters; they move a sugar nucleotide into the lumen and the cognate nucleoside monophosphate into the cytosol for reutilization. Defects in transporter function are associated with human genetic diseases, as discussed below.
The recognition of lysosomal hydrolases by the phosphotransferase is dependent on the folded structure of lysosomal enzymes and unfolded proteins are not recognized by the phosphotransferase (6). As discussed below, mutations in the phosphotransferase can result in lack of addition of GlcNAc-1-P to the more than 50 lysosomal enzymes. Alternatively, the phosphotransferase can stochastically fail to add the GlcNAc-1-P to a fraction of the lysosomal acid hydrolases. Such non-phosphorylated glycans of lysosomal acid hydrolases are subject to further processing and modification in the Golgi apparatus and can acquire complex-type N-glycan structures that contain sialic acid and other terminal sugars.
The Man5GlcNAc2-Asn structures are potential acceptors for the enzyme N-acetylglucosaminyltransferase I (GNT-I), which adds GlcNAc from the donor UDPGlcNAc to form GlcNAcMan5GlcNAc2-Asn (Fig. 4). This is a hybrid-type N-glycan that contains non-reducing terminal Man residues and other terminal sugars, such as GlcNAc. In vertebrate cells the product GlcNAcMan5GlcNAc2-Asn is usually acted upon by α-mannosidase II, which specifically recognizes GlcNAcMan5GlcNAc2-Asn and removes two mannose residues to form GlcNAcMan3GlcNAc2-Asn. The
concerted action of α-mannosidases and GNT-I, which appears to occur largely in the cis-Golgi region, generates the trimannosyl structure common to all complex-type N-glycans (7). This GlcNAcMan3GlcNAc2-Asn is usually acted upon by N-acetylglucosaminyltransferase II, which adds GlcNAc from UDPGlcNAc to form the product GlcNAc2Man3GlcNAc2-Asn. This is an example of a biantennary complex-type N-glycan, which is characterized by the lack of non-reducing terminal Man residues and the presence of other terminal sugars. The biantennary nature refers to the presence of two branches of the complex-type N-glycan. But within the cis-Golgi additional N-acetylglucosaminyltransferases (GNT-III through VI) can add additional GlcNAc residues to the Man residues to form bisected N-glycans, or multi-antennary N-glycans, such as tri-, tetra-, penta- and hexa-antennary structures. Within the more distal trans-Golgi apparatus, the GlcNAc2Man3GlcNAc2-Asn is subjected to modification by galactosyltransferases, causing addition of Gal residues to GlcNAc residues from the donor UDPGal to form Galβ4GlcNAc-R sequences. This disaccharide terminal sequence is termed N-acetyllactosamine (LN). In vertebrates the pituitary glycoprotein hormones, such as luteinizing hormone and follicle stimulating hormone, acquire biantennary N-glycans but are subject to addition of GalNAc residues from the donor UDPGalNAc to form the sequence GalNAcβ4GlcNAc-R; this terminal disaccharide is termed lactosamine-di-N-acetyl (LacdiNAc or LDN). This formation of LDN sequences on pituitary glycoprotein hormones requires the action of a specific N-acetylgalactosaminyltransferase that appears to recognize primary sequences within the hormones, and does not generally act on other glycoproteins within the pituitary (8). But less specific N-acetylgalactosaminyltransferases are also expressed in other cells to generate the LDN structure on non-pituitary glycoprotein hormones. The LDN termini of pituitary glycoprotein hormones are sulfated at the C-3 position of the terminal GalNAc residues by a PAPS:GalNAc 3-O-sulfotransferase to form S-3-GalNAc moieties. The resultant formation of (S-3-GalNAc)2GlcNAc2Man3GlcNAc2-Asn on pituitary glycoprotein hormones promotes their recognition and clearance from the blood circulation by a liver receptor for S-3-GalNAc.
In most glycoproteins following the addition of Gal residues to GlcNAc residues, the complex-type N-glycans can acquire other modifications in the trans-Golgi and TGN. These include addition of sialic acid from CMPNeuAc, fucose from GDPFuc, and other residues. Each addition or modification is catalyzed by a separate enzyme. A tremendous variety of modifications are possible depending on a wide variety of factors, such as expression of the modifying enzymes and the availability of the glycoprotein glycans to the enzymes. Following these modifications of the N-glycans in the Golgi apparatus, secretory glycoproteins are released into the extracellular space by secretory vesicles, while membrane-bound glycoproteins may be targeted to the plasma membrane or lysosomes.

GPI-Anchor Biosynthesis
Many glycoproteins in eukaryotes contain a novel C-terminal modification of a glycosylphosphatidylinositol lipid anchor, the GPI anchor (9). The addition of the GPI-anchor to proteins occurs in the ER and involves recognition of a C-terminal domain of newly synthesized protein (Fig. 5). A preformed lipid-linked precursor is generated from phosphatidylinositol by initial reactions in the cytosolic face of the ER. An intermediate containing glucosamine is then reoriented (“flip-flop”) by translocation across the ER membrane to allow further elongation of the precursor by addition of Man residues from dol-P-Man. Ethanolamine phosphate is added to Man residues by donation from phosphatidylethanolamine to generate a GPI precursor. In all organisms this GPI precursor is characterized by having glucosamine linked to inositol and the trimannosyl sequence linked to ethanolamine in a “core” structure. But this core structure may be differentially modified in a tremendous variety of ways, depending on the organism, by addition of other sugars, e.g. Man and Gal residues, additional ethanolamine residues, addition of phosphate, fatty acylation of the sugars and further acylation of inositol.
The GPI precursor is the substrate for a transamidation reaction by the GPI transamidase complex. This enzyme complex recognizes the C-terminal lipophilic portion of some proteins with a GPI anchor sequence, causing the cleavage of the polypeptide bond and transfer of the GPI precursor to the new C-terminal amino acid. The GPI transamidase forms a carbonyl intermediate with the substrate protein. The signal sequence for GPI anchor addition is a C-terminal region with an amino acid to which the anchor is eventually attached that is termed the ω site. The amino acids that are two residues to the carboxyl side of the ω residue (the ω + 2 site) have small side chains, whereas the residues at the ω + 1 site can have large side chains. In all cases the ω + 2 site is followed by a stretch of 5–10 hydrophilic amino acids and then 15–20 hydrophobic residues at or very near the carboxyl or C-terminus of the protein. Following the addition of the GPI anchor, the GPI-anchored glycoproteins move to the plasma membrane. These glycoproteins usually have other sugar residues attached to other amino acids, such as N-glycans, that may or may not be processed within the ER and Golgi apparatus. It is interesting that the formation of N-glycans and GPI-anchored glycoproteins uses a common intermediate, i.e. dol-P-Man.
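
The C-terminal signal for GPI anchor addition described in this section (a small residue at ω + 2, then 5–10 hydrophilic residues, then 15–20 hydrophobic residues at the C-terminus) can be expressed as a rough check on a sequence. The Python sketch below is a deliberately strict heuristic under stated assumptions, not a published prediction method: the residue classes `SMALL` and `HYDROPHOBIC`, the function name and the example sequence are all hypothetical, because this entry does not list which amino acids count as small, hydrophilic or hydrophobic.

```python
# Assumed residue classes (one-letter codes); the entry gives no explicit lists.
SMALL = set("GAS")
HYDROPHOBIC = set("AVLIMFWC")

def looks_like_gpi_signal(sequence, omega_index):
    """Heuristic test of the C-terminal GPI signal for a proposed omega site.

    omega_index is the 0-based position of the candidate omega residue.
    """
    seq = sequence.upper()
    if omega_index + 2 >= len(seq):
        return False
    if seq[omega_index + 2] not in SMALL:      # omega+2 must be small
        return False
    tail = seq[omega_index + 3:]               # residues after omega+2
    for spacer_len in range(5, 11):            # 5-10 hydrophilic residues
        spacer, stretch = tail[:spacer_len], tail[spacer_len:]
        if (not any(r in HYDROPHOBIC for r in spacer)
                and 15 <= len(stretch) <= 20   # 15-20 hydrophobic residues
                and all(r in HYDROPHOBIC for r in stretch)):
            return True
    return False

# Hypothetical sequence: omega = S (index 3), omega+1 = K, omega+2 = G.
example = "MKT" + "SKG" + "DSQTNE" + "LLVALLIVALLLVAL"
print(looks_like_gpi_signal(example, 3))
```

Real signal sequences are less regular than this check requires, so a practical tool would score these features rather than demand exact matches.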
Glycosylation of Proteins. Figure 5 Biosynthesis of GPI-anchored glycoproteins. The pathway shown is for
human GPI anchor biosynthesis from phosphatidylinositol, which occurs in the cytosolic and lumenal regions of the
ER. Following generation of the GPI anchor precursor, the GPI anchor is added en bloc to the C-terminal region of an
ER protein by the transamidase complex, resulting in the cleavage and release of a C-terminal peptide.
Some individuals have a mutation in the gene (termed PIG A) encoding the first enzyme of the pathway for GPI anchor biosynthesis that normally adds GlcNAc from UDPGlcNAc to phosphatidylinositol. Thus, these individuals are defective in generating the mature GPI anchor precursor and are deficient in generating GPI-anchored glycoproteins. Such individuals are often clinically recognized as having paroxysmal nocturnal hemoglobinuria (PNH), a form of hemolytic anemia. GPI-anchored glycoproteins also occur in many protozoans, and have been especially well characterized in African trypanosomes, where the GPI-anchored glycoprotein is recognized as a highly antigenic variant surface glycoprotein (VSG).

Mucin-Type O-glycan Biosynthesis
Many glycoproteins within the Golgi apparatus are modified to contain GalNAcα1-Ser/Thr residues, typically found in animal mucins, by the action of a family of UDPGalNAc:polypeptide α-N-acetylgalactosaminyltransferases (ppGalNAcTs). While mucins may contain hundreds of such linkages, some glycoproteins, such as the transferrin receptor, contain only a single O-glycan. Yet, all such linkages are categorized as mucin-type. The ppGalNAcTs recognize Ser and Thr residues in glycoproteins and add GalNAc in O-glycosidic linkage from the donor UDPGalNAc to these amino acid side chains to form GalNAcα1-Ser/Thr, which is also called the Tn antigen (10). Well over a dozen different ppGalNAcTs are known and many of these are expressed simultaneously within cells. Some of these enzymes may have unique, but partly overlapping, recognition of Ser/Thr residues within the polypeptide sequence. Interestingly, many ppGalNAcTs are dual function enzymes containing a catalytic domain that transfers GalNAc and a lectin domain (ricin- or R-type) that binds to GalNAc residues. Thus, addition of GalNAc to some Ser/Thr sites may promote further modification by attracting more ppGalNAcTs. Such concerted actions of ppGalNAcTs in the Golgi apparatus may promote the relatively efficient modifications of hundreds of Ser/Thr residues within some very large mucin polypeptides, some of which have over 10,000 amino acids.
Following the formation of GalNAcα1-Ser/Thr residues, glycoproteins are subjected to the action of a β3-galactosyltransferase, also called the T-synthase, to form the disaccharide Galβ3GalNAcα1-Ser/Thr, which is called the Thomsen-Friedenreich, TF or simply T antigen, using the donor UDPGal. The T antigen disaccharide is also the simplest core 1 O-glycan structure (Fig. 2). However, occasionally the GalNAcα1-Ser/Thr residues may be sialylated to generate the disaccharide NeuAcα6GalNAcα1-Ser/Thr (sialyl Tn antigen), which cannot be further modified. Upon formation of Galβ3GalNAcα1-Ser/Thr, the core 1 structure may be modified by an N-acetylglucosaminyltransferase (the core 2 GlcNAcT) to generate the trisaccharide Galβ3(GlcNAcβ6)GalNAcα1-Ser/Thr (core 2 O-glycan) from the donor UDPGlcNAc. This core 2 O-glycan can be subsequently modified by addition of other sugars, such as galactose, fucose and N-acetylneuraminic acid, and/or sulfate residues on selected sugars to generate a wide variety of O-glycan structures.

Biosynthesis of Other O-Glycans
The biosynthesis of non-mucin type O-glycans is incredibly varied depending on the cellular compartment. O-GlcNAc residues are added to proteins in the cytoplasm, as discussed below. Glycosaminoglycan addition is initiated by UDPXyl:core protein β-D-xylosyltransferases I and II, which transfer Xyl from UDPXyl to specific Ser residues in proteoglycan core proteins in the ER. The Xyl residue is subsequently modified by addition of Gal and GlcA residues by galactosyltransferases and glucuronyltransferases respectively, to form the core linkage tetrasaccharide of glycosaminoglycans, GlcAβ1-3Galβ1-3Galβ1-4Xylβ1-Ser, which occurs in all proteoglycans. This synthesis of the glycosaminoglycan core region may be completed in the ER, while the subsequent elongation of the glycans and sulfation and epimerization, which is an orchestrated and incredibly complex series of reactions, appear to occur primarily in the Golgi apparatus. The elongation of glycosaminoglycans on proteoglycans can be partly averted by feeding cells β-xylosides, which act as acceptors for addition of Gal, thus effectively decreasing elongation of glycosaminoglycan within the proteoglycan acceptors. Remarkably, β-xylosides appear capable of penetrating the ER and possibly the Golgi apparatus of cells. This inhibition by competition can result in synthesis of free glycosaminoglycans on the β-xyloside and reduced addition of glycosaminoglycan to proteoglycans. O-Mannosylation of proteins in yeast is initiated in the ER by transfer of Man from the donor dol-P-Man using a specific O-mannosyltransferase. Further elongation to generate mannose-containing polysaccharides in yeast occurs by Man donation from GDPMan in the Golgi apparatus by additional mannosyltransferases. An equivalent enzyme in animals, termed POMT1, may initiate O-Man formation on selective Ser/Thr residues in glycoproteins in the ER using dol-P-Man as the donor, while further elongation and addition of other sugars may occur in the Golgi apparatus. O-fucosylation and O-glucosylation of EGF-like domains on glycoproteins are catalyzed by specific enzymes that transfer Fuc or Glc from GDPFuc or UDPGlc, respectively, in the Golgi apparatus. Collagen is glycosylated in the ER following hydroxylation of Lys residues to generate hydroxylysine (Hyl). The addition of Gal to Hyl is catalyzed by a collagen-specific enzyme UDPGal:
procollagen-5-hydroxy-L-lysine D-galactosyltransferase, which adds Gal to Hyl residues on procollagen in the ER during procollagen biosynthesis and concomitantly with Hyl formation on nascent polypeptides catalyzed by lysyl hydroxylase activity.

Glycosylation in the Cytosol
Many cytosolic proteins in animals (and probably plants) contain one or more residues of β-linked GlcNAc in O-glycosidic linkage to Ser/Thr residues (11). These O-GlcNAcylated proteins (O-GlcNAc-containing glycoproteins) are generated by the action of the UDPGlcNAc:polypeptide O-acetylglucosaminyltransferase (O-GlcNAc transferase), which transfers GlcNAc from UDPGlcNAc to selected Ser/Thr residues of cytosolic proteins. Some of the more prominent O-GlcNAcylated glycoproteins include RNA polymerase II, c-myc and the estrogen receptor. O-GlcNAcylation is one of the few types of glycosylation that are reversible. The O-GlcNAc may be selectively removed by the action of an O-GlcNAc-specific acetylglucosaminidase (O-GlcNAcase) in the cytosol. This alternating addition and removal of O-GlcNAc by these two enzymes is akin to reversible phosphorylation and dephosphorylation of cytosolic proteins. O-GlcNAcylation may serve to regulate many metabolic pathways and is required for animal and plant cell growth.
The storage polysaccharide glycogen, which is a glycoprotein in animals, is generated on the core protein glycogenin within the cytosol of animals, by its autocatalytic “self-glucosylation” of a Tyr residue at position 194 using UDPGlc as a donor. The Glc-O-Tyr is then elongated by addition of other Glc residues (up to 10) from UDPGlc by glycogenin activity. The Glc-containing oligosaccharide on glycogenin is then elongated by glycogen synthase. A similar type of activity may occur on the starch protein amylogenin.

Plant Glycoproteins
Many plant glycoproteins contain N-glycans, which are also synthesized via the dolichol pathway in the ER. They can also be subsequently modified by processing reactions and addition of other sugars to generate high mannose-, hybrid- and complex-type N-glycans. Many plant wall proteins are typically glycoproteins rich in the amino acids hydroxyproline (▶hydroxyproline-rich glycoprotein, HRGP), proline (proline-rich protein, PRP), and glycine (glycine-rich protein, GRP). The O-glycans in HRGPs may account for up to 95% of the glycoprotein weight and the glycans can range in size from a single attached Ara residue to large ▶arabinogalactans containing nearly 100 residues of Ara and Gal. Many of these glycoproteins form rods (HRGP, PRP) or β-pleated sheets (GRP). Extensin is one of the best-studied HRGPs. HRGP expression is increased by plant wounding and pathogen attack. The pistil and pollen tube extracellular matrix are enriched in these highly glycosylated proteins.

Bacterial Glycoproteins
Glycoproteins are also found in prokaryotes and in archaebacteria, although the general structures of the attached glycans and sugar residues are very different from those found in animals and plants. Among the best studied prokaryotic glycoproteins are the cell surface or S-layer glycoproteins (12). Such S-layer glycoproteins can assemble into ordered lattice-like structures on the cell surface. Each S-layer glycoprotein may contain more than one attached glycan, which can be linked via Asn or other amino acid residues (Table 1). In many cases the bacterial S-layer glycan chains are linear or branched homo- or hetero-saccharides having 20–50 identical repeating units. By contrast, archaeal S-layer glycoproteins have shorter glycans, generally lacking repeating units. Although the exact mechanisms of S-layer glycoprotein biosynthesis are not yet defined, it appears that most sugar residue addition occurs in the outer membrane following protein translocation.

Many Factors Regulate Protein Glycosylation
As discussed above, two of the major factors regulating protein glycosylation are the sequence motifs within the primary structure of glycoproteins and the site of biosynthesis. But many other factors also contribute to regulation of protein glycosylation. These include the expression of glycosyltransferases, expression of glycosidases, secondary, tertiary and/or quaternary structures of proteins, availability of donor substrates, e.g. sugar nucleotides and dolichol, cations, e.g. magnesium and manganese, temperature and membrane lipid composition and structure. Many of these factors, especially expression of glycosyltransferases, vary tremendously between cell types. Dozens of different glycosyltransferase genes encoding enzymes that act on glycoproteins exist in the genomes of most multicellular organisms. Together, these many factors help to explain the huge differences in glycosylation observed between different cells and tissues.

Glycoproteins Have Many Biological Functions
Because glycoproteins are so common in all cells, it is not surprising that the glycan moieties have many different functions. Although many of the specific functions of glycoproteins are being defined, it is likely that the complete picture of glycoprotein functions will take many years to emerge. Some of the known functions of glycoproteins and their attached glycans include cell-cell adhesion, cell-matrix interactions, glycoprotein targeting to organelles and cell signaling. For example, glycoproteins regulate many different
types of cell adhesion, including sperm-egg adhesion, leukocyte-platelet-endothelial cell adhesion, recognition and phagocytosis and neuronal cell-matrix adhesion. Some of the non-specific functions of glycoprotein glycans include protein folding and assembly, protein protection and stability against proteases, control of the circulatory half-life of glycoproteins, regulation of protein conformation and thermal stability and control of enzyme kinetics. Many glycoprotein glycan functions are generated by glycan recognition through carbohydrate-binding proteins or lectins. Lectins are made by all organisms, including animals, plants, bacteria and viruses.

Human Disorders Associated with Defective Protein Glycosylation
There are many human disorders associated with an altered ability to add carbohydrate residues to glycoproteins. One of the first defined examples of this was ▶I-cell disease, where patients were found to have a recessive genetic mutation of the gene encoding the phosphotransferase activity that helps to generate Man-6-P residues on lysosomal acid hydrolases. Consequently, their cells are unable to synthesize lysosomal acid hydrolases with Man-6-P residues efficiently (13). Most of the non-phosphorylated lysosomal acid hydrolases from these patients become processed within the Golgi apparatus, acquire sialic acid and other sugar residues on hybrid- and complex-type N-glycans and are secreted into body fluids. The patients accumulate undegraded macromolecules in lysosomes due to lack of acid hydrolases and these accumulations are recognized microscopically as inclusion or I-cells, hence the name I-cell disease. Another historically important defect in glycoprotein glycosylation associated with human disease is PNH. Patients with PNH have reduced ability to generate the GPI anchor, due to mutation in the X-linked PIG A gene. Hemolytic anemia results due to deficiencies in the normally GPI-anchored glycoproteins termed decay accelerating factor (DAF or CD55) and membrane inhibitor of reactive lysis (MIRL or CD59), which function to decrease autolysis of erythrocytes by activated complement.
Many of the genetic defects in the ability to glycosylate proteins are now recognized within the broad category of ▶congenital disorders of glycosylation (CDGs) (14). The CDGs are highly varied depending on the glycan structures made and are recognized as different types, such as Type Ia, Ib, Ic, Id, Ie, and IIa. Each type of CDG results from mutations in one of the many genes encoding proteins involved in N-glycosylation via the dolichol pathway or subsequent processing and glycosylation reactions, or in genes regulating organelle trafficking and biosynthesis. CDGs are often diagnosed by examining the N-glycosylation pattern of serum glycoproteins, such as transferrin, where altered N-glycosylation can affect mobility upon isoelectric focusing. CDG patients, depending on the altered gene, exhibit a variety of changes in physiognomy and suffer from neurological, liver and/or intestinal problems. Children with CDG, depending on the type of genetic mutation, exhibit impairments in cognitive ability, speech, balance and motor skills. Other disorders where altered protein glycosylation is observed include several forms of congenital muscular dystrophy, such as Fukuyama congenital muscular dystrophy, ▶limb-girdle muscular dystrophy, muscle-eye-brain disease and Walker-Warburg syndrome (15). Many of these diseases are associated with mutations in genes encoding glycosyltransferases that add GlcNAc or Man residues to generate O-linked Man-containing glycans on α-dystroglycan, which is a membrane-associated glycoprotein that helps to link neuronal cells and their cytosolic signaling machinery to extracellular matrix molecules such as laminin. Patients with progeroid-type Ehlers-Danlos (E-D) syndrome have defects in a galactosyltransferase that is required to synthesize the common linkage region of glycosaminoglycans. Another disorder where altered protein glycosylation is observed is leukocyte adhesion deficiency type II (LAD II), where patients lack the ability to add fucose to glycoproteins. This deficiency in fucosylation results in a lack of leukocyte adhesion to selectins, a group of carbohydrate-binding proteins that recognize fucose-containing O-glycans and serve to regulate leukocyte trafficking from the bloodstream. Some LAD II patients have mutations in the gene encoding the Golgi transporter for GDPFuc, thus preventing normal movement of GDPFuc from its site of synthesis in the cytosol into the Golgi apparatus for utilization by fucosyltransferases. Finally, defects in glycoprotein glycosylation are also seen in patients with some autoimmune diseases, such as may occur in congenital dyserythropoietic anemia type II, where a defect in N-glycosylation may occur due to deficiency of α-mannosidase II activity, and in IgA nephropathy, where a subset of IgA molecules lack appropriate O-glycan structures within the hinge region. These are just a few of the many examples where protein glycosylation is essential to biological processes.
▶Biochemical Engineering of Glycoproteins
▶Protein Databases
▶Recombinant Protein Production in Mammalian Cell Culture

References
1. Apweiler R, Hermjakob H, Sharon N (1999) On the frequency of protein glycosylation, as deduced from analysis of the SWISS-PROT database. Biochim Biophys Acta 1473:4–8
2. Spiro RG (2002) Protein glycosylation: nature, distribution, enzymatic formation, and disease implications of glycopeptide bonds. Glycobiology 12:43R–56R
3. Laine RA (1994) A calculation of all possible oligosaccharide isomers both branched and linear yields 1.05 × 10^12 structures for a reducing hexasaccharide: the Isomer Barrier to development of single-method saccharide sequencing or synthesis systems. Glycobiology 4:759–767
4. Helenius J, Aebi M (2002) Transmembrane movement of dolichol linked carbohydrates during N-glycoprotein biosynthesis in the endoplasmic reticulum. Semin Cell Dev Biol 13:171–178
5. Parodi AJ (2000) Protein glucosylation and its role in protein folding. Annu Rev Biochem 69:69–93
6. Kornfeld S (1987) Trafficking of lysosomal enzymes. FASEB J 1:462–468
7. Schachter H (2000) The joys of HexNAc. The synthesis and function of N- and O-glycan branches. Glycoconj J 17:465–483
8. Baenziger JU, Green ED (1988) Pituitary glycoprotein hormone oligosaccharides: structure, synthesis and function of the asparagine-linked oligosaccharides on lutropin, follitropin and thyrotropin. Biochim Biophys Acta 947:287–306
9. McConville MJ, Menon AK (2000) Recent developments in the cell biology and biochemistry of glycosylphosphatidylinositol lipids (review). Mol Membr Biol 17:1–16
10. Brockhausen I (1999) Pathways of O-glycan biosynthesis in cancer cells. Biochim Biophys Acta 1473:67–95
11. Wells L, Vosseller K, Hart GW (2001) Glycosylation of nucleocytoplasmic proteins: signal transduction and O-GlcNAc. Science 291:2376–2378
12. Schaffer C, Messner P (2004) Surface-layer glycoproteins: an example for the diversity of bacterial glycosylation with promising impacts on nanobiotechnology. Glycobiology 14:31R–42R
13. Raas-Rothschild A, Cormier-Daire V, Bao M et al (2000) Molecular basis of variant pseudo-hurler polydystrophy (mucolipidosis IIIC). J Clin Invest 105:673–681
14. Marquardt T, Denecke J (2003) Congenital disorders of glycosylation: review of their molecular bases, clinical presentations and specific therapies. Eur J Pediatr 162:359–379
15. Endo T, Toda T (2003) Glycosylation in congenital muscular dystrophies. Biol Pharm Bull 26:1641–1647

Glycosylphosphatidylinositol (GPI) Anchors

Definition
In the lumen of the endoplasmic reticulum, the GPI anchor is covalently attached to the C terminus of proteins destined for the plasma membrane, and the transmembrane segment of the protein is cleaved off. As proteins are only attached to the exofacial leaflet of the plasma membrane by the GPI anchor, they can be released in soluble form from the cell surface by the action of specific phospholipases.
▶Epithelial Cells
▶Glycosylation of Proteins

Glycosyltransferase

Definition
Glycosyltransferase is a member of a large family of enzymes expressed in the endoplasmic reticulum and Golgi apparatus, which catalyze the transfer of a monosaccharide unit from a sugar-nucleotide donor, typically to the non-reducing terminus of an oligosaccharide chain in glycoproteins and glycolipids.
▶Glycosylation of Proteins
▶Limb Girdle Muscular Dystrophies
▶Methylation of Proteins

Glyoxylate

Definition
Glyoxylate is a toxic compound, generated in vivo, which needs to be eliminated by conversion into glycine via the peroxisomal enzyme alanine glyoxylate aminotransferase.
▶Peroxisomal Disorders

Glypidation

Definition
Glypidation describes the attachment of a glycosylphosphatidylinositol (GPI) anchor to certain integral membrane proteins. The anchor is composed of the lipid phosphatidylinositol to which a carbohydrate and an ethanolamine phosphate moiety are linked. The GPI-anchor is attached post-translationally in the lumen of the endoplasmic reticulum, thereby replacing a transient transmembrane region of the modified protein. The ▶GPI-anchor attaches proteins to the exoplasmic leaflet of membranes, possibly to certain subdomains such as caveolae and lipid-rafts.
▶Fatty Acid Acylation of Proteins
▶Glycosylation of Proteins
Definition
Golgi apparatus (Golgi complex) refers to a cytoplasmic organelle in eukaryotes consisting of stacked, flattened membrane cisternae, surrounded by vesicles, which is involved in transport and post-translational modification (especially glycosylation) of proteins on their journey through the secretory pathway. The Golgi complex is also a central sorting station in the secretory pathway; on the trans- or exit-side of the Golgi complex, proteins get sorted into several distinct vesicle types for transport to different final destinations.
▶Biochemical Engineering of Glycoproteins
▶Exocytotic Pathway
▶Glycosylation of Proteins
▶Limb Girdle Muscular Dystrophies
▶Rho, Rac, Cdc42
▶Vesicular Traffic

Gonadal Mosaicism

▶Germline (Gonadal) Mosaicism

Gonadotropin Deficiency

Definition
Gonadotropin deficiency describes the absence, decreased production or dysfunction of anterior pituitary hormone (LH and/or FSH), which results in a decreased or lack of testosterone in males and estrogen

Gonadotropins

Definition
Gonadotropins are pituitary hormones that influence the functions of the ovary: ▶follicle stimulating hormone (FSH) and ▶luteinizing hormone (LH).
▶SRY – Sex Reversal
▶Hypothalamic and Pituitary Disease, Genetics

Gordon’s Syndrome

Gorlin’s Syndrome

Definition
Gorlin’s syndrome (also known as naevoid basal cell carcinoma syndrome (NBCCS)) is a rare autosomal dominant cancer disorder characterised primarily by a predisposition to several tumours, most commonly basal cell carcinoma (BCC). In addition to cancer susceptibility, this syndrome is also associated with a range of defects resulting from abnormal embryonic development. Gorlin’s syndrome results from mutation of the patched gene which functions in the hedgehog signalling pathway. The developmental defects are believed to result from ▶haploinsufficiency, with subsequent mutation of the remaining allele resulting in tumour formation.
▶Hedgehog Signalling
Definition
The GPIIb/IIIa complex is a platelet membrane glycoprotein complex mediating platelet aggregation and adhesion to endothelial cells. The complex is an integrin that recognises the arginine-glycine-aspartic acid (RGD) sequence present on several adhesive proteins. The GPIIb/IIIa complex functions as a receptor for fibrinogen, von Willebrand Factor (vWF), fibronectin, vitronectin, and thrombospondin. Deficiency of GPIIb/IIIa causes ▶Glanzmann’s Thrombasthenia.

G-Proteins

Definition
▶Diabetes Insipidus, a Water Homeostasis Disease
▶G-Proteins and G-Protein Mutations in Human Diseases
▶Molecular Motors
G-Proteins and G-Protein Mutations in Human Diseases 721
AC, adenylyl cyclase; CTX, cholera toxin; GAP, GTPase-activating protein; GEF, guanine nucleotide exchange factor; PDE,
phosphodiesterase; PLC, phospholipase C; PTX, pertussis toxin; RGS, regulator of G-protein signaling; VGCC, voltage-gated
calcium channel
cell physiological role of AGS proteins is still fairly obscure. Possibly, these proteins are involved in the regulation of basic cellular processes like the maintenance of cell polarity and cell division. By acting in concert with GPCRs they may provide for a signal amplification mechanism and at the same time allow signal transmission via G-proteins independent of heptahelical membrane receptors.

G-Protein Families and Effectors
Based on the primary amino acid sequence of their α subunits, G-proteins are subdivided into four distinct families: Gs, Gi, Gq, and G12 (Table 1) (6, 7). Concentrations of Gi proteins in the cell considerably exceed those of other families, and in brain Go may amount to 1–2% of total membrane protein. Some G-protein α subunits are characterized by a very restricted expression pattern (Table 1), while others like Gs, Gi2, Gq, G12, and G13 are ubiquitously expressed. G-proteins can also be classified on the basis of the cellular effectors to which they couple. Gs proteins classically stimulate adenylyl cyclase activity, while Gi proteins inhibit adenylyl cyclase via their α subunits, but activate inwardly rectifying potassium channels and inhibit P/Q-, N- and R-type voltage-gated calcium channels via βγ subunits released upon GTP binding to Gαi proteins. Gq proteins activate phospholipase C-β isoforms, and G12 proteins couple to Rho guanine nucleotide exchange factors resulting in Rho activation and stress fiber formation (Table 1). Both GTP-loaded α subunits and βγ dimers are eo ipso signaling proteins exerting their action through activation or inhibition of an ever expanding list of cellular effector proteins (for Gα effectors see Table 1). Effectors for G-protein βγ subunits include inwardly rectifying potassium channels (Kir3.1–3.4), G-protein-coupled receptor kinases (GRKs), adenylyl cyclases (adenylyl cyclases II and IV), phospholipases C (PLC)-β1, -β2 and -β3 and phosphatidylinositol 3-kinases (PI3K) β and γ.
Considering that hundreds of GPCRs transduce signals by interacting with a limited number of G-proteins, the question of coupling specificity is worth considering. The concept of linear G-protein-mediated signal transduction pathways, i.e. one receptor coupling to one distinct G-protein activating one effector, appears to be inadequate to describe physiology. G-protein-mediated signal transduction is a complex signaling network with diverging and converging transduction steps at each coupling interface. Deciphering the mechanism of signal specificity in living cells still remains a scientific challenge of paramount importance.

G-Protein Mutations as the Molecular Basis of Human Diseases
Because of their central role in controlling many physiological functions, naturally occurring activating and inactivating mutations in GPCRs and G-proteins are responsible for an increasing number of human diseases. Functional variability resulting from polymorphisms may underlie interindividual differences in response to endogenous ligands as well as drugs. At present, the Gαs gene (GNAS1) is the only G-protein gene that has been unequivocally shown to be afflicted with activating or inactivating mutations that cause human diseases (8-10).

Activating Mutations in Gαi
Mutations in the Gαi2 gene were diagnosed in fixed sections of human ovarian sex cord stromal tumors and adrenal cortical tumors. In a few affected specimens, the highly conserved Arg179 corresponding to the aforementioned Arg201 in Gαs in the helical domain was found to be exchanged for a histidine (Arg179His) giving rise to a Gαi2 protein devoid of any GTPase activity. Constitutively active Gαi2 was subsequently referred to as the gip2 oncogene. Most notably, this finding could not be confirmed by subsequent studies on fresh surgically resected tumors, and transgenic animals expressing the gip2 oncogene in selected tissues have not been reported yet. Therefore, one has to conclude at this point that the oncogenic potential as well as the frequency of activating mutations of Gαi2 appear to be rather low.

Activating Mutations in Gαs
In many endocrine glands, cAMP stimulates proliferation, differentiation and hormone secretion. A first hint to the possible causative contribution of activating mutations in GNAS1 arose from the identification of a subset of growth hormone (GH)-secreting pituitary tumors characterized by high intracellular cAMP concentrations and increased adenylyl cyclase activity. These adenomas, accounting for approximately 40% of GH-secreting tumors, were shown to harbor heterozygous missense mutations in GNAS1 exons 8 or 9 giving rise to Arg201Cys/His and Gln227Arg/Lys missense mutations, respectively. Both the highly conserved Arg and Gln residues are essential for GTP hydrolysis to occur in the α subunit with the requirement of Gln227 to orient and polarize the catalytic water in the transition state. Therefore, these missense mutations ablate the endogenous GTPase activity of Gαs and render the α subunit constitutively active, leading to uncontrolled, excessive cAMP production in somatotrophs. As cAMP stimulates proliferation and differentiation in these cells, the GTPase-deficient Gαs mutants have been designated gsp oncogenes.
Gsp mutations are also rarely observed in other pituitary tumors like corticotrophs, resulting in increased ACTH release (Table 2). Besides, approximately 10% of non-functional pituitary adenomas carry
gsp mutations. Several studies have confirmed the presence of activating mutations in Gαs in up to 30% of toxic thyroid adenomas and in less than 10% of thyroid carcinomas. Sporadically, gsp mutations are found in parathyroid and adrenocortical tumors as well as in ▶pheochromocytomas.

G-Proteins and G-Protein Mutations in Human Diseases. Table 2 Diseases associated with GNAS1 mutations

The ▶McCune-Albright syndrome (MAS) is classically defined by the clinical triad of café-au-lait hyperpigmented skin lesions, precocious puberty and polyostotic fibrous dysplasia of the bone. Apart from the gonads, other endocrine glands such as the pituitary, adrenal cortex and thyroid that are sensitive to trophic cAMP-dependent stimuli were also found to be hyperfunctional in MAS. Nodular and diffuse goiters as well as benign thyroid nodules are associated with MAS. The sporadic occurrence of thyroid cancer (papillary and clear cell thyroid carcinoma) in MAS patients suggests that additional mutational or epigenetic events in addition to gain-of-function Gαs mutations are mandatory for thyroid carcinogenesis in these patients. In 1986 the dermatologist Happle suggested MAS to be caused by a dominant somatic mutation as an early postzygotic event resulting in the mosaic pattern of clinical stigmata. Mutations in GNAS1 have been confirmed in affected endocrine tissues and in hyperpigmented skin lesions of all MAS patients. Interestingly, missense mutations were detected at only one position, i.e. Arg201His/Cys. The overall clinical picture of an individual MAS patient is determined by the distribution of cells bearing the somatic gsp mutation. It is tempting to speculate that germline gsp mutations are incompatible with life. In contrast to the latter concept, one patient with severe developmental, skeletal and endocrine abnormalities was described to harbor a germline Arg201Leu mutation in GNAS1.
Gαs mutations have also been found in all cases of fibrous dysplasia (FD) of the bone. The majority of FD patients have only a single bone defect; a small group suffers from multiple bone lesions or has other features of MAS. Missense mutations in GNAS1 identical to those in MAS patients, i.e. Arg201His/Cys, were diagnosed in nearly all forms of FD. A possible explanation for the clinical phenotype relates to the general concept that elevated intracellular cAMP levels in osteogenic precursors entail increased proliferation and decreased differentiation of these cells resulting in benign fibrous bone lesions. Activating Gαs mutations have also been reported to occur in isolated intramuscular myxomas and those that present in conjunction with FD (Mazabraud syndrome).

Loss-of-Function Mutations in Gαs
More than 60 years ago, Fuller Albright and his colleagues described several patients presenting with short stature, obesity, skeletal abnormalities, mental retardation and often subcutaneous ossification. This syndrome, now collectively called ▶Albright’s hereditary osteodystrophy (AHO), frequently concurs with resistance to parathyroid hormone (PTH) and other hormones like GH-releasing hormone, thyrotropin and gonadotropins acting via Gs-coupled receptors. AHO in conjunction with this kind of hormone resistance gives rise to the complex syndrome of ▶pseudohypoparathyroidism (PHP) type Ia (Table 2). Patients with pseudopseudohypoparathyroidism (PPHP) show the
typical features of AHO, yet do not suffer from any long and two short forms of Gαs. There is little
kind of hormone resistance. On the contrary, PHP Ib evidence to suggest that these splice variants have
patients present with symptoms of isolated PTH distinct signaling properties. During the past few years
resistance, but lack the typical AHO phenotype. A it has become obvious that GNAS1 not only codes for
mild form of thyrotropin resistance has recently been Gαs, but also for several other transcripts by using
observed in PHP Ib patients raising the possibility that 4 alternative promoters and first exons which splice
other endocrine systems may also be affected. onto a common exon 2 (Fig. 2). The most upstream
Subsequent systematic studies were able to allocate alternative promoter gives rise to transcripts coding for
the molecular defect in these three main forms of PHP the chromogranin-like neuroendocrine secretory pro-
to GNAS1. tein 55 (NESP55) whose entire coding sequence
Heterozygous inactivating mutations affecting one of resides in the upstream exon, thus leaving Gαs exons
the Gαs-specific exons are the molecular cause of PHP 2–13 within the 3′ untranslated region of the NESP55
Ia, PPHP and of progressive osseous heteroplasia transcript. The NESP55 promoter is methylated on the
(POH). POH patients suffer from severe heterotopic paternal allele, so that the NESP55 gene is exclusively
ossification involving skeletal muscle and deep con- transcribed maternally. The XLαs transcript encodes a
nective tissue. They frequently lack hormone resistance protein with an extended N-terminus when compared
and typical AHO features. Many of the GNAS1 to Gαs and is transcribed from the paternal allele only
mutations are deletions or insertions that give rise to (Fig. 2). The C-terminal 348 amino acid residues are
frameshifts and premature stop codons, nonsense identical to Gαs. XLαs is highly expressed in the
mutations or splice junction mutations. In addition, a pituitary, is targeted to the plasma membrane, interacts
number of missense mutations adversely affect protein with βγ subunits and can be activated by non-
stability. The latter scenario is exemplified by a hydrolysable GTP analogs. However, there is no
missense mutation, A366S, in the critical guanine evidence that XLαs is regulated by GPCRs. An
nucleotide binding motif of the GTPase domain, additional transcript derived from the sense strand of
leading to an accelerated release of GDP from the α the paternal allele uses exon 1A (exon A/B) as the first
subunit and marked instability of the guanine nucleo- exon and also splices onto exons 2–13 (Fig. 2).
tide-free protein at the body core temperature of 37 °C. However, exon 1A generates transcripts that are
At lower ambient temperatures, for instance in the testis, presumably untranslated. Upstream of the XLαs exon,
protein stability is not impaired and the accelerated a promoter for antisense transcripts traversing the
nucleotide exchange manifests as constitutive Gαs NESP55 exon has been identified. These NESP55
activity. Therefore, the clinical phenotype of PHP Ia antisense transcripts are only expressed from the
and excessive testicular testosterone production (testo- paternal allele and may contribute to the imprinting
toxicosis) arise from the intriguing A366S mutation of NESP55 by silencing the NESP55 promoter on the
(Table 2). In most tissues, an approximately 50% paternal allele.
reduction of functional Gαs activity significantly Around 100 autosomal genes are subject to ▶genomic
reduces cAMP formation in the case of inactivating imprinting. One genomic region controlled by this
Gαs mutations. However, the cAMP levels that can still epigenetic phenomenon is located in the distal portion
be generated are sufficient to maintain physiological of chromosome 2 and encompasses GNAS1. All
functions. Thus, there is no evidence for haploinsuffi- imprinted genes have one or more regions in which
ciency to explain the hormone resistance observed in the cytosines within CpG dinucleotide stretches are
patients. methylated on one parental allele only. Very often these
Retrospective analyses of PHP patients revealed that methylated regions coincide with gene promoters. As
the clinical phenotype was strongly influenced by the described above and illustrated in Fig. 2, the promoter
parent transmitting the mutated allele. Any inactivating regions of GNAS1 display a complex imprinting
Gαs mutation leads to AHO irrespective of the parent pattern. To complicate the scenario even further, the
transmitting the defective gene. Hormone resistance promoter region giving rise to Gαs transcripts does
characteristic of PHP Ia occurs only if the genetic not exhibit allele-selective methylation and in most
defect is inherited from a mother suffering from either tissues expression occurs biallelically. This situation
PHP Ia or PPHP. Conversely, there is mounting notwithstanding, paternal Gαs expression is silenced in
evidence that POH is inherited from the father. This a few tissues by a mechanism that is presently
conspicuous parent-of-origin-specific inheritance pat- unknown. In proximal renal tubular cells, adipocytes,
tern suggests that GNAS1 is imprinted. pituitary gland, thyroid and gonads, Gαs expression is
The human gene for Gαs is a single-copy gene located largely driven by the maternal allele. In PHP Ia
at 20q13.2-13.3. Gαs is encoded by exons 1–13 patients, renal proximal tubule cells are resistant
(Fig. 2). Alternative splicing of exon 3 produces two to PTH action, because Gαs expression is restricted
G-Proteins and G-Protein Mutations in Human Diseases. Figure 2 Organization and imprinting of the
GNAS1 locus. GNAS1 is characterized by 4 alternative first exons which splice onto exon 2. Methylation patterns
(methyl) and transcriptional activation (arrows) of the maternal and paternal allele are indicated. The hatched
arrow for exon 1 of the paternal allele indicates that it does not contribute to Gαs expression in all cells.
An antisense mRNA is transcribed across the NESP55 exon on the paternal allele.
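
The parent-of-origin rule described in the text can be written out as a small Python sketch. It is a deliberate simplification and not a clinical model: the function names are hypothetical, "largely driven by the maternal allele" is treated as exclusively maternal, and an inactivating mutation is assumed to abolish the output of the affected allele completely.

```python
# Tissues in which, according to the text, Gαs expression is largely maternal.
# Assumption for this sketch: "largely" is treated as "exclusively".
MATERNAL_ONLY_TISSUES = {
    "proximal renal tubule", "adipocytes", "pituitary", "thyroid", "gonads",
}


def functional_gas_fraction(tissue, mutated_allele):
    """Return the fraction of normal Gαs activity left in a tissue.

    mutated_allele is "maternal" or "paternal" and names the allele that
    carries a fully inactivating mutation (an assumption of the sketch).
    """
    if mutated_allele not in ("maternal", "paternal"):
        raise ValueError("mutated_allele must be 'maternal' or 'paternal'")
    if tissue in MATERNAL_ONLY_TISSUES:
        # Only the maternal allele is expressed in these tissues.
        return 0.0 if mutated_allele == "maternal" else 1.0
    # Biallelic expression: one of two alleles is lost.
    return 0.5


def predict_phenotype(mutated_allele):
    """AHO follows any inactivating mutation; hormone resistance needs maternal inheritance."""
    resistant = functional_gas_fraction("proximal renal tubule", mutated_allele) == 0.0
    return {
        "AHO": True,
        "hormone_resistance": resistant,
        "label": "PHP Ia" if resistant else "PPHP",
    }
```

In this sketch a maternally inherited mutation yields PHP Ia, a paternally inherited one yields PPHP, and a biallelically expressing segment such as the thick ascending limb retains half of the normal activity in both cases.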
to the maternal allele that carries an inactivating tissue-specific repressor protein that hampers Gαs
mutation. Yet PHP Ia patients are not prone to expression. Alternatively, the deletion may disrupt a
hypercalciuria, suggesting that the anticalciuric PTH cis-acting imprinting control element necessary for the
action in the thick ascending limb is fully operative methylation imprint at exon 1A (exon A/B). As the
because of biallelic Gαs expression in this part of the described deletions disrupt another gene, STX16
nephron. Thus, tissue- and cell-specific imprinting coding for Syntaxin 16, one may speculate that the
represents the molecular mechanism underlying the STX16 region comprises such an imprinting control
clinical features of PHP Ia, while haploinsufficiency element. The relevance of these epigenetic changes, i.e.
alone may lead to AHO. the loss of maternal-specific methylation of GNAS1,
A first glimpse of the mechanism of tissue-specific Gαs for the clinical PHP Ib phenotype is emphasized by a
imprinting was provided by studies on patients with PHP patient with paternal uniparental disomy of chromo-
Ib. The vast majority of these patients who present with some 20q. In this situation both long arms of
renal PTH resistance, sometimes accompanied by chromosome 20q are of paternal origin resulting in
partial TSH resistance, exhibit a loss of methylation PTH resistance, but not in AHO.
at the GNAS1 exon 1A, while lacking mutations in the A unique heterozygous 3 bp deletion causing loss of
exons coding for Gαs. This loss of the maternal allele- Ile382 in the C-terminus of Gαs was detected in 3
specific methylation pattern, linked to an upstream 3 kb affected boys with PHP Ib. When heterologously
deletion, makes the maternal allele look like the expressed, the mutant Gαs was found to be unable to
paternal one, resulting in silencing of maternal Gαs couple to the PTH receptor, while interaction with the
expression in renal proximal tubules. One possible Gs-coupled thyrotropin and luteinizing hormone re-
explanation is based on the hypothesis that the non- ceptors was unaffected. These results explain PTH-
methylated exon 1A region allows for the binding of a specific hormone resistance in the affected patients.
The absence of any phenotype in the mother and 3. Pierce KL, Premont RT, Lefkowitz RJ (2002) Seven-
maternal grandfather carrying the same mutation is transmembrane receptors. Nat Rev Mol Cell Biol 3:639–
commensurate with our current understanding of 650
4. Palczewski K, Kumasaka T, Hori T et al (2000) Crystal
paternal imprinting of the GNAS1 gene. structure of rhodopsin: A G-protein-coupled receptor.
Science 289:739–745
The Gβ3-C825T Polymorphism in Multigenic Disorders 5. Wall MA, Posner BA, Sprang SR (1998) Structural basis
A single base substitution (C825T) in the Gβ3 subunit of activity and subunit recognition in G-protein hetero-
leading to a truncated protein was originally trimers. Structure 6:1169–1183
reported in association with primary hypertension. 6. Offermanns S (2003) G-proteins as transducers in
transmembrane signalling. Progr Biophys Mol Biol
More recently, genetic associations with a number of
83:101–130
other disorders such as obesity and insulin resistance 7. Cabrera-Vera TM, Vanhauwe J, Thomas TO et al (2003)
have been suggested. So far, the underlying mechanism Insights into G-protein structure, function, and regula-
by which the Gβ3 variant causes the different tion. Endocr Rev 24:765–781
phenotypes remains elusive. 8. Spiegel AM, Weinstein LS (2004) Inherited diseases
involving G-proteins and G-protein-coupled receptors.
Annu Rev Med 55:27–39
9. Weinstein LS, Liu J, Sakamoto A et al (2004) GNAS:
Conclusions
normal and abnormal functions. Endocrinology
In the past few years significant progress has been 145:5459–5464
made towards a truly molecular understanding of 10. Bastepe M, Juppner H (2005) GNAS locus and
receptor/G-protein-mediated signal transduction. One pseudohypoparathyroidism. Horm Res 63:65–74
crystal structure of a heptahelical receptor, rhodopsin,
and several structures of G-proteins provide a solid foundation
for future work on the mechanisms of receptor and
G-protein activation. An important goal will be to
determine the structural differences between the
inactive and active receptor conformations as well as
GPx
the structure of receptors in complex with heterotri-
meric G-proteins. Studies on engineered gene-deficient
mice as well as the thorough in vivo and in vitro ▶Glutathione Peroxidase
characterization of naturally occurring G-protein muta-
tions detected in patients have taught us invaluable
lessons on the physiology of these cardinal signaling
proteins. Studying clinical and molecular aspects of the
different forms of PHP has highlighted the complex G-Quartet DNA
regulation of Gαs expression and provided remarkable
insights into the basic mechanisms of genomic
imprinting. Our understanding of receptor and G- Definition
protein-mediated signaling processes has shifted G-quartet DNA (also known as G4 DNA) defines a four-
from studying linear signaling cascades towards the stranded DNA structure formed by nucleic acid rich
consideration of complex signaling networks which guanine/cytosine regions. This structure is highly
will require novel collaborative research initiatives to stabilized by a planar array of four hydrogen-bonded
integrate bits and pieces of knowledge into a coherent guanine bases.
instructive model. ▶DNA Helicases
▶Cardiac Signaling: Cellular, Molecular and Clinical
Aspects
Definition
Granulomatosis refers to a multisystem disease that is
characterized by an inflammation of the blood vessels
(▶vasculitis) involving the upper and lower respiratory
Grade of Malignancy tracts and variable degrees of systemic, small vessel
vasculitis, which is generally considered to represent a
hypersensitivity reaction to an unknown antigen.
Definition ▶Recombinant Protein Expression in Bacteria
Grade of malignancy designates the histomorphologi-
cal assessment of the malignant behavior of a tumor, as
estimated by cytological criteria such as nuclear
pleomorphism and number of mitoses, and histological
criteria such as the formation of differentiated struc- GRAS
tures. Usually, three grades (G1, well differentiated;
G2, moderately differentiated; and G3, poorly differ-
entiated; with increasing aggressiveness in this order) Definition
are distinguished. Generally Recognised As Safe: the US Congress
▶Breast Cancer established this concept and regulatory policy in 1958
as part of its food safety legislation. Judged by qualified
experts, it means that ingredients or hosts are safe when
used in food or food production to accomplish their
technical or nutritional purposes.
Granulation Tissue ▶Recombinant Protein Expression in Yeast
Definition
Granulation tissue defines a new connective tissue that
is formed during the wound repair process and
Grb2
temporarily replaces the lost dermal part of the skin.
The name derives from the granular appearance of
numerous new capillaries. Definition
▶Wound Healing Grb2 stands for Growth-factor-receptor-bound protein
2. It is an adaptor protein containing src homology
domains, one of which binds to and translocates the
guanine nucleotide exchange factor ▶SOS. It is
involved in activation of Ras, but can also play a role
in other signaling pathways in mammalian cells.
Granuloma ▶Ras Signalling Pathway
▶Signal Transduction: Integrin-Mediated Pathways
▶Tyrosine Kinase
Definition
Granuloma represents a chronic inflammatory lesion
initiated by various infectious and non-infectious
agents. Granuloma consists of either small, nodular
aggregations of mononuclear inflammatory cells or of Green Fluorescent Protein
aggregations of different cells, usually modified
macrophages surrounded by lymphocytes and multi-
nucleated giant cells. Sometimes granulomas may also Definition
contain eosinophils and B cells, and are surrounded by GFP stands for Green Fluorescent Protein. It is a natural,
fibrotic tissue. 27 kDa fluorescent protein, originally produced by the
▶Morbus Wegener marine jellyfish Aequorea victoria, and fluoresces or
called ▶oncogenes can promote abnormal cell pro- and selectively stimulate proliferation of endothelial
liferation, a hallmark of cancer cells, by changing the cells to recruit new vessels into the anoxic tissue.
function of any protein involved in the normal control VEGF can also be released from cancer cells and
of cell replication. Thus oncogenes can lead to stimulate growth of vessels from the surrounding
abnormal expression of a particular growth factor, tissues, which ensures a blood supply in the growing
altered expression of growth factor receptors, increased tumor. PDGF is stored in blood platelets and can act on
growth factor receptor activity or any other disruption cells after platelet degranulation in association with
of the intracellular machinery that regulates cell tissue injury, bleeding and blood clotting. Together
division. These insights have led to the discovery of with other growth factors such as EGF, which is
individual growth factors, growth factor receptors and also stored in platelets, PDGF acts locally on cells in
intracellular signaling pathways that transmit growth the injured tissue and promotes cell proliferation in the
stimuli to the cell nucleus. These naturally expressed healing process (Fig. 1). Insulin-like growth factor-1
growth regulatory proteins are called proto-oncogenes (▶IGF I) chemically resembles the hormone insulin
and historically, they have been named after the virus and is mainly produced in the liver under the control of
that gave rise to the growth disturbing mutation. For growth hormone. During development, this growth
example, the proto-oncogene that encodes platelet- factor participates in the regulation of skeletal growth
derived growth factor (▶PDGF) was named sis after and maturation after birth, but it is also involved in
the virus simian sarcoma virus, which induces tissue repair and may be important for abnormal
proliferation in infected cells through over-expression proliferation of cancer cells. The transforming growth
of a PDGF-like protein. factor family (▶TGF) is a large family of different
The polypeptide growth factors include a wide variety growth factors that were initially characterized by their
of signaling molecules that can be categorized into ability to transform normal cells into tumor cells in
several groups or families. They can be produced and culture. They have profound effects on cell metabolism
secreted by cells in order to act locally on neighboring and cellular synthesis of extracellular matrix proteins,
cells (▶paracrine function) or even on the and in some cell types they inhibit rather than
same cells that produced them (▶autocrine function). stimulate cell proliferation. ▶Cytokines are a unique
Some growth factors are also produced in organs but family of growth factors, which primarily act on cells in
exert their action on target cells after being transported the immune system and stimulate proliferation of
in the blood to a distance from the source (▶endocrine lymphoid cells. Cytokines such as the interleukins
function). (▶IL), a large family with more than 20 members,
One of the first growth factors characterized, EGF, is regulate proliferation of a variety of lymphocytes but
found in salivary glands in the gastrointestinal tract and also affect differentiation and growth of cells in the
promotes proliferation of a large variety of cells, bone marrow.
epithelial cells and mesenchymal cells included. The identification and characterization of growth
▶Erythropoietin is a growth factor produced in the factors have yielded potential therapeutic tools for the
kidney, which stimulates proliferation of immature red management of a large variety of human diseases. In
blood cells in the bone marrow. Some growth factors patients with chronic kidney failure, the metabolic
are stored extracellularly in tissues or in cells and can dysfunction of the organ will lead to anemia due to
be released to stimulate cells in the immediate vicinity. insufficient production of erythropoietin. Today, pa-
For example, fibroblast growth factor (▶FGF), a tients suffering from this condition receive injections of
member of a large family of growth factors, has the recombinant erythropoietin that stimulate production
capacity to be stored in tissues by binding to sugar of red blood cells in the bone marrow. A number of
residues on proteins in the extracellular matrix, the so- other conditions where the bone marrow does not
called proteoglycans, and can be released after injury to produce enough blood cells can also be corrected
participate in tissue repair by stimulating cell prolifera- through the addition of specific growth factors. For
tion. Several members of the FGF family stimulate example, deficient production of white blood cells
proliferation of endothelial cells and participate in the in the bone marrow, a side effect of cancer treatment
formation of new vessels, angiogenesis. FGF is also an with chemotherapy, can now be corrected through
important growth factor in the developing embryo and injections of growth factors that specifically stimulate
mutations in receptors for these growth factors have proliferation of leukocytes. In some disorders, espe-
been associated with several different bone disorders, cially in cancer, pharmacological approaches are
for example achondroplasia (dwarfism). There are taken to develop drugs that prevent the effects of
numerous other growth factors that are involved in growth factors, for example drugs that interfere
angiogenesis. Vascular endothelial cell growth factor with the binding between specific growth factors
(▶VEGF) can be produced in tissues with a deficient and their corresponding receptors on the surface of
blood supply, for example after myocardial infarction, cells.
Growth Factors. Figure 1 Growth factors regulate healing responses. After tissue injury, bleeding and
formation of a blood clot, growth factors can be released from degranulating platelets (1) and stimulate fibroblast
proliferation or growth of new blood vessels in the injured tissue. In addition, inflammatory cells recruited from
the blood stream can produce growth factors (2); they may be released from storage pools in the tissue
(3) or they may be released from proliferating cells (4).
Growth Factors. Figure 2 Growth factors regulate gene transcription in cells. 1) Binding of a growth factor to a
corresponding receptor (receptor tyrosine kinase; RTK) on the cell surface induces dimerization of two receptor
subunits, which elicits phosphorylation (P) of tyrosine residues on the cytosolic domains of the receptor. 2) Specific
intracellular adaptor proteins (AP) bind to the receptor and facilitate activation of catalytic proteins, such as Ras,
which thereafter activate intracellular signaling through the mitogen activated protein kinase (MAPK) pathway. 3) In
this pathway, phosphorylation of amino acid residues activates a series of kinases and the last activated protein in
the pathway enters the cell nucleus (4) where it induces activation of transcription factors (tf), which promote
expression of specific genes, for example genes coding for proteins necessary for cell proliferation.
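
The sequence of events in Figure 2 can be laid out as an ordered list of steps, as in the following minimal Python sketch. The step names paraphrase the caption; the function name and the `blocked` parameter are hypothetical and only illustrate that interrupting one step prevents all downstream steps. No kinetics, feedback or cross-talk is modelled.

```python
# Ordered steps of growth factor signaling, paraphrased from the Figure 2 caption.
CASCADE = [
    "receptor dimerization",
    "tyrosine phosphorylation",
    "adaptor protein binding",
    "Ras activation",
    "MAPK kinase series",
    "nuclear entry",
    "transcription factor activation",
    "gene expression",
]


def run_cascade(growth_factor_bound, blocked=frozenset()):
    """Return the steps completed, in order, until a blocked step is reached.

    blocked is a hypothetical set of step names that cannot proceed,
    for example because of an inhibitor or a loss-of-function mutation.
    """
    completed = []
    if not growth_factor_bound:
        # Without ligand binding the receptor is never activated.
        return completed
    for step in CASCADE:
        if step in blocked:
            break  # every downstream step depends on this one
        completed.append(step)
    return completed
```

For example, `run_cascade(True, blocked={"Ras activation"})` stops after adaptor protein binding, which mirrors the linear dependency the caption describes.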
Cytokines bind to tyrosine-kinase-associated receptors tyrosine residues on the receptor, which provides
that are structurally similar to RTKs but lack intrinsic binding sites for downstream signaling molecules and
tyrosine kinase activity in their cytosolic domains. initiation of signaling cascades. The receptors of the
Instead, these growth factor receptors are associated GPCR family are traditionally regarded as mediators of
with molecules that have tyrosine kinase activity. signaling for substances that regulate specific physio-
Receptor binding of cytokines also facilitates dimeriza- logical responses in differentiated cells, for example
tion of receptor subunits, which then mediate phosphor- contraction and relaxation in muscle cells. Lately, it has
ylation of associated tyrosine kinases. Activation of been understood that these receptors may also transmit
these tyrosine kinases leads to phosphorylation of growth signals in less differentiated and proliferative
cells after stimulation by factors not normally perceived changes in the expression and activity of families of cell-
as growth factors, for example angiotensin II which cycle regulatory proteins termed ▶cyclins and cyclin
stimulates contraction of smooth muscle cells in the dependent kinases (▶cdk). The expression of some
vessel wall but also initiates proliferation in de- cyclins is largely dependent on growth factors and
differentiated smooth muscle cells in diseased vessels. activation of growth factor receptors. After growth factor
GPCRs have an extracellular domain that binds the stimulation, cyclins are synthesized, which form com-
ligand, seven transmembrane segments and an intracel- plexes with cdks. These complexes catalyze massive
lular domain that associates with a guanine nucleotide- phosphorylation of the retinoblastoma protein, ▶Rb. In
binding protein (hence the name G-protein). Ligand resting cells, Rb is bound to the transcription factor E2F
binding, for example of thrombin or angiotensin, to its but upon hyperphosphorylation, E2F is released and can
specific GPCR extracellular domain, leads to a con- activate transcription of a number of genes necessary for
formational change in the cytosolic domain of the S-phase initiation and further cell-cycle progression.
receptor. The altered receptor structure facilitates
binding of G-proteins to the intracellular domain, which
in turn activates an enzyme attached to the plasma References
membrane. This enzyme catalyzes a reaction leading to 1. Oxford Reference Online. Oxford University Press.
the release of a second messenger, which can then BIBSAM, 2004. ▶http://www.oxfordreference.com
activate intracellular signaling molecules that reach the 2. Bast et al (2000) Cancer Medicine, 5th edn. BC Decker
nucleus and affect gene transcription. Inc, Hamilton
3. Lodish et al (2003) Molecular cell biology, 5th edn. WH
Cascades of intracellular signaling molecules constitute Freeman and Co, New York
links between growth factor binding to a growth factor 4. Alberts et al (1994) Molecular Biology of the Cell, 3rd
receptor and expression of genes in the nucleus that edn. Garland Publishing, New York and London
control cell proliferation. The ▶mitogen activated 5. Heldin and Purton (1996) Signal transduction, 1st edn.
protein kinase (MAPK) pathways are the best Chapman & Hall, London
studied and understood intracellular signaling path-
ways, which are activated after binding of growth
factors to their corresponding cell surface receptors.
For example, binding of PDGF to the PDGF receptor
leads directly to receptor dimerization and autopho-
sphorylation of tyrosine residues on the cytosolic
domains of the receptor. Tyrosine autophosphorylation Growth Hormone Deficiency
provides binding sites for an adaptor protein called
Grb2. Grb2 in turn binds to another adaptor protein SoS
(son of sevenless) that can activate the small GTP Definition
binding proto-oncogene protein Ras. Ras then Growth hormone deficiency is caused by absence,
activates a MAP-kinase-kinase-kinase, which in turn decreased production or dysfunction of the anterior
phosphorylates a MAP-kinase-kinase and then finally, pituitary hormone, which results in dwarfism or short
the last signaling molecule of the pathway, a MAP- stature and possibly some metabolic abnormalities
kinase (MAPK) is activated. After phosphorylation, (such as hypoglycemia).
MAPK translocates into the nucleus and activates a set ▶Hypothalamic and Pituitary Diseases Genetics
of ▶transcription factors, which promote expression of
growth-related genes (Fig. 2).
The ▶cell cycle is divided into four phases, G1, S, G2 and M,
preceded by a resting phase, G0. Non-malignant eukaryotic cells are normally
resting in the G0 phase. In cells that harbor the capacity to
proliferate, such as fibroblasts, growth factors stimulate
the cells to leave the G0 phase and enter the G1 phase.
When cells have been stimulated with growth factors for a Growth Plate
specific time period in the G1 phase, further progression
in the cell-cycle and cell division will inevitably follow,
even if the growth factor is removed. This so-called Definition
▶restriction point is followed by the S-phase where DNA The growth plate is a cartilaginous structure at the end
replication takes place, the G2 phase when factors of bones that generates the entire longitudinal growth
necessary for the physical division of the cell are through proliferation and differentiation of chondro-
produced and finally the M phase when mitosis occurs. cytes, and the conversion of cartilage into bone.
Passage through the cell-cycle is controlled by periodic ▶Bone Disease and Skeletal Disorders, Genetics
▶γ-Secretase Definition
GTPase-activating proteins (GAPs) are proteins
that bind to a GTP-binding protein and inactivate
it by stimulating its GTPase activity so that it
hydrolyzes its bound GTP to GDP.
▶Rho, Rac, Cdc42
GSK3
Definition
Guanine nucleotide exchange factors (GEFs) are
GTP proteins that catalyze the release of guanine nucleotides
(mostly GDP) from monomeric or heterotrimeric
GTPases, thereby allowing them to bind GTP in its
Definition place. In the latter case, heptahelical receptors serve as
GTP stands for Guanosine 5′–triphosphate. It is GEFs.
produced by phosphorylation of GDP (guanosine 5′– ▶G-Proteins
diphosphate). ▶Rho, Rac, Cdc42
▶Rho, Rac, Cdc42 ▶Tight Junctions
Paneth cells, which migrate downwards to the crypt APC leads to inappropriate activation of the Wnt/
bottom – migrate to the villus tip in the small β-catenin signaling pathway (3) and to chromoso-
intestine or to the surface of the colon cuffs, and mal instability linked to altered kinetochores. Rare
subsequently undergo apoptosis and exfoliation into cases of FAP without germ line mutation of APC are
the gut lumen. The overall life cycle of the gut associated with biallelic germ line mutations in the
epithelial cells is around 5 days in humans. base excision repair gene MYH. The sporadic form
5. Multipotent stem cells are maintained in the adult of colorectal cancer is mainly related to chromoso-
gut epithelium. mal instability and to gradual histological changes,
The continuous renewal of the gut epithelium the adenoma-carcinoma sequence, associated with the
implies the presence of stem cells located in a accumulation of somatic alterations in a number of
specific niche near the crypt bottom surrounded by tumor suppressor genes (APC, p53) and oncogenes
sub-epithelial myofibroblasts and specific extracel- (K-ras, Bcl2). Malignant tumors occur almost
lular molecules. Although intestinal stem cells have exclusively in the colon and rectum, while atypical
still not been isolated, several experimental ap- tumors develop in the small intestine in the context
proaches – transgenesis, mouse embryo aggregation of chronic inflammatory bowel disease. The pri-
chimeras, mutagenesis, regeneration after X-ray mary incidence of tumors in the colon correlates
irradiation – have provided evidence that all the with the colon specific expression of the anti-
differentiated cell types of the gut epithelium derive apoptotic gene Bcl2 at the stem cell positions in the
from ▶multipotent stem cells (1, 2). Stem cells crypt base.
undergo asymmetrical division to stochastically
produce one new self-maintaining stem cell and
Regulatory Mechanisms
one daughter cell that cycles and fuels a population
of potential clonogenic stem cells. These cells are 1. Intestinal epithelial identity and homeostasis: in-
further displaced into the compartment of transit volvement of the ▶Cdx genes.
amplifying cells that eventually go into the ▶Homeobox genes belong to a large family of
differentiation process. Potential clonogenic stem transcription factors acting at multiple levels during
cells, unlike transit amplifying cells, can replace true embryonic development. Cdx1 and Cdx2 are two
stem cells if this population is altered. The process paralogue homeobox genes, which are expressed in
towards differentiation uses two minor pathways the presumptive gut endoderm and in posterior
consisting of long-lived progenitors of absorptive organs in the embryos, and specifically in the gut
and of mucous cells respectively and one major epithelium throughout adult life. Cdx2 displays the
pathway in which short-lived progenitors can homeotic function devoted to defining the intestinal
produce both absorptive and mucous short-lived identity (4). Indeed, ectopic expression of Cdx2 in
progenitors. Interestingly, glucagon-like peptide-2 the stomach epithelium – normally devoid of Cdx
produced by one of the enteroendocrine cell types – gene expression – converts gastric mucosa to an
known to prevent intestinal damage and to facilitate intestinal phenotype, whereas loss of expression in
repair – signals through enteric neurons to produce a the gut endoderm leads to a gastric transdifferentia-
mediator that stimulates the production of long- tion. Cdx1 and Cdx2 are also modulators of cell
lived progenitors of the absorptive lineage (2). renewal via distinct and complementary functions;
6. Malignant epithelial tumors develop in the colon Cdx1 stimulates cell proliferation, resistance to
and rectum. apoptosis and eventually cell differentiation,
Colorectal cancer is a major disease in terms of whereas Cdx2 reduces cell proliferation and stimu-
incidence and malignancy. It results from imbal- lates differentiation. The Cdx1 and/or Cdx2 proteins
anced cell proliferation, differentiation, migration act directly on a panel of target genes and cellular
and apoptosis in colonic crypts. The vast majority of functions including regulators of cell cycle and
tumors are of sporadic origin while a small apoptosis (p21WAF, Bcl2), transcription factors
proportion is familial. The major familial form, (KLF4), proteins involved in cell interactions (LI-
▶Hereditary Non-Polyposis Colon Cancer cadherin), in calcium metabolism (vitamin D
(HNPCC), is characterized by microsatellite DNA receptor, calbindin-D9k) and in glucose metabolism
instability resulting from germ line mutations in the (glucagon, glucose-6-phosphatase), digestive en-
MLH1 or MSH6 genes, which cause defects in zymes (sucrase, lactase) and mucus production
the DNA mismatch repair system. The second (Muc2). Interestingly, the expression profiles of
familial form, ▶Familial Adenomatous Polyposis Cdx1 and Cdx2 are altered in colorectal cancers
(FAP), is characterized by chromosomal instability and pro-oncogenic pathways have opposite effects
and is linked to germ line mutations in the tumor on the two homeobox genes. For instance Cdx1
suppressor gene ▶APC. Loss of function of is upregulated by the Wnt/β-catenin pathway,
whereas Cdx2 is down-regulated by the ▶PI3K/Akt BMP (Bone Morphogenetic Protein) signaling and
pathway. These observations suggest that Cdx1 and Hh (Hedgehog) signaling also control the prolifera-
Cdx2 dysfunctions contribute to cancer progression. tion/differentiation equilibrium of the gut epithelium
In accordance with this, the alteration of the Cdx2 (7, 8, 9).
status sensitizes the colon epithelium to carcinogen- 3. Molecular determinants of gut stem cells and
esis, linked to a lower capacity to switch on progenitors. Although gut stem cells and progenitors
apoptosis. Thus, in addition to its homeotic function have been described near the crypt base, molecular
during embryonic development of the digestive markers remain elusive. Yet, a gene product, Msi1,
tract, the Cdx2 homeobox gene is a gut-specific has been reported in intestinal crypts, specifically in
tumor suppressor gene in the adult colon. individual cells that display the theoretical location
2. The Wnt/β-catenin signaling pathway. attributed to stem cells (1). Msi1 is an RNA-binding
Cell growth and polarity are major attributes of the protein associated with asymmetrical division of
gut epithelium, and the Wnt/β-catenin pathway neural progenitors. Msi1 controls the translation of
plays a pivotal role in this balance. β-catenin several RNAs; in particular, it acts as a translational
contributes to cell polarity by cross-linking the repressor of numb, an antagonist of Notch signal
membrane E-cadherin to the actin cytoskeleton. In activation. The Notch pathway is involved in the
differentiated epithelial cells, excess β-catenin maintenance of an undifferentiated state by a lateral
molecules not coupled to E-cadherin are loaded on a inhibition mechanism by which a cell differentiating
complex comprising APC, Axin and the CKI and along a given pathway produces a signal that
GSK3β kinases that phosphorylate β-catenin and prevents neighboring cells from differentiating
target it to the proteasome degradation system. along the same pathway. Upon activation by their
During intestinal development as well as in the ligands, the intracellular domains of Notch receptors
crypts, the Wnt/β-catenin signaling pathway is are released and translocated into the nucleus where
activated by secreted morphogens of the Wnt family they bind to CSL (CBF1/Suppressor of hairless/
that bind to Frizzled receptors and activate several Lag1) DNA binding proteins. This results in the
downstream pathways, one of which leads to the transcriptional activation of downstream targets
inhibition of GSK3β activity. In this context, β- encoding ▶bHLH transcription factors of the Hes
catenin escapes degradation and translocates into family, among which is Hes1. These act as negative
the nucleus to bind HMG-box transcription factors regulators of differentiation by repressing other
of the Tcf/Lef family. These factors play a major role bHLH genes that promote differentiation. Indeed, a
in the maintenance and self-renewal of the stem cell series of gene invalidations led to the conclusion
stock, since Tcf4-deficient mice lack proliferative that cell fate commitment in the intestine depends
cells in the prospective intervillous regions; in essentially on a genetic cascade controlled by bHLH
contrast over-expression of Lef1 causes increased transcription factors. Firstly, Hes1, which is ex-
stem cell apoptosis. Major targets of the activated pressed in crypt cells like Math1 and Ngn3,
Wnt/β-catenin pathway are the proto-oncogene c- maintains the precursor pool expansion and pre-
myc and cyclin D1, which subsequently down- vents premature endocrine and mucous cell differ-
regulates the cell cycle inhibitor p21WAF (5). entiation. Secondly, Hes1 antagonizes Math1,
Interestingly, a transcriptomic approach has identi- which is required for the commitment of the three
fied a set of genes of the c-myc cascade in the stem secretory cell lineages of the gut epithelium (goblet,
cells/progenitors compartment, including regulators endocrine and Paneth cells). This suggests that the
of c-myc gene transcription and protein stability and choice between absorptive and secretory cell
c-myc downstream targets (6). As mentioned above, lineages is balanced by Hes1 and Math1 (2).
a direct Wnt/β-catenin tissue-specific target is Cdx1, Thirdly, within the population of Math1 expres-
and an indirect target is Cdx2, through another sing-cells, Ngn3 specifies the endocrine progenitors
HMG-box factor, SOX9. Finally, the Wnt/β-catenin (10), whereas NeuroD, which is expressed in the
pathway also controls crypt cell sorting via the villus cells, is required for the differentiation of a
regulation of combinations of ephrin receptors subset of endocrine cells. Genetic modulations of
EphB2/EphB3 and their ligands EphrinB1/ the Notch pathway have confirmed its influence on
EphrinB2 (5).Thus, crypt formation, cell sorting the maintenance of undifferentiated, proliferative
and the mechanism of stem cell selection appear to cells in the crypts and on intestinal cell lineage
depend on an adequate threshold of β-catenin- specification by acting on the balance between
mediated signaling during normal intestinal home- bHLH factors (11).
ostasis, whereas inappropriate activation of this
In conclusion, the renewal of the gut epithelium has
pathway contributes to colorectal tumorigenesis.
long been recognized as a paradigm in cell biology as
Cooperating with the WNT/β-catenin pathway,
regards well-ordered cell proliferation followed by differentiation from
self-renewing stem cells. Recent investigations have provided new insights
into the molecular mechanisms implicated in multiple aspects of this
process, such as the determination of intestinal identity, the cell
commitment into differentiated lineages. Emerging concepts propose
integrative models involving the interplay of local and reciprocal
stimulatory and inhibitory signals between epithelial cells and the
underlying myofibroblasts, to fine-tune the homeostatic balance between
stemness, commitment, proliferation and differentiation (7, 9).
Furthermore, these results enlarge our knowledge of the molecular
alterations at the basis of malignant transformation in ▶colorectal
cancers. However, the exact nature of the gut stem cells and their
relationship and regulation by neighboring elements of the stem cell niche
remain to be elucidated. Understanding the biology of the gut
stem/progenitor cells is a challenge for the future that should open new
avenues in the field of cancer therapy, intestine regeneration and
cellular therapy of type-1 diabetes.

References
1. Booth C, Potten CS (2000) Gut instincts: thoughts on intestinal
epithelial stem cells. J Clin Invest 105:1493–1499
2. Brittan M, Wright NA (2002) Gastrointestinal stem cells. J Pathol
197:492–509
3. Giles RH, van Es JH, Clevers H (2003) Caught up in a Wnt storm: Wnt
signaling in cancer. Biochim Biophys Acta 1653:1–24
4. Freund J-N, Domon-Dell C, Kedinger M et al (1998) The Cdx1 and Cdx2
homeobox genes in the intestine. Biochem Cell Biol 76:957–969
5. Van de Wetering M, Sancho E, Verweij C et al (2002) The
beta-Catenin/TCF-4 complex imposes a crypt progenitor phenotype on
colorectal cancer cells. Cell 111:241–250
6. Stappenbeck TS, Mills JC, Gordon JI (2003) Molecular features of adult
mouse small intestinal epithelial progenitors. PNAS 100:1004–1009
7. He XC, Zhang J, Tong WG et al (2004) BMP signaling inhibits intestinal
stem cell self renewal through suppression of Wnt-beta-catenin signaling.
Nat Genet 36:1117–1121
8. Madison BB, Braunstein K, Kuizon E et al (2005) Epithelial hedgehog
signals pattern the intestinal crypt-villus axis. Development 132:279–289
9. Radtke F, Clevers H (2005) Self-renewal and cancer of the gut: two
sides of a coin. Science 307:1904–1909
10. Jenny M, Uhl C, Roche C, Duluc I et al (2002) Neurogenin3 is
differentially required for endocrine cell fate specification in the
intestinal and gastric epithelium. EMBO J 21:6338–6347
11. Van Es JH, van Gijn ME, Riccio O et al (2005) Notch/gamma-secretase
inhibition turns proliferative cells in intestinal crypts and adenomas
into goblet cells. Nature 435:959–963