Genoma 01


Genome Organisation: Human

Advanced article

David H Kass, Eastern Michigan University, Ypsilanti, Michigan, USA
Mark A Batzer, Louisiana State University, Baton Rouge, Louisiana, USA

Based in part on the previous version of this eLS article ‘Genome Organization: Human’ (2001, 2004).

Article Contents
• Introduction
• Sequence Complexity
• Genome Organisation at the Chromosomal Level
• Genome Organisation of DNA Sequences Representing the Differing Levels of Complexity
• Mitochondrial Genome
• Genome Dynamics and Evolution
• Acknowledgements

Published online: 20th April 2021

The human nuclear genome is organised into a highly complex arrangement of two sets of 23 chromosomes comprised of various types of deoxyribonucleic acid (DNA) sequences. With less than 2% of the human genome consisting of protein-encoding DNA sequence, the remainder of the 3 × 10^9 bp of the haploid genome consists of a multifaceted assortment of DNA sequence classifications. At the broadest level, the genome can be divided into single-copy protein-encoding genes, repetitive sequences and spacer DNA. Notably, the categories of repetitive sequences are profoundly intricate and may be further classified into functional or functionally related groups such as gene families and superfamilies, or groups with no known function such as satellite DNA and transposable elements. Their genomic organisation may be dispersed, within localised regions, and/or tandemly repeated. The generation of repetitive sequences results from a variety of mechanisms continually promoting a genome that is highly dynamic.

Introduction

The human nuclear genome contains over 3000 million base pairs (Mbp) of deoxyribonucleic acid (DNA) with profound organisational complexity that can be classified to some degree based on certain sets of characteristics. Most broadly, these include the single-copy protein-encoding genes (representing roughly 2–4% of the total genomic sequences), repetitive sequences and intergenic (spacer) DNA (Figure 1). The repetitive sequences can be further classified as either gene superfamilies (functionally related with weak sequence homology) or gene families (related paralogous genes) some of which are functional. These are found to be dispersed and/or tandemly repeated and may encode functional proteins, noncoding functional ribonucleic acid (RNA) genes or nonfunctional pseudogenes derived from these sequences. Repetitive sequences with no known function include the various highly repeated satellite families, and the dispersed, moderately repeated transposable element (TE) families, though there has been some evidence of TE function including involvement in immune response (Kassiotis and Stoye, 2016) and gene regulation (Platt II et al., 2018). The remainder of the genome consists of spacer DNA, representing a broad category of undefined intergenic DNA sequences. Overall, the human genome consists of 23 pairs of chromosomes, or 46 DNA molecules, of differing sizes (Table 1). This review is designed to provide a synopsis of the organisation, complexity and dynamics of the human genome. A general overview of the genomic categories, examples and organisation is summarised in Table 2.

Sequence Complexity

The human genome contains various levels of complexity as demonstrated by reassociation kinetics. This involves the random shearing of DNA into small fragments averaging about 500 bp, heat denaturation to separate the strands of the double helix and slow cooling. During cooling, complementary sequences anneal; the more copies there are of a particular sequence, the greater the chance of finding a complement to anneal to. Therefore, the reassociation is dependent on time (t), as well as the initial concentration of that sequence (C0), yielding what is referred to as a C0t value. Analysis of the human genome utilising this methodology has provided estimates that 60% of the DNA is present as either single copy or in very low copies; 30% of the DNA is moderately repetitive; and 10% is considered highly repetitive.
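The dependence of reassociation on C0t can be illustrated with the ideal second-order rate law, under which the fraction of DNA still single-stranded is 1 / (1 + k × C0t). The sketch below is illustrative only: the 60/30/10 split is taken from the estimates above, whereas the rate constants are hypothetical placeholders chosen simply to separate the three kinetic classes, not measured values.

```python
def fraction_single_stranded(c0t, k):
    """Ideal second-order reassociation: C/C0 = 1 / (1 + k * C0t)."""
    return 1.0 / (1.0 + k * c0t)


def c0t_half(k):
    """C0t value at which half of a component has reassociated (1/k)."""
    return 1.0 / k


# (fraction of genome, rate constant k). The fractions follow the text;
# the k values are hypothetical and only rank the classes by copy number.
COMPONENTS = [
    (0.10, 1e3),   # highly repetitive: reassociates fastest
    (0.30, 1e0),   # moderately repetitive
    (0.60, 1e-3),  # single copy or very low copy: reassociates slowest
]


def genome_fraction_single_stranded(c0t):
    """Fraction of total DNA still single-stranded at a given C0t."""
    return sum(f * fraction_single_stranded(c0t, k) for f, k in COMPONENTS)


if __name__ == "__main__":
    for exponent in range(-4, 5):
        c0t = 10.0 ** exponent
        print(f"C0t = 1e{exponent:+d}: "
              f"{genome_fraction_single_stranded(c0t):.3f} single-stranded")
```

Under this model each component is half-reassociated at C0t = 1/k, so the more copies a sequence class has (larger effective k), the lower the C0t value at which it anneals.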

eLS subject area: Genetics & Disease

How to cite:
Kass, David H and Batzer, Mark A. Genome Organisation: Human, eLS, Vol 2: 61–71, 2021.
DOI: 10.1002/9780470015902.a0029269

Volume 2, Issue 1, April 2021 eLS © 2021, John Wiley & Sons, Ltd. www.els.net 61

Genome Organisation at the Chromosomal Level

Various chromosomal staining techniques present alternative banding patterns of mitotic chromosomes referred to as karyograms. Although the three broad classes of DNA are scattered throughout the chromosome, chromosomal banding patterns reflect notable levels of compartmentalisation of the DNA. The C-banding technique yields dark-staining regions of the chromosome (or C bands), referred to as heterochromatin. These regions are highly coiled, contain highly repetitive DNA and are typically found at the centromeres, telomeres and on the Y chromosome. They are composed of long arrays of tandem repeats and therefore some may contain a nucleotide composition that differs significantly from the remainder of the genome (approximately 40–42% GC). Therefore, these repeats can be separated from the bulk of the genome by buoyant density (caesium chloride) gradient centrifugation, with the identification of a major band along with three minor bands referred to as satellite bands, hence the term satellite DNA.

The G-banding technique yields a pattern of alternating light and dark bands reflecting variations in base composition, time of replication, chromatin conformation and the density of genes and repetitive sequences. Therefore, the karyograms define chromosomal organisation of DNA sequences and allow for the identification of the different chromosomes. The darker bands, or G bands, are comparatively more condensed, more AT-rich, contain relatively few genes, and replicate later than the DNA within the pale bands, which correspond to the R bands by an alternative staining technique. More recently these alternative banding patterns have been correlated to the level of compaction of scaffold-attachment regions (Craig et al., 1997).

Figure 1 Broad classification of DNA sequences in the human genome.

Human genome
  Single-copy protein-encoding genes
    Coding sequence
    Introns and cis-regulation sequences
  DNA present in more than 1 copy (repetitive)
    Functional sequences: interspersed and/or tandem
      Gene superfamilies (structurally related, but functionally and evolutionarily distinct)
        Coding genes with conserved protein domains
        Coding genes with conserved amino acid motifs
        Non-coding RNA genes (regulatory and structurally similar)
      Gene families (paralogous genes)
        Identical: copies of the same gene
        Highly homologous: evolutionarily and functionally related genes
    Sequences with no known function
      Highly repeated sequences in tandem (satellites)
        Macrosatellites
        Minisatellites
        Microsatellites
      Interspersed
        DNA-mediated transposons
        Retrotransposons
        Retrocopies
  Spacer DNA

Table 1 DNA content of human chromosomes in megabases (Ensembl GRCh37)

Chromosome   Amount of DNA (Mb)   Chromosome   Amount of DNA (Mb)
1            249                  13           108
2            237                  14           105
3            192                  15           99
4            183                  16           84
5            174                  17           81
6            165                  18           75
7            153                  19           69
8            135                  20           63
9            132                  21           54
10           132                  22           57
11           132                  X            141
12           123                  Y            60

Data from Ensembl GRCh37.
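The chromosome sizes listed in Table 1 can be totalled to compare them with the figure of over 3000 Mbp quoted in the Introduction. The snippet below is a minimal sketch that simply transcribes the Table 1 values; the variable and function names are illustrative, and no figures beyond those in the table are assumed.

```python
# Chromosome sizes in Mb, transcribed from Table 1.
TABLE_1_MB = {
    "1": 249, "2": 237, "3": 192, "4": 183, "5": 174, "6": 165,
    "7": 153, "8": 135, "9": 132, "10": 132, "11": 132, "12": 123,
    "13": 108, "14": 105, "15": 99, "16": 84, "17": 81, "18": 75,
    "19": 69, "20": 63, "21": 54, "22": 57, "X": 141, "Y": 60,
}

AUTOSOMES = [str(n) for n in range(1, 23)]


def total_mb(chromosomes):
    """Sum the Table 1 sizes (Mb) for the given chromosome names."""
    return sum(TABLE_1_MB[name] for name in chromosomes)


all_listed_mb = total_mb(TABLE_1_MB)            # all 24 distinct chromosomes
haploid_with_x_mb = total_mb(AUTOSOMES + ["X"])  # 22 autosomes plus X
haploid_with_y_mb = total_mb(AUTOSOMES + ["Y"])  # 22 autosomes plus Y

if __name__ == "__main__":
    print(all_listed_mb, haploid_with_x_mb, haploid_with_y_mb)
    # prints: 3003 2943 2862
```

On these values the 24 distinct chromosomes total 3003 Mb, while a single haploid set (22 autosomes plus one sex chromosome) comes to slightly less.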

Table 2 Components and organisation of nuclear DNA sequences in the human genome

Genome organisation level | Category | Subcategory | Genome characteristic and/or example(s) | Location
I. Chromosomal | C bands | | Highly heterochromatic alphoid satellite DNA | Centromere
 | G bands | | Relatively heterochromatic, A-T rich, gene poor | Large dispersed chromosomal regions
 | R bands | | Relatively euchromatic, GC rich, gene rich | Large dispersed chromosomal regions corresponding to unstained G bands
 | Satellites | | Nucleolar organiser regions, rRNA gene families | Near ends of short arm of acrocentric chromosomes 13, 14, 15, 21, 22
II. Sequence | Single-copy protein-coding gene | | Roughly 25 000 genetic loci comprising 2–4% of the genome | Individual sites dispersed throughout genome, primarily in euchromatic regions
 | Repetitive DNA | | |
 | Satellite | Alphoid (macro)satellite | 170 bp tandem repeats, comprising up to 10 Mbp | Centromere
 | | Minisatellite | 15–100 bp tandem repeats, comprising 1–5 kbp | Interspersed and telomeres
 | | Microsatellite | Tandem repeats 4 bp or less | Interspersed
 | Gene family | Noncoding RNA genes | rRNA, tRNA (different anticodon families), snRNA (U-rich) | Tandem clusters in several chromosomal locations
 | | | 5S rRNA | Clustered, single location
 | | | miRNA | Widely distributed, primarily intergenic and intronic regions
 | | Protein coding | Histone genes | Tandem clusters at few chromosomal sites
 | | | Actin | Dispersed
 | | | Globin genes | α-globin clustered on chromosome 16, β-globin clustered on chromosome 11, myoglobin on chromosome 22
 | Gene superfamily | Protein coding | Protooncogenes, for example, myc and ras | Dispersed on different chromosomes
 | | | Homeodomain-containing proteins | Dispersed among a subset of single-copy protein-encoding genes
 | | | Immunoglobulin protein domains | Dispersed
 | | | DEAD box, F-box, or WD box motifs | Dispersed in a specific subset of protein-encoding genes
 | | Noncoding RNA genes | lncRNA | Dispersed
 | Transposable elements (TEs) | Class I. retrotransposons | SINEs (Alu) | Interspersed; AT-rich preference
 | | | LINEs (L1) | Interspersed; GC-rich
 | | | HERVs or isolated LTR of HERVs | Interspersed primarily intergenic regions, plus high recombination regions such as the MHC
 | | Class II. transposons | Charlie, Mariner, Tigger | Interspersed
 | Nonfunctional genes | DNA pseudogenes | β-globin pseudogenes | Adjacent to functional gene
 | | Processed pseudogenes | Hmg, actin, tRNA | Dispersed
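Table 2 characterises microsatellites as tandem repeats with a unit of 4 bp or less. As a rough illustration of what that definition means at the sequence level, the sketch below scans a DNA string for such arrays using a regular expression. The function name and the minimum copy number (`MIN_COPIES`) are hypothetical choices made for this example; the review itself does not specify a detection threshold.

```python
import re

# Hypothetical reporting threshold: a unit must occur at least this many
# times in a row before the array is reported.
MIN_COPIES = 5

# A unit of 1-4 bases (lazy, so the shortest unit is preferred) followed by
# at least MIN_COPIES - 1 further exact copies of the same unit.
_MICROSATELLITE = re.compile(r"([ACGT]{1,4}?)\1{%d,}" % (MIN_COPIES - 1))


def find_microsatellites(sequence):
    """Return (start, unit, copies) for each perfect tandem array found."""
    hits = []
    for match in _MICROSATELLITE.finditer(sequence.upper()):
        unit = match.group(1)
        copies = len(match.group(0)) // len(unit)
        hits.append((match.start(), unit, copies))
    return hits


if __name__ == "__main__":
    example = "GGTT" + "CA" * 8 + "GTCC"
    print(find_microsatellites(example))
    # prints: [(4, 'CA', 8)]
```

Only perfect repeats are matched here; real arrays often contain interruptions, and longer units such as minisatellite repeats fall outside the 1–4 bp pattern by design.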

The human genome may also be compartmentalised into large segments of DNA with distinctive GC richness referred to as ‘GC content domains’ (Lander et al., 2001). There is a distinct correlation between GC-richness and gene density. This is consistent with the association of CpG islands, which are the 500–1000-bp GC-rich segments flanking (usually at the 5′ end), with most housekeeping and many tissue-specific genes. The clustering of CpG islands, as demonstrated by FISH (Craig and Bickmore, 1994), further depicts gene-poor and gene-rich chromosomal segments.

Another organisational feature of the human genome at the chromosomal level is a thin bridge with rounded ends at the terminus of five acrocentric human chromosomes (13, 14, 15, 21, 22), which are referred to as chromosomal satellites. These contain repeats of genes coding for ribosomal ribonucleic acid (rRNA) and ribosomal proteins that coalesce to form the nucleolus and are known as nucleolar organising regions (NORs).

Genome Organisation of DNA Sequences Representing the Differing Levels of Complexity

Single-copy sequences

Although originally defined as a functional unit of heredity, a gene may be defined as a protein-encoding segment of DNA with various noncoding regulatory elements or a segment of DNA encoding a functional RNA molecule. Venter et al. (2001) estimated that there are at least 26 588 protein-encoding transcripts, consistent with the 30 000–40 000 estimate of the International Human Genome Sequencing Consortium (Lander et al., 2001), as well as a more recent estimate of roughly 21 000 protein-coding loci based on the GENCODE Project (Harrow et al., 2012). With alternative splicing for these genes, it is estimated that 100 000 different proteins can be generated. Based on the 21 000 estimate, the proportion of the genome consisting of genes is roughly 10–11% assuming an average gene size of 15 kb. However, approximately 90% of the DNA from protein-encoding genes is noncoding, including the upstream and downstream regulatory sequences, as well as the introns. Therefore, only 1.5–2% (45–60 Mb) of DNA has coding function. Cis-acting regulatory sequences include common promoters such as the TATA, CCAAT and GC boxes which are recognised by transcription factors, and are located at specific upstream distances from the transcription start site. There are also tissue-specific cis-acting DNA sequences, such as hormone response elements (HREs) in various orientations within or near a gene that serve as either enhancers or silencers to up or downregulate gene expression, respectively. Many coding sequences may be included as members of gene families as described below. In addition, there may be single-copy sequences in the spacer DNA with no known (determinable) function.

Repetitive sequences

The human genome contains various sized stretches of repeated DNA sequences with differing numbers of copies. These repetitive sequences may be in a tandem orientation and/or dispersed throughout the genome. Repetitive sequences may be classified by function, dispersal patterns and sequence identity. Satellite DNA typically refers to highly repetitive sequences with no known function; gene families are DNA sequences, with at least one functional gene, with members related by sequence homology and/or function; and interspersed repeat sequences are typically the products of TE integrations but may include retropseudogene copies (retrocopies) of a functional gene.

Repetitive sequences I. Satellites, minisatellites and microsatellites

Satellites (or macrosatellites) are very long arrays, up to hundreds of kilobases, of tandemly repeated DNA. The three satellite bands observed by buoyant density centrifugation represent sections of the human genome containing highly repeated DNA that differ in GC richness in relation to the overall genomic composition. However, not all satellite sequences are resolved by density gradient centrifugation. Alpha satellite DNA or alphoid DNA (in humans) constitutes the bulk of centromeric heterochromatin on all chromosomes. The alphoid DNA consists of 170 bp tandemly repeated sequences that can vary in number, constituting 0.5–10 Mb of DNA. The interchromosomal sequence divergence of the alpha satellite families allows the different chromosomes to be distinguished by fluorescence in situ hybridisation (FISH).

Minisatellites are tandemly repeated sequences of 15–100 bp of DNA, yielding a total length ranging from less than 1 kbp up to 15 kbp. One subset of minisatellites comprises the highly polymorphic arrays of variable number tandem repeats (VNTRs). These have no known function but can serve as useful DNA markers (Jeffreys et al., 1985) for generating DNA profiles as tools for forensics, paternity analysis, and so on. Several minisatellites, scattered throughout the genome share a core sequence with enough similarity to be analysed using a single probe in Southern blotting, yielding highly informative DNA profiles. An example is a 10–15-bp almost invariant core sequence (GGGCAGGANG) of myoglobin minisatellites, identified among several polymorphic VNTR loci.

Telomeric DNA contains 10–15 kb stretches of sequences representing a unique subset of minisatellites, which in the human genome are TTAGGG hexanucleotide tandem repeats located at the termini of the chromosomes. These sequences are added by telomerase to ensure complete replication of the chromosome. Telomeres of somatic cells are generally shorter than in germ cells, illustrated by their decreasing size within human B cells and skin cells with increasing age. In humans, it has been postulated that telomeric sequence loss is associated with ageing and assorted genetic disorders (Turner et al., 2019).

Microsatellites are small arrays of short simple tandem repeats, primarily 4 bp or less. Different arrays are found dispersed throughout the genome, although dinucleotide CA/TG repeats are most common, yielding 0.5% of the genome. Runs of A’s and T’s are common as well. Microsatellites typically have no known functions. However, CA/TG dinucleotide pairs can form the Z-DNA conformation in vitro, which is possibly indicative of function. Repeat unit copy number variation of microsatellites
apparently occurs by replication slippage yielding highly polymorphic DNA markers referred to as short tandem repeat polymorphisms (STRPs). STRPs are commonly used in commercial DNA fingerprinting kits for ancestry studies. In addition, STRPs are used as a forensic tool in crime investigations by generating profiles used to screen the combined deoxyribonucleic acid index system (CODIS) to identify an individual. In addition, the expansion of trinucleotide repeats within genes has been associated with genetic disorders such as Huntington disease, myotonic muscular dystrophy, spinocerebellar ataxias, Friedreich ataxia, fragile-X syndrome and fragile XE mental retardation (Paulson, 2018).

Repetitive sequences II. Gene families

Gene families generally consist of a set of genes with high sequence identity over their entire length, primarily in the exons of protein-encoding gene families. Members of gene families, or possibly separate clusters of the same gene family, are considered paralogous and are derived from an ancestral gene or locus by duplication, therefore, are evolutionarily and functionally related. The duplication of a gene, however, may alternatively yield a nonfunctional pseudogene. In addition, there are groups of genes with weak overall sequence identity that are homologous at conserved domains or short amino acid motifs, that may have diverged to have no overlapping functions and collectively termed gene superfamilies. Only a limited number of examples will be discussed.

II A. Noncoding gene families with essentially identical or highly homologous products. If a cell requires abundant amounts of particular proteins or RNA molecules, one logical solution might be the production of multiple functional copies of the same gene. The human genome and eukaryotic genomes, in general, have amplified a number of genes whose products are responsible for general-purpose functions such as DNA replication (e.g. histones) and protein synthesis (e.g. transfer ribonucleic acid (tRNA) and rRNA).

Genes that encode rRNA, inclusive of the spacer units, total about 0.4% of the DNA in the human genome. The individual genes of a particular rRNA family are essentially identical. The 28S, 5.8S and 18S rRNA genes are clustered, with both external transcribed spacer (ETS) units and internal transcribed spacer (ITS) units, in tandem arrays of approximately 60 copies each yielding about 2 Mbp of DNA. These clusters are present on the short arms of five acrocentric chromosomes consequently totalling approximately 300 copies, and form the nucleolar organiser regions (NORs). These three rRNA genes are transcribed as a single unit (yielding 41S rRNA) and then cleaved. 5S rRNA genes are clustered on chromosome 1q.

There are an estimated 51 human tRNA gene families established by their anticodon sequence (Iben and Maraia, 2014). In contrast to the distribution of rRNA genes, the more than 500 tRNA genes demonstrate variable clustering patterns with differing copy number variations (CNVs). Dispersal of tRNA pseudogenes may have occurred via RNA-mediated retroposition (McBride et al., 1989) and is consistent with the postulation that various families of short interspersed DNA elements (discussed below) have been derived from tRNA-like genes (Deininger and Batzer, 1993).

Small nuclear ribonucleic acid (snRNA) molecules are considered to function in RNA processing. There are six families of related snRNA genes, termed U1 to U6, that are dispersed among the chromosomes. However, differing cluster patterns have been observed for these genes. For example 35–100 functional U1 genes, all sharing 20 kb of nearly identical 5′ and 3′ flanking sequences, are loosely clustered on chromosome 1p36 and contain over 44 kb of intergenic sequences, whereas 10–20 U2 genes are clustered in a tight, virtually perfect 6-kb repeat unit on 17q21–q22 (Lindgren et al., 1985). In addition, more than one subfamily of a U snRNA has been identified; U3 comprises at least two subfamilies, which differ in their flanking sequences. Also, pseudogenes of snRNA have been identified and are thought to be dispersed in the genome by retrotransposition. tRNA genes are also found clustered with U RNA genes; for example, chromosomal band 1p36 contains 15–30 copies each of U1, Glu-tRNA and Asn-tRNA genes (van der Drift et al., 1994).

snRNA additionally includes the noncoding gene family of micro (mi)RNA. MicroRNAs are individually expressed genes that posttranscriptionally regulate the expression of genes. If the miRNA sequence, in which the final product is about 22 nucleotides long, closely resembles the antisense sequence (reverse complement) of an mRNA transcript (sense sequence), then it may bind and prevent translation of the mRNA. As a result of an incomplete match to the mRNA, a single miRNA can regulate several genes. Hsu et al. (2006) estimated 762 human miRNA genes with a distribution of 52% in intergenic regions, 40% in introns and 8% in exons, comprising approximately 1% of all predicted genes in mammals and other eukaryotes. In addition, many human miRNA genes appear to have been derived from TEs (Piriyapongsa et al., 2007), particularly medium reiterated frequency repeats (MERs), mammalian-wide interspersed repeats (MIRs) and L1 elements, thereby forming distinct miRNA gene families. Over 3700 miRNA binding sites have been identified in long noncoding ribonucleic acid (lncRNA) genes.

II B. Coding gene families with high sequence identity. There are numerous families of homologous genes in the human genome sharing extensive intra-family sequence identity. These are generally dispersed but occasionally contain linked members. Histone genes are highly conserved among eukaryotes and have a fundamental role in chromatin structure. The histone family consists of five genes that tend to be linked, although in differing arrays of variable copy numbers dispersed in the human genome. The individual genes of a particular histone family encode essentially identical products (i.e. H4 genes yield the same H4 protein). Clusters of two or more tandemly repeated histone genes, including all five (H3-H4-H1-H2A-H2B) have been identified on human chromosomes 1, 6 and 12 (Tripputi et al., 1986). In addition, histone genes lack introns; a rare feature for eukaryotic genes.

Another gene family with a generalised important role in all cells is represented by the actin genes, which comprise the cellular cytoskeleton and are involved in a variety of functions including intracellular movement, stability and contraction. There are
six different actin genes dispersed in the human genome, encoding six different isoforms with greater than 90% amino acid sequence conservation (Parker et al., 2020). The actin proteins are differentially expressed in various cell types and interact with several different proteins. In addition, there are over 16 identified actin pseudogenes thereby forming a multipseudogene family (Ng et al., 1985).

One of the most comprehensively studied gene families is the haemoglobin family. Human haemoglobin is a tetrameric protein consisting of two α-globin and two β-globin subunits. There are several possible polypeptides constructing the haemoglobin molecule with differing physiological properties and ontological regulation. This probably occurred as a result of gene duplication allowing for divergence of sequences for procuring new function. The two globin families exist as clusters of genes and pseudogenes on separate chromosomes. The α-globin gene cluster is on human chromosome 16 and the β-globin cluster is on 11. Although related in sequence, there is greater intra- than inter-cluster homology. Therefore, intra-cluster duplications postdate the duplication of the ancestral gene yielding α- and β-globin. The ontological regulation is apparently coordinated on each cluster by upstream sequences, providing the expression of the gene best suited for the oxygen need (a foetus, for example, exists in a relatively hypoxic environment). Predating haemoglobin divergence is the evolution of haemoglobin and myoglobin from an ancestral gene. Myoglobin is a monomeric protein encoded by a single gene on human chromosome 22 and stores oxygen in muscle, whereas haemoglobin is the oxygen carrier in blood.

Protooncogenes, the genes associated with signal transduction pathways, additionally are comprised of gene families. Mutations in these genes yield a dominant gain-of-function activity, becoming oncogenes and disrupting regulation of the cell cycle thus leading to the progression of tumours. One such family is represented by the MYC genes, in which the protein products contain the basic helix-loop-helix domain of transcription factors. Functional members, including C-MYC, L-MYC and N-MYC, are not linked genetically, with the latter two demonstrating more restricted patterns of expression, but they share a three exon, two intron structure. A detailed sequence analysis of MYC genes suggests that the progenitor of the N-MYC and L-MYC genes was a duplicated C-MYC gene (Atchley and Fitch, 1995).

Repeat sequences III. Gene superfamilies

III A. Gene families with low sequence identity but functionally conserved domains or short motifs. DNA superfamilies are represented by genetic loci derived from a common ancestral gene that have diverged to the extent in which the proteins encoded for may be structurally related but ‘develop’ nonoverlapping functions, with members having different patterns of expression (see also: Gene Families: Multigene Families and Superfamilies). However, characteristically, the members of a gene superfamily have conserved protein domains or motifs. A notable example is the large genomically dispersed homeobox superfamily. The protein products are associated with embryonic development and characterised by a conserved homeodomain typically 60 amino acids long that serve as transcription factors in regulating gene expression. In a comprehensive study by Holland et al. (2007), 300 homeobox loci were identified in the human genome of which 235 are likely to be functional. These authors classified this superfamily as having 11 homeobox gene classes further divided into 102 homeobox gene families based on molecular phylogenetic analysis. The complexity of this gene family with individual genes having diverse roles is consistent with the complexity of human development. Additional large gene superfamilies are listed in Gene Families: Multigene Families and Superfamilies and include the RAS family, the SRC homology 3 domain and the immunoglobulin domain associated with the immune response. Genes of the immunoglobulin superfamily encode proteins that form dimers consisting of extracellular variable domains at the N-terminus and constant domains at the C-terminus. Members of the immunoglobulin superfamily include immunoglobulin, human leucocyte antigen (HLA), T-cell receptor (TCR), T4 and T8 genes.

A gene superfamily may lack sequence identity, therefore, displaying an overall low amount of similarity for the length of the gene, but contain highly conserved short amino acid motifs. For example, the DEAD box (Asp-Glu-Ala-Asp) is a motif within a gene family that encodes proteins with helicase activity (Linder et al., 1989). Another supergene family encodes proteins containing an F-box, often found in the amino-terminal half, which is a 50 amino acid motif that functions in protein-protein interactions (Kipreos and Pagano, 2000), coupling with motifs such as leucine-rich repeats (LRRs) and WD repeats in the carboxyl-terminal region. The WD box genes are characterised by between four and eight tandem repeats of a core sequence of fixed length terminating in a WD dipeptide.

III B. Noncoding gene superfamily. Long noncoding ribonucleic acid (lncRNA) genes are, at this point, more complex to characterise. They have been defined as a class of genes divided into four different functional archetypes based on modes of action (Wang and Chang, 2011). For this review, we have currently classified lncRNA as a noncoding gene superfamily based on shared characteristics including most notably the RNA transcripts being greater than 200 nucleotides in length that are primarily transcribed by RNA polymerase II utilising regulatory features similar to protein-coding genes (Derrien et al., 2012), having structural similarity in which the predominant transcript form consists of two exons, and the suggestion of shared structural motifs (Bhartiya et al., 2013). In addition, unlike a gene family, there is a lack of sequence conservation among individual loci, plus they serve a variety of functions. The GENCODE Consortium, the reference for human genome annotation for the ENCODE Project (www.gencodegenes.org), has provided for the identification of 9640 lncRNA loci dispersed throughout the human genome (Harrow et al., 2012). Development of an lncRNome (Bhartiya et al., 2013) has yielded 18 000 lncRNA annotated transcripts that include intergenic noncoding RNA, antisense RNA and processed pseudogenes, with differential patterns of expression. The lncRNA represents 68% of the human transcriptome (Iyer et al., 2015). Biological processes of lncRNA include chromatin organisation and remodelling associated with gene regulation. Examples include the XIST gene in X-inactivation and the
Genome Organisation: Human

Gag Pol Env

Retrovirus

P ORF1 ORF2 -AN

Non-LTR retrotransposon (L1)

A B 31 -AN

Dimeric SINE (Alu)

IR Transposase IR

Transposon

Figure 2 Transposable elements in the human genome. Each contains the characteristic flanking direct repeats (red arrows). The human endogenous
retrovirus (or LTR retrotransposon) contains long terminal repeats (LTRs) (pale green regions), gag (group-specific antigen gene), pol (polymerase gene) and
possibly env (envelope gene). The non-LTR LINE (L1) retrotransposon contains internal RNA polymerase II promoter sequences (P), two open reading frames
and a poly(A) tail. The SINE retrotransposon (Alu element) has a dimeric structure of homologous halves separated by a middle A-rich region (blue), with
the left half containing A- and B-box RNA polymerase III promoter sequences, and the right half containing an additional internal 31 bp. For L1 and Alu
elements, pale green and mauve regions are sequences unique to these elements.

H19 gene involvement in embryogenesis. In addition, roughly 20% of lncRNA transcripts contain sites for miRNA binding.

Repeat Sequences IV. Transposable Elements

Throughout the human genome are interspersed repeat sequences that have largely amplified in copy number by movement (jumping) into new genomic locations. These sequences, referred to collectively as TEs, include class I elements (retrotransposons or retroposons) that mobilise via an RNA intermediate, and class II elements (transposons) (Figure 2) that are strictly DNA mediated. Both TE classes contain flanking direct repeats (FDRs) as a result of the integration event. Although the human genome contains roughly 300 000 class II DNA transposons (e.g. Charlie, Mariner and Tigger), there is a lack of evidence that these have been active in the human genome, with the youngest families (MER75 and MER85) most recently active in an ancestral species of humans and New World monkeys (Lander et al., 2001).

Short and long interspersed DNA elements (SINEs and LINEs, respectively) are the primary TE families of the human genome. These lack the long terminal repeats (LTRs) of endogenous retroviral sequences (ERVs) and are, therefore, referred to as non-LTR retrotransposons. LINEs are autonomous retrotransposons encoding proteins necessary for retrotransposition through a process termed target primed reverse transcription (TPRT) (Luan and Eickbush, 1996), whereas SINEs are nonautonomous and have been experimentally demonstrated to mobilise using available LINE machinery (Moran et al., 1999; Dewannieux et al., 2003).

L1 elements represent the primary LINE family in mammalian genomes. Humans are estimated to contain over 500 000 copies, which are predominantly found in chromosomal G bands. A full-length LINE is approximately 6.1 kbp, although most are truncated retropseudogenes with various 5′ ends due to incomplete reverse transcription. Only an estimated 80–100 L1 loci are considered retrotranspositionally competent and are referred to as source or master genes (Brouha et al., 2003), with allelic variations associated with retrotransposition capability (Seleme et al., 2006). Source loci represent a subgroup of full-length L1 elements that contain two intact open reading frames (ORFs), one encoding a protein (ORF1p) involved in L1 ribonucleoprotein (RNP) formation and the other (ORF2p) encoding both an endonuclease and reverse transcriptase (Figure 2). Individual LINEs contain an internal RNA polII promoter, 5′ and 3′ untranslated regions and a poly(A) tail, each contributing to autonomous mobilisation in both germinal and somatic tissues.

The Alu element is estimated at over one million copies in the human genome, representing the primary SINE family (Lander et al., 2001). Sequence comparisons suggest that Alu repeats were derived from the 7SL RNA gene. Each Alu element is about 280 bp, has a rare dimeric structure, contains internal RNA polymerase III promoter sequences and typically includes an A-rich tail (Figure 2). Approximately 5000 Alu elements have integrated within the human genome subsequent to the divergence of humans from the great apes. About 25% of the more recent Alu integrations have yielded presence/absence insertion polymorphisms that are useful as DNA markers for the study of forensics and human population genetics (Xing et al., 2007). Alu elements predominate in chromosomal R bands and preferentially insert into A-rich sequences including the A-tails of previous Alu integrations. Of the one million plus Alu elements in the human genome, only 143 loci have been deemed putatively


active source genes (Cordaux et al., 2006). These contain diagnostic nucleotides which account for the generation of different subfamilies of Alu elements. A recent large-scale pedigree-based study estimated that new mobile element integrations in humans occur at rates of 1/20 to 1/200 births (Feusier et al., 2019).

Direct integration events of SINEs and LINEs into genes have contributed to human disorders (reviewed in Deininger and Batzer, 1999; Kass, 2001). In addition, following integration, multiple elements within a gene have disrupted genes by serving as sites for unequal homologous recombination. For example, within the low-density lipoprotein receptor (LDLR) gene alone there have been several alterations attributed to recombination between various Alu elements resulting in familial hypercholesterolemia. In addition, SINEs and LINEs contribute to the dynamics of the genome by serving as DNA methylation sites that alter the regulation of adjacent genes.

The human genome also contains families of retroviral-related sequences. These elements are characterised by sequences encoding enzymes for retrotransposition and contain LTRs (Figure 2). However, most of these sequences are defunct truncated and/or mutated retrovirus-like elements that may or may not contain an env gene. Human endogenous retrovirus (HERV) loci are found predominantly in intergenic regions, though high densities have been identified in genetic regions undergoing high levels of recombination, such as with the human class I and class II major histocompatibility complexes (Jern and Coffin, 2008). In addition, solitary LTRs of these elements may be located throughout the genome. There are several low abundant (10–1000 copies) HERV families, with individual elements ranging from 6 to 10 kb. In addition, sequences within HERVs as well as HERV-derived proteins have been associated with serving a role in human development but also contributing to various ailments (Kazazian and Moran, 2017). Overall, LTR elements encompass approximately 8% of the genome (Lander et al., 2001).

V. Pseudogenes and retropseudogenes

Pseudogenes are nonfunctional copies of a gene containing all or part of the original sequence. Pseudogenes may arise by tandem (segmental) duplication, accumulating mutations as a result of the lack of selection pressure, and although structurally similar to the functional gene (e.g. maintaining introns) are usually recognisable by a lack of an open reading frame. These include the globin pseudogenes.

Processed pseudogenes, aka retropseudogenes or retrocopies, are generated through an RNA intermediate using TPRT and therefore display characteristic features: they lack the introns that were spliced out during processing, are flanked by short tandem repeats and are distant in genomic location from the source gene (Richardson et al., 2014). In addition, they lack regulatory sequences and therefore are normally incapable of expression. There may be as many as 20–30 retropseudogenes that have arisen from a parental functional gene, for example ribosomal protein L32 and glyceraldehyde-3-phosphate dehydrogenase, as well as a multipseudogene family derived from different members of the β-actin gene family (Ng et al., 1985). Processed pseudogenes may be derived from noncoding RNA genes as well. In addition to FDRs, tRNA retrocopies include the posttranscriptionally added CCA sequences at the 3′ end. Overall, there are roughly 15 000 pseudogenes, of which approximately 11 000 are processed retrocopies (www.gencodegenes.org).

Retropseudogenes continue to shape the human genome. Analyses of the 1000 Genomes Project by different laboratory groups have identified 58 new retrocopy sequences in relation to the human reference sequence (Richardson et al., 2014). In addition, among over 200 high mobility group (Hmg) nonhistone chromosomal protein retropseudogenes in the human genome, there is evidence for recent mobilisation based on sequence similarity to cDNA as well as the identification of human-specific integrations (Tecle et al., 2006). The Hmg retrocopies exhibit traits suggesting mobilisation via L1 machinery, including A-tails, FDRs and notably the preferential TT/AAAA consensus sequence of L1 integration sites. Furthermore, it has been suggested that mobilisation of cellular mRNA by L1 elements has generated an estimated 8000–17 000 retrocopies (Richardson et al., 2014), providing notable contributions to the dynamics of the human genome. Should fortuitous landings of retrocopies occur into sites that allow for the copy to be expressed, then these would be referred to as retrogenes.

Mitochondrial Genome

The mitochondrion contains an autonomously replicating genome. The human mitochondrial genome is circular and contains 16 569 bp encoding 37 genes, including tRNA and rRNA genes used for mitochondrial protein synthesis. Mitochondrial deoxyribonucleic acid (mtDNA) is maternally inherited and generally there are thousands of copies of mtDNA in the cytoplasm of a cell.

Genome Dynamics and Evolution

The considerable variations of genomes among different organisms as well as just between individual humans are indicative of the highly complex and dynamic nature of the genome. However, comparisons of human and chimpanzee DNA demonstrate 98–99% sequence identity. This poses an interesting question as to what makes us human. By analysis of chromosomes via FISH and karyograms, it is evident that a shuffling of different segments of the genome between humans and chimpanzees has occurred, although there is conservation of synteny of genes, that is, sets of genes that are linked in humans are also linked in chimpanzees. However, genomic rearrangements may alter temporal or spatial expression of genes, as a result of the chromosomal location of the gene(s), or possibly as a function of a shift in gene imprinting, consequently yielding phenotypic variation.

TEs have had a considerable role in the construction and organisation of the human genome, as TE sequences represent nearly half of the three billion base pairs (Lander et al., 2001). Although TE activity has considerably decreased over time, rare fortuitous retrotransposon integrations have the potential to become newly established source genes capable of further generating retrocopies, which provides an explanation for the


concurrent amplification of elements representing distinguishable subfamilies of L1 and Alu (Deininger et al., 1992). Retrotransposon integration can also potentially yield new families of elements, as exemplified by the SINE-VNTR-Alu (SVA) composite element, proposed to have been generated by two distinct events (Ono et al., 1987). Presumably, the first step involved the RNA-mediated transposition of a partial HERV-K element polyadenylated at the 3′ end, which became a source gene for a new retrotransposon family referred to as SINE-R. This was followed by an individual SINE-R integration downstream of a region consisting of CCCTCT hexamer repeats, an Alu homologous region and a VNTR. Two of six SVA subfamilies are limited to the human genome and are predominantly located in GC-rich regions, with roughly one third found to be insertion polymorphisms (Wang et al., 2005), supporting continued evolution of the human genome resulting from more recently derived TEs.

Phenotypic changes may result from the discussed genomic alterations including genomic rearrangements, TE integrations, as well as various types of common nucleotide mutations causing alterations such as in the biochemical nature of the protein product or the regulation of expression of the protein product. In addition, genomic alterations have the potential to generate new proteins/gene families that are composites of differing functional domains as outlined in this review, which can also arise by means of exon shuffling. One mechanism of exon shuffling is unequal homologous recombination at sites of differing introns, occurring as a result of shared repetitive sequences. Exon shuffling has been proposed to be a reason for the existence and maintenance of introns (Gilbert, 1978). More recently there has been evidence supporting the process of both 5′ and 3′ transduction mediated by L1 and SVA elements contributing to multiple exon shuffling events (Moran et al., 1999; Boeke and Pickeral, 1999; Ostertag et al., 2003; Szak et al., 2003; Damert et al., 2009; Tica et al., 2016) generating new functional genes (Xing et al., 2006). This further supports the contributions of TEs to the dynamics of genomes, and TEs are consequently considered accelerators of evolution (Nishihara, 2019).

Acknowledgements

David H Kass was supported by an Eastern Michigan University Provost Research Support Award and a Faculty Research Fellowship.
Mark A Batzer was supported by National Institutes of Health Grant RO1 GM59290.

Glossary

Gene family Set of paralogous genes with related function that is identical or very similar.
Gene superfamily Group of genes diverged from a common ancestral gene resulting in nonoverlapping functions but sharing a common domain or motif.
Genome The total DNA in a cell.
Homologous genes Related genes sharing common ancestry.
Karyogram Representation of an entire chromosome set that has been stained by one of several possible methods yielding discrete banding patterns.
Paralogous genes Genes descended from a common ancestral gene arising as a result of gene duplications.
Retrotransposon source (master) genes The limited number of retrotransposon loci capable of generating copies integrated into new chromosomal sites.
Sequence identity The number of characters that match exactly between two sequences.
Sequence similarity Likeness or resemblance between two sequences.
SINEs and LINEs Short and long interspersed repeat elements representing a subclass of class I elements that lack LTR sequences, and are nonautonomous and autonomous, respectively, utilising an RNA intermediate to generate genomic copies.

References

Atchley WR and Fitch WM (1995) Myc and max: molecular evolution of a family of proto-oncogene products and their dimerization partner. Proceedings of the National Academy of Sciences of the United States of America 92 (22): 10217–10221. DOI: 10.1073/pnas.92.22.10217.
Bhartiya D, Pal K, Ghosh S, et al. (2013) lncRNome: a comprehensive knowledgebase of human long noncoding RNAs. Database: The Journal of Biological Databases and Curation 2013: bat034. DOI: 10.1093/database/bat034.
Boeke JD and Pickeral OK (1999) Retroshuffling the genomic deck. Nature 398 (6723): 108–111. DOI: 10.1038/18118.
Brouha B, Schustak J, Badge RM, et al. (2003) Hot L1s account for the bulk of retrotransposition in the human population. Proceedings of the National Academy of Sciences of the United States of America 100 (9): 5280–5285. DOI: 10.1073/pnas.0831042100.
Cordaux R, Hedges DJ, Herke SW and Batzer MA (2006) Estimating the retrotransposition rate of human Alu elements. Gene 373: 134–137. DOI: 10.1016/j.gene.2006.01.019.
Craig JM and Bickmore WA (1994) The distribution of CpG islands in mammalian chromosomes. Nature Genetics 7 (3): 376–382. DOI: 10.1038/ng0794-376.
Craig JM, Boyle S, Perry P and Bickmore WA (1997) Scaffold attachments within the human genome. Journal of Cell Science 110 (Pt 21): 2673–2682.
Damert A, Raiz J, Horn AV, et al. (2009) 5′-Transducing SVA retrotransposon groups spread efficiently throughout the human genome. Genome Research 19 (11): 1992–2008. DOI: 10.1101/gr.093435.109.
Deininger PL, Batzer MA, Hutchison CA 3rd and Edgell MH (1992) Master genes in mammalian repetitive DNA amplification. Trends in Genetics: TIG 8 (9): 307–311. DOI: 10.1016/0168-9525(92)90262-3.
Deininger PL and Batzer MA (1993) Evolution of retroposons. Evolutionary Biology 27: 157–196.
Deininger PL and Batzer MA (1999) Alu repeats and human disease. Molecular Genetics and Metabolism 67 (3): 183–193. DOI: 10.1006/mgme.1999.2864.
Derrien T, Johnson R, Bussotti G, et al. (2012) The GENCODE v7 catalog of human long noncoding RNAs: analysis of their gene


structure, evolution, and expression. Genome Research 22 (9): 1775–1789. DOI: 10.1101/gr.132159.111.
Dewannieux M, Esnault C and Heidmann T (2003) LINE-mediated retrotransposition of marked Alu sequences. Nature Genetics 35 (1): 41–48. DOI: 10.1038/ng1223.
van der Drift P, Chan A, van Roy N, et al. (1994) A multimegabase cluster of snRNA and tRNA genes on chromosome 1p36 harbours an adenovirus/SV40 hybrid virus integration site. Human Molecular Genetics 3 (12): 2131–2136. DOI: 10.1093/hmg/3.12.2131.
Feusier J, Watkins WS, Thoma J, et al. (2019) Pedigree-based estimation of human mobile element retrotransposition rates. Genome Research 29 (10): 1567–1577. DOI: 10.1101/gr.247965.118.
Gilbert W (1978) Why genes in pieces? Nature 271 (5645): 501. DOI: 10.1038/271501a0.
Harrow J, Frankish A, Gonzalez JM, et al. (2012) GENCODE: the reference human genome annotation for The ENCODE Project. Genome Research 22 (9): 1760–1774. DOI: 10.1101/gr.135350.111.
Holland PW, Booth HA and Bruford EA (2007) Classification and nomenclature of all human homeobox genes. BMC Biology 5: 47. DOI: 10.1186/1741-7007-5-47.
Hsu PW, Huang HD, Hsu SD, et al. (2006) miRNAMap: genomic maps of microRNA genes and their target genes in mammalian genomes. Nucleic Acids Research 34 (Database issue): D135–D139. DOI: 10.1093/nar/gkj135.
Iben JR and Maraia RJ (2014) tRNA gene copy number variation in humans. Gene 536 (2): 376–384. DOI: 10.1016/j.gene.2013.11.049.
Iyer MK, Niknafs YS, Malik R, et al. (2015) The landscape of long noncoding RNAs in the human transcriptome. Nature Genetics 47 (3): 199–208. DOI: 10.1038/ng.3192.
Jeffreys AJ, Wilson V and Thein SL (1985) Individual-specific ‘fingerprints’ of human DNA. Nature 316 (6023): 76–79. DOI: 10.1038/316076a0.
Jern P and Coffin JM (2008) Effects of retroviruses on host genome function. Annual Review of Genetics 42: 709–732. DOI: 10.1146/annurev.genet.42.110807.091501.
Kass DH (2001) Impact of SINEs and LINEs on the mammalian genome. Current Genomics 2 (2): 199–219. DOI: 10.2174/1389202013350968.
Kassiotis G and Stoye JP (2016) Immune responses to endogenous retroelements: taking the bad with the good. Nature Reviews. Immunology 16 (4): 207–219. DOI: 10.1038/nri.2016.27.
Kazazian HH Jr and Moran JV (2017) Mobile DNA in health and disease. The New England Journal of Medicine 377 (4): 361–370. DOI: 10.1056/NEJMra1510092.
Kipreos ET and Pagano M (2000) The F-box protein family. Genome Biology 1 (5): REVIEWS3002. DOI: 10.1186/gb-2000-1-5-reviews3002.
Lander ES, Linton LM, Birren B, et al. (2001) Initial sequencing and analysis of the human genome. Nature 409 (6822): 860–921. DOI: 10.1038/35057062.
Linder P, Lasko PF, Ashburner M, et al. (1989) Birth of the D-E-A-D box. Nature 337 (6203): 121–122. DOI: 10.1038/337121a0.
Lindgren V, Ares M Jr, Weiner AM and Francke U (1985) Human genes for U2 small nuclear RNA map to a major adenovirus 12 modification site on chromosome 17. Nature 314 (6006): 115–116. DOI: 10.1038/314115a0.
Luan DD and Eickbush TH (1996) Downstream 28S gene sequences on the RNA template affect the choice of primer and the accuracy of initiation by the R2 reverse transcriptase. Molecular and Cellular Biology 16 (9): 4726–4734. DOI: 10.1128/mcb.16.9.4726.
McBride OW, Pirtle IL and Pirtle RM (1989) Localization of three DNA segments encompassing tRNA genes to human chromosomes 1, 5, and 16: proposed mechanism and significance of tRNA gene dispersion. Genomics 5 (3): 561–573. DOI: 10.1016/0888-7543(89)90024-4.
Moran JV, DeBerardinis RJ and Kazazian HH Jr (1999) Exon shuffling by L1 retrotransposition. Science 283 (5407): 1530–1534. DOI: 10.1126/science.283.5407.1530.
Ng SY, Gunning P, Eddy R, et al. (1985) Evolution of the functional human beta-actin gene and its multi-pseudogene family: conservation of noncoding regions and chromosomal dispersion of pseudogenes. Molecular and Cellular Biology 5 (10): 2720–2732. DOI: 10.1128/mcb.5.10.2720.
Nishihara H (2019) Transposable elements as genetic accelerators of evolution: contribution to genome size, gene regulatory network rewiring and morphological innovation. Genes & Genetic Systems 94: 269–281. DOI: 10.1266/gggs.19-00029.
Ono M, Kawakami M and Takezawa T (1987) A novel human nonviral retroposon derived from an endogenous retrovirus. Nucleic Acids Research 15 (21): 8725–8737. DOI: 10.1093/nar/15.21.8725.
Ostertag EM, Goodier JL, Zhang Y and Kazazian HH Jr (2003) SVA elements are nonautonomous retrotransposons that cause disease in humans. American Journal of Human Genetics 73 (6): 1444–1451. DOI: 10.1086/380207.
Parker F, Baboolal TG and Peckham M (2020) Actin mutations and their role in disease. International Journal of Molecular Sciences 21 (9): 3371. DOI: 10.3390/ijms21093371.
Paulson H (2018) Repeat expansion diseases. Handbook of Clinical Neurology 147: 105–123. DOI: 10.1016/B978-0-444-63233-3.00009-9.
Piriyapongsa J, Mariño-Ramírez L and Jordan IK (2007) Origin and evolution of human microRNAs from transposable elements. Genetics 176 (2): 1323–1337. DOI: 10.1534/genetics.107.072553.
Platt RN II, Vandewege MW and Ray DA (2018) Mammalian transposable elements and their impacts on genome evolution. Chromosome Research: An International Journal on the Molecular, Supramolecular and Evolutionary Aspects of Chromosome Biology 26 (1–2): 25–43. DOI: 10.1007/s10577-017-9570-z.
Richardson E, Dorrell RG and Howe CJ (2014) Genome-wide transcript profiling reveals the coevolution of plastid gene sequences and transcript processing pathways in the fucoxanthin dinoflagellate Karlodinium veneficum. Molecular Biology and Evolution 31 (9): 2376–2386. DOI: 10.1093/molbev/msu189.
Seleme Mdc, Vetter MR, Cordaux R, et al. (2006) Extensive individual variation in L1 retrotransposition capability contributes to human genetic diversity. Proceedings of the National Academy of Sciences 103 (17): 6611–6616. DOI: 10.1073/pnas.0601324103.
Szak ST, Pickeral OK, Landsman D and Boeke JD (2003) Identifying related L1 retrotransposons by analyzing 3′ transduced sequences. Genome Biology 4 (5): R30. DOI: 10.1186/gb-2003-4-5-r30.
Tecle E, Zielinski L and Kass DH (2006) Recent integrations of mammalian Hmg retropseudogenes. Journal of Genetics 85 (3): 179–185. DOI: 10.1007/BF02935328.
Tica J, Lee E, Untergasser A, et al. (2016) Next-generation sequencing-based detection of germline L1-mediated transductions. BMC Genomics 17: 342. DOI: 10.1186/s12864-016-2670-x.
Tripputi P, Emanuel BS, Croce CM, et al. (1986) Human histone genes map to multiple chromosomes. Proceedings of the National


Academy of Sciences of the United States of America 83 (10): 3185–3188. DOI: 10.1073/pnas.83.10.3185.
Turner KJ, Vasu V and Griffin DK (2019) Telomere biology and human phenotype. Cells 8 (1): 73. DOI: 10.3390/cells8010073.
Venter JC, Adams MD, Myers EW, et al. (2001) The sequence of the human genome. Science 291 (5507): 1304–1351. DOI: 10.1126/science.1058040.
Wang H, Xing J, Grover D, et al. (2005) SVA elements: a hominid-specific retroposon family. Journal of Molecular Biology 354 (4): 994–1007. DOI: 10.1016/j.jmb.2005.09.085.
Wang KC and Chang HY (2011) Molecular mechanisms of long noncoding RNAs. Molecular Cell 43 (6): 904–914. DOI: 10.1016/j.molcel.2011.08.018.
Xing J, Wang H, Belancio VP, et al. (2006) Emergence of primate genes by retrotransposon-mediated sequence transduction. Proceedings of the National Academy of Sciences of the United States of America 103 (47): 17608–17613. DOI: 10.1073/pnas.0603224103.
Xing J, Witherspoon DJ, Ray DA, Batzer MA and Jorde LB (2007) Mobile DNA elements in primate and human evolution. American Journal of Physical Anthropology (Suppl 45): 2–19. DOI: 10.1002/ajpa.20722.

Further Reading

1000 Genomes Project Consortium (2015) A global reference for human genetic variation. Nature 526: 68–78.
Bella JL, Fernández JL and Gosálvez J (1995) C-banding plus fluorochrome staining shows differences in C-, G-, and R-bands in human and mouse metaphase chromosomes. Genome 38 (5): 864–868. DOI: 10.1139/g95-114.
Graur D and Li W-H (2000) Fundamentals of Molecular Evolution. Sunderland: Sinauer Associates.
Hayward A (2017) Origin of the retroviruses: when, where, and how? Current Opinion in Virology 25: 23–27. DOI: 10.1016/j.coviro.2017.06.006.
Krebs JE, Goldstein ES and Kilpatrick ST (2018) Lewin’s Genes XII. Jones and Bartlett Learning.
Ponting CP and Hardison RC (2011) What fraction of the human genome is functional? Genome Research 21 (11): 1769–1776. DOI: 10.1101/gr.116814.110.
Strachan T and Read AP (2018) Human Molecular Genetics, 5th edn. CRC Press.
