
Genome Evolution

Evolution of genome structure
Gene order changes
Recombination, amplification, deletion, inversions, translocations

Evolution of genome content
Gene gain (sequence divergence, duplication, horizontal transfer)
Gene loss (deletion)

[Figure: the three replicons in R. sp. NGR234: pSym, megaplasmid, and chromosome. The fragments marked by blue arrows indicate the duplicated sequence NGRRS1.]

[Figure: wild type and recombinant type replicons, with sites Ra, Rb, Fa, Fb and regions A and B.]

Cointegration of two replicons mediated by repeat sequences

a, ISRm11; b, nodPQ; c, algI

[Figure: schematic representation of replicon cointegration and the oligos used for checking (panels A and B; I and II).]

Potential cointegration sites between replicons

[Figure: direct repeats. Labels: pNGR234a; nifHDK1, nifHDK2; NGRIS2a, NGRIS2b; NGRIS3b, NGRIS3c; NGRIS4a, NGRIS4b; NGRIS5a, NGRIS5b, NGRIS5c.]

Two direct repeats could result in two consequences: deletion or amplification

[Figure: segments A1 to A4; deletion and amplification produce hybrid A4/A3 and A3/A4 copies.]

Inverted sequences?
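The outcomes named on this slide can be sketched on a toy replicon. This is only an illustration under stated assumptions: the replicon is a hypothetical list of segment labels, "R" marks a repeat copy, and the function names are invented here. Segment orientation is not modeled, so the inversion case only reverses segment order.

```python
def delete_between(genome, i, j):
    """Exchange between direct repeats at i < j on one molecule:
    the segment between them is lost and one repeat copy remains."""
    return genome[: i + 1] + genome[j + 1 :]


def amplify_between(genome, i, j):
    """Unequal exchange between direct repeats on two copies of the
    molecule: the segment between the repeats is duplicated."""
    return genome[: j + 1] + genome[i + 1 :]


def invert_between(genome, i, j):
    """Exchange between inverted repeats at i < j: the order of the
    segments between them is reversed in place."""
    return genome[: i + 1] + genome[i + 1 : j][::-1] + genome[j:]


# Hypothetical layout: two repeat copies (R) flanking genes g1 and g2.
genome = ["x", "R", "g1", "g2", "R", "y"]
print(delete_between(genome, 1, 4))
print(amplify_between(genome, 1, 4))
print(invert_between(genome, 1, 4))
```

Treating the repeat positions as list indices keeps each outcome to a single slicing expression, which makes the difference between the three rearrangements easy to compare.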

Gene organization

Chloroplast genome
a. Circular, mini-circular, or linear
b. Composed of three parts: LSC (large single copy), SSC (small single copy), IR (inverted repeat)

[Figure: Phaseolus chloroplast genome with the LSC, IR, and SSC regions.]

Evolution by transposition

Transposable elements:
General features of transposable elements
Prokaryote transposable elements
Eukaryote transposable elements

Transposable element: a genetic element of a chromosome that has the capacity to mobilize and move from one location to another in the genome.
1. Normal and ubiquitous components of prokaryote and eukaryote genomes.
2. Nonhomologous recombination: transposable elements insert into DNA that has no sequence homology with the transposon.
3. Transposable elements cause genetic changes and make important contributions to the evolution of genomes:
Insert into genes.
Insert into regulatory sequences; modify gene expression.
Produce chromosomal mutations.

Transposable elements:
Two classes of transposable elements/mechanisms of movement:
DNA-mediated elements encode proteins that (1) move DNA directly to a new position or (2) replicate DNA and integrate the replicated DNA elsewhere (prokaryotes and eukaryotes).
Retrotransposons encode reverse transcriptase and make DNA copies of RNA transcripts; the DNA copies integrate at new sites (eukaryotes only).

DNA-mediated transposition
Mobile element encodes transposase, e.g., P element in Drosophila melanogaster
Transposition by transposase may be conservative or replicative

RNA-mediated transposition
Mobile retroelement encodes reverse transcriptase
Dramatic increase in copy number
e.g., SINEs and LINEs (short and long interspersed repetitive elements)
In the human genome, Alu repeats are derived from the 7SL RNA gene; other repeats are tRNA-derived (MIR repeats)

Transposable elements in prokaryotes

Two examples:
Insertion sequence (IS) elements
Transposons (Tn)

Insertion sequence (IS) elements:
Simplest type of transposable element found in bacterial chromosomes and plasmids.
Encode only genes for mobilization and insertion.
Range in size from 768 bp to 5 kb.
IS1, first identified in E. coli's galactose operon, is 768 bp long and is present in 4-19 copies in the E. coli chromosome.
Ends of all known IS elements show inverted terminal repeats (ITRs).

Partial
repeat

Insertion sequence (IS) elements:
Integration of an IS element may:
Disrupt coding sequences or regulatory regions.
Alter expression of nearby genes.
Cause amplification, deletions, and inversions in adjacent DNA.
Result in crossing-over.

[Figure: integration of an IS element in chromosomal DNA.]

Transposons (Tn):
Similar to IS elements but structurally more complex, and carry additional genes.
Two types of transposons:
Composite transposons
Noncomposite transposons

Composite transposons (Tn):
Carry genes (e.g., a gene for antibiotic resistance) flanked on both sides by IS elements.
Tn10 is 9.3 kb and includes 6.5 kb of central DNA (including a gene for tetracycline resistance) and 1.4 kb inverted IS elements.
IS elements supply transposase and ITR recognition signals.

Noncomposite transposons (Tn):
Carry genes (e.g., a gene for antibiotic resistance) but do not terminate with IS elements.
Ends are non-IS element repeated sequences.
Tn3 is 5 kb with 38-bp ITRs and includes 3 genes: bla (β-lactamase), tnpA (transposase), and tnpB (resolvase, which functions in recombination).

Models of transposition:
Similar to that of IS elements; duplication at target sites occurs.
May be replicative (duplication) or non-replicative (transposon lost from original site).
Result in the same types of mutations as IS elements: insertions, deletions, changes in gene expression, or duplication.

LINE transposition

LINE (~20% of mammalian genome, originally transcribed by Pol II)
SINE (~10-15% of mammalian genome, originally transcribed by Pol III, e.g., Alu repeats)

Evolutionary consequences of transposition

1. Increase in genome size.
2. Promotes major DNA rearrangements; may affect gene structure or expression.
3. Increased mutation rate may improve survival under adverse conditions?
e.g., antibiotic resistance genes on TEs in bacteria; genomic reorganization events in plants under environmental stress.

Homology, heterology, analogy, orthology, paralogy, ohnology, xenology

Homolog, ortholog, paralog, analog, inparalog, outparalog, co-ortholog, ohnolog, xenolog

Orthology
Homologous sequences are orthologous if they were separated by a speciation event: when a species diverges into two separate species, the divergent copies of a single gene in the resulting species are said to be orthologous.

Ortholog
Orthologs are genes in different species that evolved from a common ancestral gene by speciation. Normally, orthologs retain the same function in the course of evolution. Identification of orthologs is critical for reliable prediction of gene function in newly sequenced genomes.

Paralogy
Homologous sequences are paralogous if they were separated by a gene duplication event: if a gene in an organism is duplicated to occupy two different positions in the same genome, then the two copies are paralogous.

Paralog
Paralogs are genes related by duplication within a genome. Orthologs retain the same function in the course of evolution, whereas paralogs evolve new functions, even if these are related to the original one.

Insufficient information is available to accurately determine the timing of many of the speciation events that gave rise to the contemporary genomes. So, whether two contemporary proteins are orthologs or paralogs cannot be determined with certainty.

The terms "ortholog" and "paralog" were originally proposed by Walter Fitch: Fitch WM. Distinguishing homologous from analogous proteins. Syst Zool 1970, 19:99-113.

Homologs:
Genes that share an arbitrary threshold level of similarity, determined by alignment of matching bases, are termed homologous.
Homologous sequences are termed homologs, and this term can be applied to both genes and proteins. Homologs look similar to each other and appear to share common ancestry, but they may or may not display the same activity. {Homologs have common origins but may or may not have common activity.}
Homology is a qualitative term that describes a relationship between genes and is based upon the quantitative similarity.
Similarity is a quantitative term that defines the degree of sequence match between two compared sequences.
Percent homology?


The phrase "percent homology" is sometimes used but is incorrect. "Percent identity" or "percent similarity" should be used to quantify the similarity between biomolecule sequences.
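Percent identity can be made concrete with a small sketch. It assumes the two sequences are already aligned to equal length, that "-" marks a gap, and that gapped columns are excluded from the denominator; other denominators are also in use, so the convention here is an assumption, as is the function name.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of ungapped alignment columns with identical residues."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # assumption: gapped columns are not counted
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


# Toy alignment: 7 ungapped columns, 6 of them identical.
print(percent_identity("ATGC-TGA", "ATGGATGA"))
```

The result is a quantity, whereas homology remains a yes-or-no inference drawn from it.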


Analogs. Genes or proteins that display the same activity but lack sufficient similarity to imply common origin are said to have analogous activity. The implication is that analogous proteins followed evolutionary pathways from different origins to converge upon the same activity. Thus, analogous genes or proteins are considered a product of convergent evolution. Analogs have homologous activity but heterologous origins. {Analogs have common activity but not common origin.}

Ohnology
Ohnologous genes are paralogous genes that have originated by a process of whole-genome duplication. The name was first given in honour of Susumu Ohno by Ken Wolfe. Ohnologs are interesting for evolutionary analysis because they all have been diverging for the same length of time since their common origin.

Xenology
Homologs resulting from horizontal gene transfer between two organisms are termed xenologs. Xenologs can have different functions if the new environment is vastly different for the horizontally moving gene. In general, however, xenologs typically have similar function in both organisms.

Heterologs. Genes that are "unique" in activity and sequence are said to be heterologous. {Heterologs differ in both origin and activity.}


For most purposes three terms may be found both necessary and sufficient to classify genes or proteins in order to address questions beyond the evolutionary pathway(s) to the contemporary sequence. The three terms are:
heterologous
homologous
analogous

These three states require only low-level knowledge from similarity measurements to classify sequences, and provide sufficient information to use in most of our applications.
Some applications may benefit from further subclassifications, but it would be advisable to weigh the benefits derived against the time required for the intended application. Although terms such as paralogous and orthologous are used with apparent confidence to describe sets of genes, rarely is it possible to know whether genes with similarity of sequence and different activity resulted from duplication and divergence within that organism/species, or arose by recombination, or evolved to activity elsewhere and moved into the current space by horizontal transfer, etc.
A word of caution against routinely sorting sequences into subterms: more knowledge is required for accurate placement, and inaccurate placements can eventually lead to major confusion in the field.

[Figure: gene tree for gene A in yeast (fungal), worm, and human (animal). Events from the root: speciation fungi-animals; duplication in the animal ancestor into A and B forms; speciation worm-human. Genes shown: yeast; human HA1, HA2, HA3, HB; worm WA1, WA2, WB.]

Inparalogs and outparalogs

Inparalogs: paralogs in a given lineage that all evolved by gene duplications that happened after the radiation (speciation) event that separated the given lineage from the other lineage under consideration.
Outparalogs: paralogs in the given lineage that evolved by gene duplications that happened before the radiation (speciation) event.
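These definitions can be sketched as a lookup on a gene tree whose internal nodes are labeled with the event that produced them. The topology below is an assumed reading of the gene A / gene B figure (the yeast gene is called "YA" here for illustration), and the rule "a duplication whose descendants all sit in one species gives inparalogs" is a simplification that fits this two-species comparison, not a general method.

```python
# Gene -> species (names follow the slide; "YA" is a hypothetical label).
SPECIES = {"HA1": "human", "HA2": "human", "HA3": "human", "HB": "human",
           "WA1": "worm", "WA2": "worm", "WB": "worm", "YA": "yeast"}

# Internal nodes are (event, [children]); leaves are gene names.
# Assumed topology, not taken verbatim from the source.
GENE_TREE = ("speciation", [                      # fungi-animals
    "YA",
    ("duplication", [                             # A and B forms
        ("speciation", [                          # worm-human, A form
            ("duplication", ["HA1", ("duplication", ["HA2", "HA3"])]),
            ("duplication", ["WA1", "WA2"]),
        ]),
        ("speciation", ["HB", "WB"]),             # worm-human, B form
    ]),
])


def leaves(node):
    """All gene names below a node."""
    if isinstance(node, str):
        return {node}
    return set().union(*(leaves(child) for child in node[1]))


def last_common_node(node, a, b):
    """Deepest internal node that has both genes below it."""
    for child in node[1]:
        if not isinstance(child, str) and {a, b} <= leaves(child):
            return last_common_node(child, a, b)
    return node


def relationship(a, b, tree=GENE_TREE):
    node = last_common_node(tree, a, b)
    if node[0] == "speciation":
        return "orthologs"
    species_below = {SPECIES[gene] for gene in leaves(node)}
    return "inparalogs" if len(species_below) == 1 else "outparalogs"


for pair in [("HA1", "WA1"), ("HA1", "HA2"), ("HA1", "HB")]:
    print(pair, relationship(*pair))
```

Note that HA1 and WA1 come out as orthologs even though each has lineage-specific duplicates; this is the situation the term co-ortholog describes.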

[Figure: species and gene trees with gene pairs labeled ortholog, inparalog, and outparalog.]

Orthologs are genes (proteins) with the same function in different organisms?

Paralogs are homologs within one organism?

Evolution by Horizontal Transfer

1. What is horizontal transfer?

2. Detecting horizontal transfer in genomic sequences

The solution: Horizontal Gene Transfer (HGT)

HGT possesses two ingredients sure to cause a controversy


1. Challenges the traditional tree-based view of evolution
2. Is difficult to prove unambiguously

Xenologs arise by horizontal transfer

[Figure: an ancestral gene followed through time across organisms; speciation gives orthologs, duplication gives paralogs, and horizontal transfer gives xenologs.]

Xenologs: homologs related by horizontal transfer
Orthologs: homologs related by speciation
Paralogs: homologs related by duplication

Horizontal transfer by transformation

1. DNA is released from a lysed cell in a community of cells of species A.
2. A cell in a community of cells of species B picks up the DNA and incorporates it into its genome.
3. The mutation rises to fixation in species B.

Transfers can be either selectively driven or selectively neutral.

Mechanisms of horizontal transfer (also referred to as lateral transfer)

1) Transformation: prokaryotes can take up free DNA from their surroundings.
2) Conjugation (bacterial sex): an organism builds a tube-like structure known as the pilus, joins it to its mate, and transfers a plasmid through the tube. E. coli has been shown to conjugate with cyanobacteria, and even with S. cerevisiae!
3) Transduction: genes can be moved from one prokaryote species to another via viruses.

Interest in understanding horizontal transfers extends beyond academic debate:
1. More and more bacterial infections are resistant to many antibiotics.
2. Bioengineered plants may also exchange genes with natural species.

Detecting Horizontal Transfers

1. Unexpected ranking of sequence similarity among homologs
2. Unexpected phylogenetic tree topology
3. Unusual phyletic pattern
4. Conservation of gene order
5. Anomalous DNA composition

All criteria for identifying probable horizontal gene transfer, or more precisely acquisition of foreign genes by a particular genome, inevitably rely on some unusual feature(s) of subsets of genes that distinguishes them from the bulk of genes in the genome. Koonin et al. 2001

Direct proofs are unavailable in nature.
Indications of horizontal transfers remain probabilistic.
Koonin et al. ARM (2001) 55 709-42

1. Unexpected ranking of sequence similarity among homologs

Basis for horizontal transfer hypothesis: a gene sequence from a particular organism shows the strongest similarity to a homolog from a distant taxon.
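A minimal sketch of this criterion follows. It assumes the similarity search has already been run and reduced to (taxon, score) pairs where a higher score means greater similarity; the function name, the data layout, and the scores are all made up for illustration. A flag here is only a hint, since the slides stress that such indications remain probabilistic.

```python
def flag_unexpected_best_hit(hits, expected_taxon):
    """Return details if the top-scoring homolog lies outside the expected taxon."""
    if not hits:
        return None
    best_taxon, best_score = max(hits, key=lambda hit: hit[1])
    if best_taxon == expected_taxon:
        return None  # ranking is as expected, nothing to flag
    best_expected = max(
        (score for taxon, score in hits if taxon == expected_taxon),
        default=None,
    )
    return {
        "best_taxon": best_taxon,
        "best_score": best_score,
        "best_expected_score": best_expected,
    }


# Made-up scores for a bacterial gene whose closest homolog is archaeal.
hits = [("Bacteria", 210.0), ("Archaea", 540.0), ("Bacteria", 180.0)]
print(flag_unexpected_best_hit(hits, "Bacteria"))
```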

2. Unexpected phylogenetic tree topology

If, for example, in a well-supported tree, a bacterial protein groups with its archaeal homologs to the exclusion of homologs from other bacteria and, best of all, shows a reliable affinity with a particular archaeal lineage, the conclusion that horizontal gene transfer is at play seems inevitable.

[Figure: tree of a Mn-dependent transcriptional regulator in archaea and eubacteria (Tatusov, 1996).]

The signature of horizontal transfer is incongruence between the species tree and the gene tree.

[Figure: species tree with leaves uure, mgen, mpneu, bsub, bhal, ctra, cpneu, tpal, bbur; gene tree with leaves uure, mpneu, bsub, bhal, ctra, cpneu, mgen, tpal, bbur. The changed position of mgen is marked as the horizontal transfer event.]
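Incongruence between two rooted trees can be checked by comparing their clades. The sketch below writes each tree as nested tuples; the leaf names come from the slide, but both topologies are assumptions made for illustration, since the source preserves only the leaf order.

```python
def clades(tree):
    """Return (leaves, clades) for a rooted tree written as nested tuples."""
    if isinstance(tree, str):
        return frozenset([tree]), set()
    all_leaves = frozenset()
    found = set()
    for child in tree:
        child_leaves, child_clades = clades(child)
        all_leaves |= child_leaves
        found |= child_clades
    found.add(all_leaves)
    return all_leaves, found


# Assumed topologies; only the leaf names are taken from the source.
SPECIES_TOPOLOGY = ((("uure", ("mgen", "mpneu")), ("bsub", "bhal")),
                    (("ctra", "cpneu"), ("tpal", "bbur")))
GENE_TOPOLOGY = ((("uure", "mpneu"), ("bsub", "bhal")),
                 ((("ctra", "cpneu"), "mgen"), ("tpal", "bbur")))

_, species_clades = clades(SPECIES_TOPOLOGY)
_, gene_clades = clades(GENE_TOPOLOGY)

# Species-tree clades that the gene tree does not recover.
conflicts = species_clades - gene_clades
for clade in sorted(conflicts, key=len):
    print(sorted(clade))
```

Every conflicting clade in this toy case contains or excludes mgen unexpectedly, which is how a single displaced gene shows up in a clade comparison.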

Phylogenetic tree for glutamine synthetase indicates horizontal transfer between the three domains.

Species of archaea and bacteria are mixed and, therefore, are not monophyletic.
The eukaryotes are monophyletic; however, the Bacteria, and not the Archaea, are their closest outgroup.
Several bacterial species have both a bacterial GSI isoform and a eukaryotic GSII isoform.

[Figure: glutamine synthetase tree with Bacteria, Archaea, and Eukaryote groups.]

Brown, J.B. (2003) NRG 4 121-132

3. Unusual phyletic pattern (the pattern of species)

[Figure: the phylogenetic pattern of one gene across uure, mgen, mpneu, bsub, bhal, ctra, cpneu, tpal, bbur; one gene present.]

Independent origin? (for SNPs, fusion events)
Gene loss?
Horizontal transfer?

3. Unusual phyletic pattern

Horizontal transfer OR lineage-specific gene loss

[Figure: two scenarios on the same species tree (uure, mgen, mpneu, bsub, bhal, ctra, cpneu, tpal, bbur), marking the origin of the gene and the gene loss events.]

Here, horizontal transfer is the most parsimonious explanation because only one transfer event needs to be hypothesized, while two gene loss events need to be assumed in the gene loss model.

4. Conservation of gene order

Gene order is not generally conserved in microbial genomes.
[Figure: gene order comparison among E. coli, B. subtilis, and V. cholerae.]

The presence of three or more genes in the same order in distant genomes is extremely unlikely unless these genes form an operon.
Each operon typically emerges only once during evolution and is maintained by selection ever after.
Therefore, when an operon is present in only a few distantly related genomes, horizontal gene transfer seems to be the most likely scenario.

Genes with conserved gene order tend to be functionally related

Tryptophan biosynthesis operon across genomes


Huynen et.al, Curr. Opin. Struct. Biol. June 2000

Why should genes of common function cluster along the chromosome?

Selfish Operon: gene clusters allow dissemination of functionally related genes via horizontal gene transfer.

The Selfish Operon Model postulates that the organization of bacterial genes into operons is beneficial to the constituent genes in that proximity allows horizontal cotransfer of all genes required for a selectable phenotype. Horizontal transfer of selfish operons most probably promotes bacterial diversification.
Lawrence, 1999, COGD 9: 642-648

Restriction/modification system as an example of selfish operons

R: restriction endonuclease enzyme (chops DNA)
M: methyltransferase enzyme (protects host DNA)
R and M are always found adjacent to each other.
They are known to undergo horizontal transfer.

[Figure: two strains of Helicobacter pylori and the putative ancestral locus; two regions marked "Probably HT".]

Lim et al. PNAS (2001) 98 2740-2745

5. Anomalous DNA composition

If synonymous codons are neutral, they are expected to occur at equal frequencies.
Expect 50/50 frequency for the two phenylalanine codons.
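The 50/50 expectation can be checked against a coding sequence by counting the two phenylalanine codons. This is a sketch under simple assumptions: the input is an in-frame coding sequence, RNA letters are converted to DNA letters, and the toy sequence is invented for illustration.

```python
from collections import Counter


def codon_fractions(cds, codons=("TTT", "TTC")):
    """Relative usage of the given synonymous codons in an in-frame CDS."""
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    counts = Counter(cds[i:i + 3] for i in range(0, usable, 3))
    total = sum(counts[codon] for codon in codons)
    if total == 0:
        return {codon: 0.0 for codon in codons}
    return {codon: counts[codon] / total for codon in codons}


# Toy CDS: ATG TTT TTC TTT GGA TTT TAA
toy = "ATGTTTTTCTTTGGATTTTAA"
print(codon_fractions(toy))
```

A strong departure from equal usage across many genes is what the following slides call codon bias.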

Codon biases are found in all known prokaryotes.

[Figure: codon frequencies in E. coli.]

Mozner I. Current Opinion in Microbiology 1999, 2:524-528

Factor analysis of codon usage of B. subtilis genes reveals three classes of genes

Class 1 comprises the majority of the B. subtilis genes (82%).
Class 2 (5%): genes that are highly expressed under exponential growth conditions.
Class 3 (13%): genes that were apparently horizontally transferred; different codon usage.

Kunst, F et al. Nature (1997) 390 249-256

Because some of the genes in this group showed clear relationships with bacteriophage genes, the hypothesis has been proposed that all these genes were alien and have been acquired horizontally from various sources.

Mozner I. Current Opinion in Microbiology 1999, 2:524-528

Why do horizontally transferred genes use the genetic code differently?

Horizontally transferred genes retain the sequence characteristics of the donor genome.
Base composition differences are mostly due to the third position of codons.

Lawrence and Ochman. J Mol Evol (1997) 44:383-397
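Because the signal sits mostly at third codon positions, a compositional screen often looks at G+C there (GC3) as well as overall. The sketch below assumes an in-frame coding sequence written with DNA letters; the function names and the toy sequence are illustrative only.

```python
def gc_fraction(seq):
    """Fraction of G or C bases in a sequence."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq) / len(seq)


def gc3_fraction(cds):
    """G+C fraction at third codon positions of an in-frame CDS."""
    third_positions = cds.upper()[2::3]
    return gc_fraction(third_positions)


# Toy CDS: ATG GCC AAG CTG TAA
toy_cds = "ATGGCCAAGCTGTAA"
print(gc_fraction(toy_cds), gc3_fraction(toy_cds))
```

Comparing a gene's values with the genome-wide values is one way to look for the donor-like composition described above.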

Bacterial species display a wide degree of variation in their overall G+C content.

Rocha EP. Trends Genet 2002 Jun;18(6):291-4

However, most genes have roughly the same GC content within a genome.

[Figure: distribution of A+T-rich islands along the chromosome of B. subtilis. The location of genes from class 3 according to codon usage analysis is indicated by dots at the bottom of the graph. Known prophages are indicated by their names, and prophage-like elements are numbered from 1 to 7.]
Kunst, F et al. Nature (1997) 390 249-256

Genes in closely related genomes do not commonly differ in base composition

Salmonella enterica (52% G+C)
E. coli (50% G+C)

A large number of S. enterica genes that are not present in E. coli (or any other enteric species) have base compositions that differ significantly from the overall 52% G+C content of the entire chromosome.
Ochman, H. & Lawrence, J. G. (1996) in Escherichia coli and Salmonella typhimurium: Cellular and Molecular Biology, eds. Neidhardt, et al. 2nd Ed., pp. 2627-2637.

18% of E. coli genes are foreign

Using base composition and codon usage to identify HTs:
755 (17.6%) of the 4288 ORFs in the genome originated through horizontal gene transfer.
Transferred in at least 234 events (?)

Lawrence and Ochman. PNAS (1998) 95 9413941

Distribution of horizontally acquired (foreign) DNA in sequenced bacterial and archaeal genomes.

The share of foreign DNA ranges from virtually none in some organisms with small genome sizes to ~17% in Synechocystis.
Ochman et al. Nature 2000 405 299-304

Classification of horizontal transfer events

Classified into three categories with respect to the relationship between the horizontally acquired gene and homologs in the host genome:
(a) NEW: acquisition of a new gene missing in other members of a given clade
1. Loss and regain
2. Non-orthologous gene transfer
(b) ADDITIONAL: acquisition of a paralog of the given gene with a distinct evolutionary ancestry
(c) REPLACEMENT: acquisition of a phylogenetically distant ortholog followed by xenologous gene displacement, that is, elimination of the ancestral gene
Koonin et al. ARM (2001) 55 709-42

(a) New: acquisition of a new gene missing in other members of a given clade

[Figure: horizontal transfer introduces a new gene.]

1. Loss and regain
An unused gene is deleted and replaced by a homologue when again selected for.
Doolittle et al. (2003) Phil. Trans. R. Soc. Lond. B (2003) 358,

2. Non-orthologous gene replacement
A functional analogue (but not an orthologue) of an essential gene is added, and the original is subsequently lost.

(b) Additional: acquisition of a homolog of the given gene with a distinct evolutionary ancestry

[Figure: horizontal transfer; the two marked genes are labeled as orthologs.]

Even though the two are homologs, they will lie at different places on the tree of the gene family because they have different evolutionary trajectories.

(c) Ortholog displacement: acquisition of a phylogenetically distant ortholog followed by xenologous gene displacement, that is, elimination of the ancestral gene

[Figure: horizontal transfer followed by gene loss; the two marked genes are labeled as orthologs.]

How many and which genes shared by all genomes in a given group, or indeed by all prokaryotes, have the same evolutionary history and produce the same phylogenetic tree?

[Figure: Genome 1, Genome 2, Genome 3. Do these genes produce the same tree?]

Is there a stable core of genes that do faithfully record population bifurcations and speciation events, back to the group's last common ancestor?

The archaeal genomes contain a striking genomescape strongly suggestive of massive horizontal gene transfer.

Transfer of operational genes.

Proteins involved in translation, transcription, replication, and protein secretion are most closely related to eukaryotic proteins, whereas metabolic enzymes, metabolite uptake systems, enzymes for cell wall biosynthesis, and many uncharacterized proteins appear to be 'bacterial'.
These observations have been tentatively explained by massive gene exchange between archaea and bacteria.
Koonin et al. Mol Microbiol 1997 Aug;25(4):619-37
Koonin et al. ARM (2001) 55 709-42

Escherichia coli strains O157:H7 and K12 are more different than any two mammals

[Figure: Venn diagram. O157:H7 (nasty pathogen): 5315 genes, 1387 unique. K12 (friendly gut bacterium): 4456 genes, 528 unique. Core genome: 3928 genes. Pan-genome: 5843 genes.]

These two genomes differ in 26% or 12% of their genes (depending on which way you cast the comparison).
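The 26% and 12% figures follow directly from the gene counts on this slide. The short calculation below uses those counts as given; only the variable names are introduced here.

```python
core = 3928          # genes shared by both strains
unique_o157 = 1387   # genes found only in O157:H7
unique_k12 = 528     # genes found only in K12

total_o157 = core + unique_o157            # 5315 genes in O157:H7
total_k12 = core + unique_k12              # 4456 genes in K12
pan_genome = core + unique_o157 + unique_k12

# The answer depends on which genome is used as the denominator.
frac_o157 = unique_o157 / total_o157
frac_k12 = unique_k12 / total_k12

print(pan_genome)
print(f"{frac_o157:.1%} of O157:H7 genes are absent from K12")
print(f"{frac_k12:.1%} of K12 genes are absent from O157:H7")
```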
Most of the genes unique to each strain

In many cases, genes acquired by horizontal transfer confer the species-specific trait.

[Figure: evolutionary relationships and phenotypic profiles of representative enteric bacteria.]
Ochman et al. Nature (2000) 405 299-304

Amount and source of horizontal gene transfer can be linked to an organism's lifestyle:

1) bacterial hyperthermophiles seem to have exchanged genes with archaea to a greater extent than other bacteria,
2) transfer of certain classes of eukaryotic genes is most common in parasitic and symbiotic bacteria.

Detection of Selection On Genes

Two classes of DNA substitutions:
Synonymous (DNA change without amino acid change)
Nonsynonymous (DNA change causing amino acid change)

Standard Genetic Code

UUU Phe   UCU Ser   UAU Tyr   UGU Cys
UUC Phe   UCC Ser   UAC Tyr   UGC Cys
UUA Leu   UCA Ser   UAA ter   UGA ter
UUG Leu   UCG Ser   UAG ter   UGG Trp

CUU Leu   CCU Pro   CAU His   CGU Arg
CUC Leu   CCC Pro   CAC His   CGC Arg
CUA Leu   CCA Pro   CAA Gln   CGA Arg
CUG Leu   CCG Pro   CAG Gln   CGG Arg

AUU Ile   ACU Thr   AAU Asn   AGU Ser
AUC Ile   ACC Thr   AAC Asn   AGC Ser
AUA Ile   ACA Thr   AAA Lys   AGA Arg
AUG Met   ACG Thr   AAG Lys   AGG Arg

GUU Val   GCU Ala   GAU Asp   GGU Gly
GUC Val   GCC Ala   GAC Asp   GGC Gly
GUA Val   GCA Ala   GAA Glu   GGA Gly
GUG Val   GCG Ala   GAG Glu   GGG Gly
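With the standard genetic code in hand, a codon change can be labeled synonymous or nonsynonymous by comparing the encoded amino acids. The sketch below builds the code table from the usual 64-letter string in TCAG order; the function name is introduced here for illustration, and stop codons are written as "*".

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_substitution(codon_before, codon_after):
    """Label a codon change by whether the encoded amino acid changes."""
    before = codon_before.upper().replace("U", "T")
    after = codon_after.upper().replace("U", "T")
    if before == after:
        return "no change"
    if CODON_TABLE[before] == CODON_TABLE[after]:
        return "synonymous"
    return "nonsynonymous"


print(classify_substitution("UUU", "UUC"))  # Phe -> Phe
print(classify_substitution("UUU", "UUA"))  # Phe -> Leu
print(classify_substitution("UGG", "UGA"))  # Trp -> stop
```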

Neutral: equal frequencies
Conservative selection (negative selection, purifying selection): fewer nonsynonymous substitutions
Positive selection (diversifying selection): more nonsynonymous substitutions

Selection
Ka, or dN: nonsynonymous substitutions per nonsynonymous site.
Ks, or dS: synonymous substitutions per synonymous site.
Ka/Ks, or dN/dS (ω): the ratio of these two rates.

Average synonymous (Ks) and nonsynonymous (Ka) substitution rates of 75 protein-coding genes in P. vulgaris or G. max plastomes compared with the reference plastomes.

                 Arabidopsis as reference   Lotus as reference   Medicago as reference
Ka  Phaseolus    0.0681 (0.0392*)           0.0446 (0.0984)      0.0563 (0.2240)
Ka  Glycine      0.0588 (0.0389)            0.0348 (0.0988)      0.0463 (0.2096)
Ks  Phaseolus    0.5657 (0.0754)            0.3171 (0.1059)      0.3048 (0.1790)
Ks  Glycine      0.5221 (0.0953)            0.2762 (0.1106)      0.2662 (0.1602)

* The data in parentheses indicate the standard deviations.


Result: Phaseolus evolves faster.

Neutral: equal frequencies (Ka/Ks = 1, or dN/dS = 1)
Conservative selection (negative selection, purifying selection): fewer nonsynonymous substitutions (Ka/Ks < 1)
Positive selection (diversifying selection): more nonsynonymous substitutions (Ka/Ks > 1)
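These thresholds translate into a small classifier. The example call uses the Phaseolus averages against the Arabidopsis reference from the table above (Ka = 0.0681, Ks = 0.5657). Two caveats are assumptions of this sketch rather than anything in the source: the tolerance band around 1 is arbitrary, and a ratio of averages is not the same as an average of per-gene ratios.

```python
def selection_regime(ka, ks, tolerance=0.05):
    """Return (Ka/Ks, label) using the Ka/Ks = 1 neutral expectation."""
    if ks <= 0:
        raise ValueError("Ks must be positive to form the ratio")
    omega = ka / ks
    if abs(omega - 1.0) <= tolerance:   # arbitrary band, an assumption
        label = "neutral"
    elif omega < 1.0:
        label = "purifying selection"
    else:
        label = "positive selection"
    return omega, label


# Phaseolus averages with Arabidopsis as reference, from the table.
omega, label = selection_regime(0.0681, 0.5657)
print(f"Ka/Ks = {omega:.3f}: {label}")
```

A ratio well below 1, as here, is what the slide describes as conservative (purifying) selection.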

The yin and yang of prokaryote evolution

Vertical
transfer

Horizontal
transfer

Woese, C. PNAS (2000) 97 8392-8396
