
Virus Research 289 (2020) 198168

Contents lists available at ScienceDirect

Virus Research
journal homepage: www.elsevier.com/locate/virusres

Evidence supporting a viral origin of the eukaryotic nucleus


Philip J.L. Bell
Microbiogen Pty Ltd, Unit E2, Lane Cove Business Park, Lane Cove West, NSW, 2066, Australia

ARTICLE INFO

Keywords:
Viral eukaryogenesis
Nucleus
Eukaryote origin
Viral factory
mRNA capping
Phylogeny

ABSTRACT

The defining feature of the eukaryotic cell is the possession of a nucleus that uncouples transcription from translation. According to the updated Viral Eukaryogenesis (VE) hypothesis presented here, the eukaryotic nucleus descends from the viral factory of a DNA virus that infected the archaeal ancestor of the eukaryotes. The VE hypothesis implies that many unique features of the nucleus, including the mechanisms by which the eukaryotic nucleus uncouples transcription from translation, should be viral rather than cellular in origin. The modern eukaryotic nucleus uncouples transcription from translation using a complex process employing hundreds of eukaryote-specific genes acting in concert. This intricate process is primed by the eukaryote-specific 7-methylguanylate (m7G) cap on eukaryotic mRNA, which targets mRNA for splicing, nuclear export, and cytoplasmic translation. It is shown here that homologues of the eukaryotic m7G capping apparatus are present in viruses of the Mimiviridae yet are apparently absent from archaea generally, and specifically from Lokiarchaeota, a proposed archaeal relative of the eukaryotes. Phylogenetic analysis of the m7G capping apparatus shows that eukaryotic nuclei and Mimiviridae obtained this shared pathway from a common ancestral source that predated the origin of the Last Eukaryotic Common Ancestor (LECA). These results are consistent with the hypothesis that the eukaryotic nucleus and the Mimiviridae obtained these abilities from an ancient virus that could be considered the ‘First Eukaryotic Nuclear Ancestor’ (FENA).

E-mail address: philip.bell@microbiogen.com.

https://doi.org/10.1016/j.virusres.2020.198168
Received 27 January 2020; Received in revised form 10 September 2020; Accepted 14 September 2020
Available online 20 September 2020
0168-1702/© 2020 The Author(s). Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).

1. Introduction

A membrane-bound nucleus defines the eukaryotic domain, and all cellular organisms without nuclei are prokaryotic (Sapp, 2005; Stanier and Van Niel, 1962). In addition to changes introduced by the mitochondria, the nucleus is a major contributor to the divide between eukaryotes and prokaryotes, since it is associated with features such as linear chromosomes with centromeres and telomeres, nuclear pores, the spliceosome, mitosis, meiosis, the sexual cycle, and the endoplasmic reticulum. Understanding the origin of the nucleus is therefore essential to understanding the origin of the eukaryotic cell.

Since the eukaryotic genome is sequestered inside the nucleus and functional ribosomes are restricted to the cytoplasm, the presence of the nucleus introduces a unique uncoupling of transcription and translation into all cells of the eukaryotic domain. That is, in all eukaryotic cells mRNA is synthesised inside the nucleus and must be exported into the cytoplasm for translation (Kyrieleis et al., 2014). This contrasts with the coupled prokaryotic transcription and translation systems, in which uncapped mRNA is synthesised and translated directly in the cytoplasm (Benelli and Londei, 2011). Since archaeal methanogens appeared at least 3.8 billion years ago (Battistuzzi et al., 2004), whereas eukaryotes only appeared ~2.2 (±0.4) billion years ago (Parfrey et al., 2011; El Albani et al., 2010; Bengtson et al., 2017), the evidence indicates that the coupled prokaryotic system predates the uncoupled eukaryotic system by up to two billion years.

Fig. 1. The coupled prokaryotic system of transcription and translation. Both archaea and bacteria use one type of multi-component RNA polymerase (RNAP) to transcribe all RNA (Werner, 2007). Transcription and translation in prokaryotes are coupled since both occur directly in the protoplasm, and thus translation initiation can occur before the mRNA transcript is fully synthesised. Translation in prokaryotes relies on direct recognition of mRNA by the ribosomal apparatus via sequences such as Shine-Dalgarno sequences or short UTRs (Benelli and Londei, 2011). In the case of Shine-Dalgarno sites, the 30S ribosomal subunit binds to the mRNA in such a way that the AUG codon lies in the peptidyl (P) site and the second codon lies in the aminoacyl (A) site. The initiator tRNA binds to the P site, the large ribosomal subunit docks with the small subunit, the initiation factors are released, and the ribosome is ready to start translation. Since prokaryotes originated at least 3.8 billion years ago (Battistuzzi et al., 2004), the coupled prokaryotic process predates the uncoupled eukaryotic system by close to 2 billion years and thus is the most ancient cellular system of transcription and translation.

Despite the chasm in cellular design that the nucleus introduces between the eukaryotic and the prokaryotic domains, it has long been established that both ribosomal RNA (Woese et al., 1990) and the information processing machinery of the Eukarya and the Archaea (Ribeiro and Golding, 1998) are more closely related to each other than either is to the Bacteria. Under the long-established ‘three domains tree’ paradigm, this pattern occurs because archaea and the eukaryotes are sister groups to the exclusion of the bacteria (Woese et al., 1990). Although the ‘three domains tree’ of life has been the dominant paradigm for over 20 years, evidence is mounting for the alternative ‘two domains tree’, in which the eukaryotes are not a sister group to the archaea but rather are nested within the archaeal domain (Rivera and Lake, 1992). Although the discovery of Asgard archaea such as Lokiarchaeota may not close the debate about the topology of the universal tree of life (Da Cunha et al., 2018), there has been considerable support for the proposal that Lokiarchaeota bridges the gap between prokaryotes and eukaryotes (Spang et al., 2015). Consistent with the Eocyte hypothesis (Rivera and
Lake, 1992), Asgard archaea are proposed to be direct descendants of an archaeal ancestor of the eukaryotes (Koonin, 2015). If the Eocyte school of thought is accepted, then the last common ancestor of the Asgard archaea and the eukaryotes was a bona fide archaeal ‘First Eukaryotic Common Ancestor’ (FECA) (Eme et al., 2017) that lacked both a mitochondrion and a nucleus.

Despite the fundamental differences between the two schools of thought, both propose that archaea and the eukaryotes are relatives to the exclusion of the bacteria. Since the Last Eukaryotic Common Ancestor (LECA) possessed both a nucleus and a mitochondrion (Neumann et al., 2010), and bona fide modern archaea possess neither of these organelles, both schools of thought must explain why the nucleus and the mitochondrion appear in the eukaryotic lineage and not in the archaeal lineage. Despite nearly 100 years of contention (McInerney et al., 2014), many lines of evidence show that the appearance of the mitochondrion is persuasively explained by its endosymbiotic descent from an alpha-proteobacterium (e.g. Lang et al., 1999). By contrast, the appearance of a nucleus has been much more difficult to explain (Martin, 1999, 2005) and remains contentious despite its importance to understanding the origin of the eukaryotes.

The change from the coupled system of transcription and translation found in modern archaea (Fig. 1) to the uncoupled eukaryotic system found in LECA (Fig. 2) required the evolution of a complex molecular system involving hundreds of genes acting in concert. The addition of the m7G cap to eukaryotic mRNA is critical to many aspects of this system, since it primes the mRNA for processing, export, and translation (Fig. 2). Additionally, unlike prokaryotes, eukaryotes possess three RNA polymerases, of which only RNAP-II is dedicated to synthesising mRNA with an m7G cap (Sentenac, 1985). Three enzymatic functions are required to add the characteristic m7G cap to eukaryotic mRNA. Firstly, an RNA 5′-triphosphatase (TPase) hydrolyses the 5′-triphosphate end of the nascent mRNA to generate a 5′-diphosphate. The 5′-diphosphate is then capped with guanosine monophosphate by an RNA guanylyltransferase (GTase) to generate a 5′ GpppRNA cap on the transcript. Finally, the GpppRNA cap is methylated by RNA (guanine-N7)-methyltransferase (MTase) (Kyrieleis et al., 2014). A cap binding protein (eIF4E) is an essential component of the eukaryotic system that uncouples transcription from translation, since it binds the m7G cap, leading to the initiation of translation in the cytoplasm (Marcotrigiano et al., 1997). Paradoxically, the high level of complexity and the integrated nature of the cap-based system of uncoupling transcription from translation suggest a long evolutionary history, yet no transitional cellular forms linking the prokaryotic (Fig. 1) and eukaryotic (Fig. 2) systems have been described.

Although fossil evidence for viruses is unlikely to be found, viruses almost certainly evolved in concert with the prokaryotic domains and thus existed before the origin of LECA. Supporting an early origin of viruses is the observation that a prokaryotic genome free of genetic parasites is expected to show signs of genome degeneration, because it would lack a mechanism to overcome the degradation of prokaryotic genomes caused by processes such as Muller’s ratchet (Iranzo et al., 2016). There are also strong biological arguments that the emergence of genetic parasites is inevitable due to the instability of parasite-free states (Koonin et al., 2017). Further support for a pre-LECA origin of viruses comes from phylogenomic analysis showing that modern eukaryotic viruses evolved from pre-existing prokaryotic viruses (Koonin et al., 2015). It can thus be anticipated that viruses emerged in concert with the first prokaryotes and existed for much of the ~2 billion years between the appearance of the first archaeal methanogens and the appearance of LECA.

The discovery of viral factories constructed by the Pseudomonas phiKZ-like jumbophage 201Φ2-1 (Chaikeeratisak et al., 2017b) showed that bacterial viruses can establish a nucleus-like uncoupling of transcription from translation. The viral factory established by 201Φ2-1 sequesters the linear viral DNA within the factory whilst excluding the functional ribosomes (Chaikeeratisak et al., 2017b). As a result, once the factory is established, mRNA is synthesised within the factory and exported into the cytoplasm for translation. Functionally, infection results in the bacterial protoplasm being divided into a viroplasm, where viral information processing occurs, and a cytoplasm, where translation and metabolic enzymes are localised. Since information processing enzymes encoded by the virus must function inside the viral factory whilst components of the virions are assembled in the cytoplasm, the boundary of 201Φ2-1 viral factories must be able to selectively sort which proteins, RNA transcripts and other factors can move across it (Chaikeeratisak et al., 2017b).

In addition to uncoupling transcription from translation, 201Φ2-1 also possesses a homologue of eukaryotic tubulin (PhuZ), which positions the viral factory in the centre of the infected cell (Chaikeeratisak et al., 2017a). The PhuZ spindle is the only known example of a cytoskeletal structure that shares three key properties with the eukaryotic spindle: dynamic instability, a bipolar array of filaments, and central positioning of DNA (Chaikeeratisak et al., 2017a). This is a further parallel with the eukaryotic nucleus, since eukaryotic nuclei are positioned in the cell by
microtubule-dependent motors during development and differentiation (Starr, 2009). Although unrelated to the eukaryotic nucleus, 201Φ2-1 is a critical discovery, since it shows that some prokaryotic viruses have evolved a nucleus-like uncoupling of transcription from translation, and thus that the uncoupling of transcription from translation potentially predates the origin of the eukaryotic nucleus by over a billion years.

A relationship between the eukaryotic nucleus and viruses was proposed independently and simultaneously almost 20 years ago, based on analysis of DNA polymerases (Takemura, 2001) and on shared fundamental features (Bell, 2001). The 2001 version of the Viral Eukaryogenesis hypothesis made the radical proposal that the eukaryotic nucleus and the poxviruses were descendants of an ancient archaeal virus that evolved mRNA capping to direct translation of the viral mRNA by the host ribosomes (Bell, 2001). The hypothesis was based on the observation that poxviruses could produce capped mRNA, possessed linear chromosomes, could separate transcription from translation, and were able to replicate entirely within the host cytoplasm (Bell, 2001). Subsequently, poxviruses were found to be members of an ancient monophyletic group, the nucleocytoplasmic large DNA viruses (NCLDV) (Iyer et al., 2001), and the common ancestor of the NCLDV group possessed the genes required to produce capped mRNA (Iyer et al., 2001; Yutin and Koonin, 2012). The discovery of the giant Mimivirus in 2004 and its allocation to the NCLDV group (Raoult et al., 2004) demonstrated that poxvirus relatives could be of unprecedented size and possess a complexity comparable to that of prokaryotic cells (Raoult et al., 2004). Subsequently, many other NCLDV viruses have been discovered, including even more complex relatives such as the Klosneuvirus (Schulz et al., 2017) and Tupanvirus (Abrahão et al., 2018).

Although the origin of the NCLDV group may lie amongst viruses such as adenoviruses/PRD1 (Woo et al., 2019), possibly amongst Polintoviruses (Koonin et al., 2015), the last common ancestor of the NCLDV group appears to have possessed a complex genome of at least 30–50 genes, including the apparatus required to produce m7G capped mRNA (Iyer et al., 2001; Yutin and Koonin, 2012). In addition to the mRNA capping apparatus, fundamental information processing genes such as RNA polymerase, DNA polymerase, PCNA, DNA ligase, topoisomerase II and ribonucleotide reductase were also present (Iyer et al., 2001). Distantly related NCLDV viruses from the Poxviridae, Asfarviridae, Pithoviridae, Marseilleviridae and Mimiviridae families all replicate partially or exclusively within large cytoplasmic viral factories (Fridmann-Sirkis et al., 2016), suggesting that the common ancestor of the NCLDV group could also establish a viral factory in the cytoplasm of its ancient host and produce m7G capped mRNA.

As well as inheriting the ancestral ability to produce capped mRNA from the common ancestor of the NCLDV group, two separate clades of modern NCLDV viruses, the Pandoraviridae and the Mimiviridae, possess homologues of the unique eukaryotic cap binding protein eIF4E (Schulz et al., 2017). Unlike many NCLDV viruses, the Pandoraviruses possess introns in their genes, strongly suggesting that at least part of the Pandoravirus genome is transcribed in the nucleus (Philippe et al., 2013). By contrast, members of the Mimiviridae have been shown to establish viral factories in their host’s cytoplasm which, by excluding ribosomes, uncouple transcription from translation (Fridmann-Sirkis et al., 2016). Additionally, the cap binding protein expressed by the Mimivirus is located in the cytoplasm during infection (Fridmann-Sirkis et al., 2016).

Fig. 2. The eukaryotic system to uncouple transcription from translation is complex and employs hundreds of genes that act in concert. The dominant eukaryotic cap-dependent system of transcription and translation that was functional in LECA (Neumann et al., 2010) is described below. i) Subunits of RNAP-II are translated in the cytoplasm and imported into the nucleus through the Nuclear Pore Complex (NPC). RNAP-II initiates transcription of mRNA by binding to the promoter regions of protein coding genes. ii) After the synthesis of the first 20 to 25 nt of mRNA, the polymerase pauses until the mRNA is capped (Ramanathan et al., 2016; Okamura et al., 2015). The eukaryotic m7G cap (symbolised by ★) consists of 7-methylguanosine linked to the transcript via a reversed 5′-5′ triphosphate linkage, and is the first modification made to RNAP-II transcribed RNA. Three enzymatic functions are required to generate the cap. Firstly, an RNA 5′-triphosphatase (TPase) hydrolyses the 5′-triphosphate end of the nascent mRNA to generate a 5′-diphosphate. The 5′-diphosphate is then capped with guanosine monophosphate by an RNA guanylyltransferase (GTase) to generate a 5′ GpppRNA cap on the transcript. Finally, the GpppRNA cap is methylated by RNA (guanine-N7)-methyltransferase (MTase) (Kyrieleis et al., 2014). iii) The nuclear cap binding complex (CBC) binds to the m7G cap, and then forms a complex with snRNPs to initiate splicing and polyadenylation (Ramanathan et al., 2016). Splicing of mRNA transcripts is unique to the eukaryotes and requires the interaction of hundreds of proteins and the conserved snRNAs. iv) The m7G cap primes the mRNA for transport through the nuclear pores into the cytoplasm (Katahira, 2015) by binding trans-acting factors to form a mature messenger ribonucleoprotein (mRNP). Recruitment of the multi-subunit TRanscription-EXport (TREX-1) complex requires the 5′ capping of pre-mRNA because CBP80 interacts with the Aly/REF and THO sub-complexes of TREX-1 (Okamura et al., 2015). v) The NPC is integral to the uncoupling of transcription from translation because it acts as a gatekeeper, controlling which macromolecules enter and exit the nucleus. NPCs are unique to the eukaryotes, and a single NPC comprises ~500 individual protein molecules collectively known as nucleoporins (Nups) (Kabachinski and Schwartz, 2015). The NPC includes a nuclear ring, a central transport channel and eight cytoplasmic fibrils, and allows molecules smaller than 40–60 kDa to diffuse freely (Kabachinski and Schwartz, 2015). Large molecules such as mRNA must associate with specific export receptors such as Nxf1, Crm1 or other karyopherins to be actively transported through the NPC. vi) To initiate cytoplasmic translation, the 43S ribosomal preinitiation complex is recruited to the 5′ end of the mRNA, a process coordinated by eIF4E through its interactions with the m7G cap, eIF4G and the 40S ribosomal subunit-associated eIF3 (Hernández et al., 2012). Several unique eukaryotic initiation factors (eIF4E, eIF4G, eIF4B, eIF4H and eIF3) are involved in the 5′-cap-binding and scanning processes that are essential to the initiation and translation of capped eukaryotic mRNA (Jagus et al., 2012). vii) Once the ribosome has been recruited to the capped mRNA transcript, a scanning process occurs, and translation is generally initiated at the first AUG encountered.
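The three capping reactions described in the Introduction and in Fig. 2 act in a fixed order, and each enzyme needs the product of the one before it. The minimal Python sketch below models only the chemical state of the 5′ end of a nascent transcript (pppN → ppN → GpppN → m7GpppN). The names `CAP_STEPS`, `apply_step` and `cap_transcript` are hypothetical and for illustration only. They are not part of any software used in this study, and co-substrates such as GTP and S-adenosylmethionine appear only as comments.

```python
"""Toy model of the three-step m7G capping pathway (illustrative sketch only)."""

# Each step: (enzyme, required 5' end, resulting 5' end, note on the chemistry).
# 5' end notation: pppN = triphosphate, ppN = diphosphate,
# GpppN = guanylylated cap, m7GpppN = N7-methylated cap.
CAP_STEPS = [
    ("TPase", "pppN", "ppN", "removes the gamma-phosphate"),
    ("GTase", "ppN", "GpppN", "transfers GMP from GTP"),
    ("MTase", "GpppN", "m7GpppN", "methylates guanine N7 using S-adenosylmethionine"),
]


def apply_step(five_prime_end: str, step: tuple) -> str:
    """Apply one enzymatic step, refusing a substrate the enzyme cannot use."""
    enzyme, substrate, product, _note = step
    if five_prime_end != substrate:
        raise ValueError(f"{enzyme} requires a {substrate} end, got {five_prime_end}")
    return product


def cap_transcript(five_prime_end: str = "pppN") -> list:
    """Run TPase, GTase and MTase in order and return every intermediate state."""
    states = [five_prime_end]
    for step in CAP_STEPS:
        states.append(apply_step(states[-1], step))
    return states


if __name__ == "__main__":
    print(" -> ".join(cap_transcript()))  # pppN -> ppN -> GpppN -> m7GpppN
    # Skipping TPase fails, because GTase needs a 5'-diphosphate end.
    try:
        apply_step("pppN", CAP_STEPS[1])
    except ValueError as err:
        print("error:", err)
```

Each enzyme's required substrate is encoded explicitly, so the dependence of GTase on prior TPase activity, and of MTase on prior GTase activity, is visible at a glance.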


The Mimiviridae are a suitable group for phylogenetically testing the VE hypothesis, because they are a well characterised NCLDV group that establishes a solely cytoplasmic viral factory and possesses both the apparatus to transcribe m7G capped mRNA and the eIF4E gene that allows initiation of translation of the capped mRNA in the eukaryotic cytoplasm.

In this paper, the original VE hypothesis (Bell, 2001) is updated and modified to account for the many significant discoveries made in the 20 years since it was first published. Firstly, rather than proposing descent from the virion of a pox-like virus (Bell, 2001), the VE hypothesis is modified to propose that the eukaryotic nucleus descends from the viral factory of a giant NCLDV-like archaeal virus. It is proposed that this ancestral virus gave rise to both the modern NCLDV viruses and the eukaryotic nucleus, and that it evolved the m7G cap as an integral part of the process to export mRNA from the viral factory and effectively engage the host archaeal ribosomes. Since NCLDV viruses such as the Mimiviridae construct a viral factory in the host cytoplasm, uncouple transcription from translation, synthesise m7G capped mRNA, and encode the eIF4E cap binding protein, it is proposed here that NCLDV viruses and the eukaryotic nucleus both obtained these abilities from a viral ‘First Eukaryotic Nuclear Ancestor’ (FENA). Finally, rather than the eukaryotic cytoplasm descending from a syntrophic ‘methanogenic mycoplasma’, the updated VE hypothesis proposes that the eukaryotic cytoplasm descends from a syntrophic archaeon belonging to the Asgard group, which many now consider to be the closest known archaeal relatives of the eukaryotes (e.g. Eme et al., 2017).

The VE hypothesis is testable because it proposes that the eukaryotic nucleus and the Mimiviruses did not obtain from the archaeal domain the ability to cap mRNA and to initiate translation of capped mRNA via eIF4E. Rather, both eukaryotic nuclei and Mimiviruses obtained these abilities from an ancestral archaeal virus that used this mechanism to direct the host’s archaeal translational apparatus to selectively translate viral mRNA. To test the hypothesis, phylogenetic analysis was performed on three sets of genes essential to the uncoupling of transcription from translation by the eukaryotic nucleus. Firstly, the largest subunit of RNAP-II was chosen because it directly interacts with the capping apparatus (McCracken et al., 1997) and specifically transcribes eukaryotic mRNA destined for capping and export into the cytoplasm. Secondly, the genes of the capping apparatus itself were chosen because they are required to add the m7G cap to eukaryotic mRNA. Thirdly, the eIF4E gene was chosen because it is required to initiate translation of capped mRNA in the cytoplasm (see Fig. 2). Since these genes interact with each other to prime the uncoupling of transcription from translation via the m7G cap (Fig. 2), they represent a co-evolving module of genes. They therefore allow testing of the hypothesis that the eukaryotic nucleus, and its ability to uncouple transcription from translation, is viral in origin.

2. Results

2.1. Archaea, including the proposed archaeal relative of the eukaryotes, show no evidence of a functional pathway required to produce and translate m7G capped mRNA

According to both the three domain and two domain schools of thought, the closest extant relatives of the eukaryotes are amongst the archaea rather than the bacteria. It is thus informative to establish the extent to which the machinery for synthesising and translating m7G capped mRNA is present within domain Archaea. To add an m7G cap to mRNA, eukaryotes require TPase, GTase and MTase genes. Since the TPase gene in eukaryotes apparently arose from two phylogenetically different origins (Ramanathan et al., 2016; Kyrieleis et al., 2014), only the GTase and MTase were used to search the archaeal domain for homologues. Once m7G capped mRNA is in the eukaryotic cytoplasm, the first step in the initiation of translation is the binding of eIF4E to the m7G cap. The CEG1 (GTase), ABD1 (MTase) and CDC33 (eIF4E) proteins of S. cerevisiae were therefore used as queries in BLAST searches of the archaeal domain.

As shown in Table 1, no archaea were found that possess the suite of three genes required to produce and recognise m7G capped mRNA. Some homologues with significant E values (< 1 × 10⁻⁵) were detected amongst the archaea, but their distribution is highly sporadic, and no single genome possessed all three genes. Significantly, the archaeal homologues identified were found in metagenomic sequences, where gaps, local assembly errors, chimeras, and contamination by fragments from other genomes are known to limit the value of the genomes (Chen et al., 2020). When the metagenomic genes with significant homology were themselves used as BLAST queries, they were all found to be either closely related to eukaryotic or NCLDV viral sequences or to occur in other unrelated metagenomic sequences (Table 1), suggesting they were artefacts of the metagenomic assemblies. The results obtained here thus provide no evidence for an uncoupled transcription and translation system in the archaeal domain.

To determine whether the closest proposed archaeal relatives of the eukaryotes possessed homologues of the eukaryotic cap-based system for uncoupling transcription from translation, the genome of Heimdallarchaeota LC-3 (formerly Loki3; Spang et al., 2015, 2018) was examined. The question was whether Heimdallarchaeota LC-3 possessed a bona fide archaeal transcriptional system with one RNA polymerase or a more eukaryote-like system with three specialist RNA polymerases. To answer it, RPO21 of S. cerevisiae was used to identify homologues of the largest RNAP subunit in Heimdallarchaeota LC-3. The largest subunit was chosen because it is the only eukaryotic RNAP subunit that encodes the CTD heptapeptide repeat (YSPTSPS) required for recruiting the capping apparatus to the nascent mRNA transcript (McCracken et al., 1997). The Homo sapiens genome was also searched for homologues to illustrate the level of homology of these genes in distantly related descendants of LECA. Despite the large evolutionary distance between yeast and humans, three different homologues of RPO21 with very significant E values were identified in H. sapiens (Table 2). These three correspond to RNAP-I, RNAP-II and RNAP-III, which are present in all eukaryotes (Sentenac, 1985). By contrast, only a single RNA polymerase subunit A′ was identified as a homologue in Heimdallarchaeota LC-3. This indicates that it possesses a bona fide prokaryotic transcription system in which all RNA is transcribed by the same RNA polymerase (Werner, 2007).

Using the CEG1 (GTase) of S. cerevisiae to search for GTase homologues in H. sapiens identifies the human GTase with a very significant E value. By contrast, although BLAST identified some putative homologues with non-significant E values in Heimdallarchaeota LC-3, only one of these shared homology with the known domain structure of the GTase. This gene was an ATP-dependent ligase (Table 2), a group known to share homology with the GTases (Shuman and Schwer, 1995). Using the ABD1 (MTase) gene of S. cerevisiae identifies homologues with significant E values in both humans and Heimdallarchaeota LC-3. However, the methyltransferase domain of the capping enzyme is known to share homology with a wide family of methyltransferases. According to the annotated genome of Heimdallarchaeota LC-3, the gene detected in this search shares affinity with the trans-aconitate 2-methyltransferases rather than with the capping MTase. Using the S. cerevisiae eIF4E gene (CDC33) to search for homologues of eIF4E in H. sapiens identifies the human eIF4E with a very significant E value (Table 2). By contrast, no homologues of eIF4E with significant E values were found in the Heimdallarchaeota LC-3 genome. Furthermore, none of the genes detected in Heimdallarchaeota LC-3, even those with low degrees of homology, possessed the conserved sites known to be involved when eIF4E binds the m7G cap (Marcotrigiano et al., 1997).

These results are consistent with Heimdallarchaeota LC-3 possessing a bona fide archaeal transcription and translation apparatus. The culturing of a member of the Asgard archaea has confirmed that Asgard archaea are bona fide prokaryotes that lack both nuclei and mitochondria (Imachi et al., 2020). The absence of any sign of a mitochondrion or nucleus in Asgard archaea, and the appearance of both in LECA, must therefore be explained to understand the origin of the eukaryotic domain.
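RPO21 was chosen because of the CTD heptapeptide repeat (consensus YSPTSPS) that recruits the capping apparatus. The Python sketch below is one illustrative way to detect such a repeat in a candidate largest-subunit sequence. It reports the longest in-frame tandem run of heptads that match the consensus with at most a chosen number of mismatches. The function names, the one-mismatch tolerance and the toy sequences are assumptions made for illustration. The analysis reported here relied on BLAST searches, not on this scan.

```python
"""Scan a protein sequence for tandem CTD-like heptapeptide repeats (sketch)."""

CTD_CONSENSUS = "YSPTSPS"  # consensus heptad quoted in the text


def mismatches(heptad: str, consensus: str = CTD_CONSENSUS) -> int:
    """Number of positions at which a heptad differs from the consensus."""
    return sum(a != b for a, b in zip(heptad, consensus))


def longest_ctd_run(seq: str, max_mismatches: int = 1) -> int:
    """Length, in heptads, of the longest in-frame run of near-consensus repeats.

    Every start position is tried. From each start, consecutive 7-residue
    windows are accepted while they differ from the consensus at no more
    than `max_mismatches` positions.
    """
    k = len(CTD_CONSENSUS)
    best = 0
    for start in range(len(seq) - k + 1):
        run, pos = 0, start
        while pos + k <= len(seq) and mismatches(seq[pos:pos + k]) <= max_mismatches:
            run += 1
            pos += k
        best = max(best, run)
    return best


if __name__ == "__main__":
    # Hypothetical toy sequences: a CTD-like tail containing one degenerate
    # heptad (YSPTSPN), and a stretch with no CTD-like repeats.
    ctd_like = "MKT" + "YSPTSPS" * 4 + "YSPTSPN" + "YSPTSPS" + "GGG"
    no_ctd = "MSEQLKRAVEELAGKLIDAG"
    print(longest_ctd_run(ctd_like))  # 6
    print(longest_ctd_run(no_ctd))    # 0
```

Natural CTDs often become degenerate towards their ends. For that reason the mismatch tolerance is exposed as a parameter rather than fixed, and a strict setting (`max_mismatches=0`) counts only exact consensus heptads.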


Table 1
Homologues of S. cerevisiae GTase, MTase and eIF4E in domain Archaea. For each archaeal hit, the re-BLAST columns give the top-scoring hit(s) obtained when that archaeal sequence was itself used as a BLAST query.

Gene  | S. cerevisiae query | Top archaeal hit | E value | Source     | Accession | Re-BLAST top hit   | E value | Accession    | Source
GTase | CEG1                | Euryarchaeota    | 1e-13   | Metagenome | MBP04687  | Bacterium          | 0.0     | MAB61008     | Metagenome
      |                     |                  |         |            |           | Ostreococcus virus | 0.0     | YP_007674738 | NCLDV virus
MTase | ABD1                | Euryarchaeota    | 3e-32   | Metagenome | MBI19999  | Ostreococcus virus | 8e-155  | YP_009172576 | NCLDV virus
      |                     | Euryarchaeota    | 3e-10   | Metagenome | MBJ62846  | Ostreococcus virus | 3e-94   | QIZ231141    | NCLDV virus
      |                     | Euryarchaeota    | 4e-08   | Metagenome | RZD41681  | Phage              | 0.0     | ANS04335     | Metagenome
      |                     |                  |         |            |           | Bacterium          | 0.0     | MBN20740     | Metagenome
      |                     |                  |         |            |           | Phaeocystis virus  | 0.0     | YP_008052553 | NCLDV virus
      |                     | Methanoregula    | 1e-06   | Metagenome | OPX64967  | Bacterium          | 6e-90   | OPY16766     | Metagenome
eIF4E | CDC33               | Euryarchaeota    | 3e-32   | Metagenome | MBI19999  | Ostreococcus tauri | 2e-107  | XP_022840751 | Eukaryote
      |                     | Archaeon         | 2e-12   | Metagenome | RYG63438  | Phytophthora       | 9e-62   | XP_08898749  | Eukaryote
      |                     | Archaeon         | 6e-06   | Metagenome | RYG64788  | Ectocarpus         | 4e-29   | CBN73980     | Eukaryote
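The screening behind Table 1 can be restated as two filters. First, keep archaeal hits with E < 1 × 10⁻⁵. Second, treat a hit as a probable metagenomic artefact when all of its re-BLAST top hits come from eukaryotes, NCLDV viruses or unrelated metagenomes. The Python sketch below transcribes the rows of Table 1 and applies both filters. The data structure and the artefact rule are an assumed formalisation of the reasoning in Section 2.1, not code used in the original analysis.

```python
"""Re-apply the Table 1 screening criteria to the transcribed hits (sketch)."""

from typing import NamedTuple


class ArchaealHit(NamedTuple):
    gene: str                   # yeast query: GTase (CEG1), MTase (ABD1) or eIF4E (CDC33)
    accession: str              # accession of the archaeal (metagenomic) hit
    e_value: float              # E value of the archaeal hit
    reblast_sources: frozenset  # source categories of its re-BLAST top hits


# Rows transcribed from Table 1.
TABLE_1 = [
    ArchaealHit("GTase", "MBP04687", 1e-13, frozenset({"Metagenome", "NCLDV virus"})),
    ArchaealHit("MTase", "MBI19999", 3e-32, frozenset({"NCLDV virus"})),
    ArchaealHit("MTase", "MBJ62846", 3e-10, frozenset({"NCLDV virus"})),
    ArchaealHit("MTase", "RZD41681", 4e-08, frozenset({"Metagenome", "NCLDV virus"})),
    ArchaealHit("MTase", "OPX64967", 1e-06, frozenset({"Metagenome"})),
    ArchaealHit("eIF4E", "MBI19999", 3e-32, frozenset({"Eukaryote"})),
    ArchaealHit("eIF4E", "RYG63438", 2e-12, frozenset({"Eukaryote"})),
    ArchaealHit("eIF4E", "RYG64788", 6e-06, frozenset({"Eukaryote"})),
]

SIGNIFICANCE = 1e-5  # E-value cut-off quoted in the text
# Assumed rule: re-BLAST top hits confined to these categories suggest contamination.
NON_ARCHAEAL_SOURCES = frozenset({"Eukaryote", "NCLDV virus", "Metagenome"})


def is_significant(hit: ArchaealHit) -> bool:
    return hit.e_value < SIGNIFICANCE


def is_likely_artefact(hit: ArchaealHit) -> bool:
    # Subset test: every re-BLAST source is eukaryotic, NCLDV or metagenomic.
    return hit.reblast_sources <= NON_ARCHAEAL_SOURCES


def credible_archaeal_hits(rows):
    """Hits that are significant and not explained by a non-archaeal origin."""
    return [h for h in rows if is_significant(h) and not is_likely_artefact(h)]


if __name__ == "__main__":
    significant = [h for h in TABLE_1 if is_significant(h)]
    print(len(significant), "significant hits")                       # 8 significant hits
    print("credible archaeal hits:", credible_archaeal_hits(TABLE_1))  # []
```

Under this rule every row passes the significance filter, but none survives the artefact filter. This matches the conclusion that Table 1 provides no evidence for a capping and cap-recognition module in the archaeal domain.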

2.2. Mimiviral and eukaryotic RNAP-II, GTase, MTase and eIF4E form two discrete monophyletic groups

Unlike any known prokaryote, both phiKZ-like bacteriophages and members of the Mimiviridae separate transcription from translation by constructing viral factories in the cytoplasm of their hosts (Fig. 3). Members of the Mimiviridae additionally possess a functional mRNA capping pathway inherited from the common ancient ancestor of the NCLDV group of viruses (Iyer et al., 2001), a group that predates LECA (Guglielmini et al., 2019). This pathway is homologous to the pathway used by the eukaryotes to prime the uncoupling of transcription from translation. It includes a single RNAP dedicated to viral mRNA synthesis and a gene encoding the TPase, GTase and MTase activities required to add an m7G cap to the mRNA. In addition to the apparatus required to produce m7G capped mRNA, all the members of the Mimiviridae examined also encoded eIF4E, the cap binding protein required to initiate translation of the capped mRNA in the cytoplasm.

In this phylogenetic analysis, members of the eukaryotic domain were carefully selected to cover the major eukaryotic supergroups and thus span the diversity of the eukaryotic domain (see Materials and Methods). Eukaryotic clades were chosen that contained at least one member that has been extensively studied at a molecular level and for which experimental knowledge of the processes of transcription and translation exists (see Materials and Methods). In addition, all phylogenetic analyses use the same organisms. Only species whose genomes allowed all four genes (RNAP-II, GTase, MTase and eIF4E) to be unambiguously identified were used in tree construction.

Although all the phylogenetic trees have been drawn with a root between the viral and eukaryotic versions of the genes, establishing the root of the MTase, GTase and eIF4E phylogenetic trees is problematic, since the capping apparatus is unique to the eukaryotic domain and the NCLDV viruses. However, as shown in Fig. 4a, the unrooted phylogenetic tree of the RNAP largest subunit resolves into two discrete monophyletic clades: the eukaryotes, which descend from LECA, and the Mimiviridae, which descend from the common ancestor of the Mimiviridae. Despite the more limited phylogenetic information contained in the GTase, MTase and eIF4E alignments, patterns similar to the RNAP tree are observed. The monophyly of the eukaryotes and of the Mimiviridae is maintained, suggesting that the four genes share a common phylogenetic signal. Within the eukaryotic domain, clades corresponding to Holozoa, Amoebozoa, Fungi, Viridiplantae, Alveolata and Excavata were well resolved with high support. These results are consistent with studies showing that LECA possessed a functional eukaryotic nucleus (Neumann et al., 2010). They also indicate that the four eukaryotic genes critical for uncoupling transcription from translation via the m7G cap descend from a common ancestral set of genes present in LECA. The Mimiviridae also form a well-supported monophyletic group, suggesting that all four genes were also present in the common viral ancestor of the Mimiviridae. The Mimiviridae resolved into three clades that correspond to those previously described (Claverie and Abergel, 2018).

2.3. The eukaryotic RNAP-II dedicated to capping mRNA shares a common ancestor with the Mimiviridae RNAP, and the common ancestor predates the origin of LECA

Establishing the root of the MTase, GTase and eIF4E phylogenetic trees is challenging, since only paralogues of these three genes exist outside the eukaryotes and the NCLDV viruses. Additionally, these genes are short and contain relatively little phylogenetic information. By contrast, the RNAP gene is large, phylogenetically informative, and found in all cellular domains. Independent fossil evidence demonstrates that domain Archaea existed some two billion years before the appearance of LECA (Knoll, 2015). The archaeal RNAP large subunit is therefore a suitable outgroup for polarising the relationship between the eukaryotic RNAP-II and the Mimiviral RNA polymerases. An additional advantage of the RNAP-based tree is that all eukaryotes possess multiple RNAPs (Sentenac, 1985). Since these multiple RNAPs were present in LECA, they can be used in concert with the archaeal sequences to firmly establish the root of the RNAP tree. Both logic and the phylogenetic analysis performed here show that the RNAP, GTase, MTase and eIF4E genes are part of a co-evolving module responsible for producing and translating capped mRNA. It can therefore be argued that the root of the RNAP tree can be used to deduce the phylogeny of the entire capping apparatus.

As shown in Fig. 5, using the archaeal RNAP subunit A′ and the
in each case. Concatenating the four genes (Fig. 4e) generates a phylo­ homologous region of the eukaryotic RNAP-III to polarise the relation­
genetic tree with bootstrap values higher than any of the individual trees ship between RNAP-II and the Mimiviridae RNAP shows that both the

Table 2
Homologues of S. cerevisiae RNAP-II, GTase, MTase and eIF4E in Homo sapiens and Heimdallarchaeota LC-3 identified using BLAST.
Saccharomyces gene (accession) | Annotation in Homo sapiens | H. sapiens accession | E value | Heimdallarchaeota LC-3 accession | E value | Annotation in Heimdallarchaeota LC-3
RPO21 (NP_010141) | RNA polymerase I large subunit | NP_056240 | 1.00E-97 | | |
RPO21 (NP_010141) | RNA polymerase II large subunit | NP_000928 | 0 | OLS19521.1 | 0 | RNA polymerase subunit A′
RPO21 (NP_010141) | RNA polymerase III large subunit | NP_008986 | 0 | | |
CEG1 (NP_011385) | Guanylyltransferase | AAH19954 | 2.00E-17 | OLS26805.1 | 2.7 | DNA ligase
ABD1 (NP_009795) | Methyltransferase | NP_003790 | 2.00E-38 | OLS26405.1 | 4.00E-05 | Trans-aconitate 2-methyltransferase
CDC33 (NP_014502) | eIF4E | NP_001959 | 3.00E-36 | OLS27732.1 | 0.37 | 5-exo-hydroxycamphor dehydrogenase
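Table 2 rests on BLAST E-values, where smaller values indicate a more significant match. The sketch below illustrates how such best hits could be screened against a cut-off. It transcribes the Table 2 E-values into a dictionary. The cut-off `E_VALUE_CUTOFF = 1e-10` and the helper names `passes` and `screen` are hypothetical and are not taken from the study, which does not state a threshold.

```python
"""Screen the Table 2 best hits against an E-value cut-off (illustrative sketch)."""
from typing import Dict, Tuple

# Hypothetical cut-off; the study does not state a threshold.
E_VALUE_CUTOFF = 1e-10

# Query gene -> (best H. sapiens E-value, best Heimdallarchaeota LC-3 E-value),
# transcribed from Table 2 (for RPO21 the RNA polymerase II hit is used).
TABLE2_HITS: Dict[str, Tuple[float, float]] = {
    "RPO21": (0.0, 0.0),
    "CEG1": (2.00e-17, 2.7),
    "ABD1": (2.00e-38, 4.00e-05),
    "CDC33": (3.00e-36, 0.37),
}


def passes(e_value: float, cutoff: float = E_VALUE_CUTOFF) -> bool:
    """True if the hit is more significant than (i.e. below) the cut-off."""
    return e_value < cutoff


def screen(hits: Dict[str, Tuple[float, float]],
           cutoff: float = E_VALUE_CUTOFF) -> Dict[str, Tuple[bool, bool]]:
    """Return, per query, whether the human and the archaeal hit pass the cut-off."""
    return {gene: (passes(h, cutoff), passes(a, cutoff)) for gene, (h, a) in hits.items()}


if __name__ == "__main__":
    for gene, (human, archaeon) in screen(TABLE2_HITS).items():
        print(f"{gene}: H. sapiens={human}, Heimdallarchaeota LC-3={archaeon}")
```

At this assumed cut-off, only RPO21 passes in both organisms. Under a lenient cut-off such as 1e-3, the ABD1 hit in Heimdallarchaeota LC-3 (4.00E-05) would also pass. For that reason, the annotation of a hit (here a trans-aconitate 2-methyltransferase, not a cap MTase) matters as much as its E-value.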


Fig. 3. 201 Φ2-1 viral factories, Mimivirus viral factories, and the eukaryotic nucleus share the ability to uncouple transcription from translation. i a) Image of the phage 201 Φ2-1 viral factory (Chaikeeratisak et al., 2017b). ii a) Image of the Mimivirus viral factory (Zauberman et al., 2008). iii a) A eukaryotic nucleus. i b) Phage 201 Φ2-1 establishes a viral factory in the cytoplasm of the bacterial host, confining DNA replication and transcription to the viral factory. Translation is confined to the cytoplasm since host bacterial ribosomes are excluded from the viral factory (Chaikeeratisak et al., 2017b). PhiKZ relatives of 201 Φ2-1 can complete infection in the absence of bacterial RNA polymerase (RNAP) activity (Ceyssens et al., 2014). It can therefore be inferred that the multi-subunit RNAP genes encoded by the virus are transcribed in the viral factory, that the transcripts are exported into the cytoplasm for translation, and that the proteins are re-imported into the viral factory to transcribe the viral DNA. ii b) The Mimivirus establishes a viral factory in the cytoplasm of its eukaryotic host (Mutsafi et al., 2010), confining DNA replication and transcription to the viral factory. Translation is confined to the cytoplasm since host ribosomes are excluded from the viral factory (Fridmann-Sirkis et al., 2016). Mimiviruses encode a multi-subunit RNA polymerase that transcribes Mimiviral DNA and functions within the viral factory (Fridmann-Sirkis et al., 2016). It can therefore be inferred that the Mimivirus viral factory controls which macromolecules are transported into and out of the viroplasm. Like the eukaryotic nucleus, the Mimiviridae encode their own mRNA capping apparatus and a version of the eIF4E gene. In cells infected by the Mimiviridae, eIF4E remains located in the host cytoplasm (Fridmann-Sirkis et al., 2016). iii b) The eukaryotic nucleus is a specialised compartment located in the cytoplasm that confines DNA replication and transcription within its boundaries. Translation is confined to the cytoplasm since functional ribosomes are excluded from the nucleus. The mRNAs encoding RNAP-II subunits are transcribed within the nucleus and exported into the cytoplasm for translation into protein. The protein is then imported into the nucleus to transcribe nuclear DNA. In contrast to viral factories, the mechanisms by which the nucleus sorts the macromolecules that can enter and exit are well understood and are known to be controlled by the nuclear pore complexes (NPCs). Eukaryotic nuclei encode their own capping apparatus and the eIF4E gene, whose product binds to the m7G cap to initiate translation in the cytoplasm.

eukaryotic and Mimiviral genes descend from a common ancestral gene that predated the origin of LECA. The high bootstrap values give confidence that the alignment contains significant phylogenetic information. In addition, the subtrees of RNAP-II and RNAP-III both recapitulate the expected phylogenetic relationships between the eukaryotes, including support for the Excavata being the most divergent eukaryotic supergroup (Hampl et al., 2009). Furthermore, within the eukaryotic domain, all the chosen eukaryotes were assigned to their accepted branches. A parsimonious explanation of the observed tree is that the ability to produce m7G capped mRNA was a feature of the common ancestor of the eukaryotic RNAP-II and the Mimiviridae RNA polymerase. Both the eukaryotic and the viral genes produce capped mRNA, whereas neither RNAP-III nor the archaeal RNAP is associated with producing capped mRNA. Other interpretations may be possible. Nevertheless, the tree is entirely consistent with descent of the eukaryotic nucleus and the Mimiviridae from a giant virus that could build a viral factory in the cytoplasm of its archaeal host and export m7G capped mRNA into the cytoplasm to direct host ribosomes to selectively translate viral mRNA.

3. Discussion

Here it is shown that the apparatus used by eukaryotes to produce and initiate translation of m7G capped mRNA is not found in archaeal relatives of eukaryotes. This is significant because, in the eukaryotic nucleus, the uncoupling of transcription from translation requires a complex, highly evolved pathway consisting of hundreds of genes acting in concert (Fig. 2). The m7G cap is integral to this pathway, since it is used to prime mRNA for processing, nuclear export, and cytoplasmic translation (Fig. 2). The absence of nuclei and the m7G apparatus in archaea implies that the highly complex pathway for uncoupling transcription from translation is also absent from archaeal relatives of the eukaryotes. Under existing evolutionary paradigms this presents a major biological paradox. A complex pathway incorporating the concerted action of hundreds of genes unique to the eukaryotic domain implies a long evolutionary history, yet no sign of the pathway is found in what are proposed to be the closest cellular relatives of the eukaryotes.

Prior to the discovery of the nucleus-like viral factory of 201 Φ2-1, the ability to uncouple transcription from translation was thought to be an exclusive innovation of the eukaryotes. It could therefore be argued that the viral factory of the Mimiviruses had evolved by borrowing genes from the nucleus, allowing it to establish the eukaryote-like uncoupling of transcription from translation. However, jumbophage 201 Φ2-1 infects bacteria. It therefore seems very unlikely that it obtained from the eukaryotes its ability to build a viral factory and uncouple transcription from translation. Rather, this indicates that the ability evolved in prokaryotic viruses as part of their replication cycle. Studies on a relative of 201 Φ2-1 (PhiKZ) show that the viral factories shield viral DNA from host immune systems, including the CRISPR-Cas system (Mendoza et al., 2020). Thus viral factories appear to have evolved as an evolutionary novelty during the multi-billion-year struggle between viral and prokaryotic immune systems (Forterre and Prangishvili, 2009) and developed under selective pressure to provide biological protection from


Fig. 4. Unrooted phylogenetic trees of the mRNA capping pathway in selected eukaryotes and Mimiviridae. All five trees use sequences from the same set of carefully selected organisms (see Materials and Methods), and the proposed position of LECA is marked in each tree. The number of conserved amino acids in the final alignment for each gene is marked on the diagram. Trees were constructed and drawn with the Maximum Likelihood (ML) method, using default settings in MEGA7 with 1000 bootstrap replicates. NCBI accession numbers are given for each sequence in the Materials and Methods. Mimiviridae informal grouping names are based on Claverie and Abergel (2018). a) RNAP largest subunit gene tree. b) GTase gene tree. c) MTase gene tree. d) eIF4E gene tree. e) Phylogenetic tree inferred from concatenation of all four gene sequences.
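Panel e of Fig. 4 is inferred from a concatenation of the four gene alignments. Beyond the use of MEGA7, the paper does not describe how the concatenation was assembled. The sketch below shows one common approach, a "supermatrix" joined taxon by taxon. It assumes the alignments are held as Python dictionaries mapping taxon names to aligned sequences, which is a hypothetical representation. Like the study, the sketch requires every gene alignment to contain exactly the same taxa.

```python
"""Build a concatenated (supermatrix) alignment from per-gene alignments."""
from typing import Dict, List, Tuple

Alignment = Dict[str, str]  # taxon -> aligned protein sequence (hypothetical layout)


def concatenate(alignments: List[Alignment]) -> Tuple[Alignment, List[Tuple[int, int]]]:
    """Join gene alignments taxon by taxon.

    Every alignment must contain the same taxa, mirroring the use of species in
    which all four genes were identified. Returns the supermatrix and the
    (start, end) column range of each gene, with end exclusive.
    """
    if not alignments:
        raise ValueError("no alignments supplied")
    taxa = sorted(alignments[0])
    parts: Dict[str, List[str]] = {t: [] for t in taxa}
    partitions: List[Tuple[int, int]] = []
    start = 0
    for aln in alignments:
        if sorted(aln) != taxa:
            raise ValueError("all gene alignments must contain the same taxa")
        widths = {len(seq) for seq in aln.values()}
        if len(widths) != 1:
            raise ValueError("sequences in one gene alignment differ in length")
        width = widths.pop()
        for t in taxa:
            parts[t].append(aln[t])
        partitions.append((start, start + width))
        start += width
    return {t: "".join(p) for t, p in parts.items()}, partitions


if __name__ == "__main__":
    # Toy sequences (hypothetical), in the gene order RNAP, GTase, MTase, eIF4E.
    rnap = {"H_sapiens": "MKVL", "Mimivirus": "MRV-"}
    gtase = {"H_sapiens": "GD", "Mimivirus": "GE"}
    mtase = {"H_sapiens": "A-C", "Mimivirus": "AYC"}
    eif4e = {"H_sapiens": "W", "Mimivirus": "W"}
    matrix, blocks = concatenate([rnap, gtase, mtase, eif4e])
    print(matrix)  # {'H_sapiens': 'MKVLGDA-CW', 'Mimivirus': 'MRV-GEAYCW'}
    print(blocks)  # [(0, 4), (4, 6), (6, 9), (9, 10)]
```

The returned column ranges are optional. They simply record where each gene sits in the supermatrix, which would allow per-gene substitution models to be applied if desired; the study itself reports using default settings.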


Fig. 5. Maximum Likelihood tree of RNA polymerases using archaeal RNAP subunit A′ as an out-group. The RNAP A′ subunit of archaea was used as an out-group to establish the root of the largest subunit of the Mimiviral RNAP and of the eukaryotic RNAP-II and RNAP-III genes. RNAP-II and RNAP-III are found to belong to two separate monophyletic groups. Both the RNAP-II and RNAP-III trees are robust. They assign eukaryotes to their correct phylogenetic branches, recapitulate the expected phylogenetic relationships between the eukaryotes, and indicate an early divergence of the Excavata (Hampl et al., 2009). The Mimiviridae tree is consistent with previous phylogenetic analyses of the Mimiviridae (Claverie and Abergel, 2018). This tree shows that the Mimiviridae RNAP and eukaryotic RNAP-II genes share a common ancestor. This ancestor existed before LECA, which is consistent with the proposal that both descend from FENA, a proposed viral ancestor of both the Mimiviridae and the eukaryotic nucleus that infected an archaeal ancestor of the eukaryotes. Both the viral RNAP and eukaryotic RNAP-II synthesise m7G capped mRNA. It can therefore be inferred that their common RNA polymerase ancestor also produced capped mRNA and existed before the origin of LECA. This tree was produced from an alignment of 64 sequences and 598 positions using the Maximum Likelihood method and the JTT substitution model. Bootstrap values are indicated on each branch and are based on 1000 replicates. The tree and the computations were produced using MEGA7. NCBI accession numbers are given for each sequence in the Materials and Methods.
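The bootstrap values in Figs. 4 and 5 come from 1000 replicates computed in MEGA7. As an illustrative sketch only, and not MEGA7's implementation, the standard non-parametric phylogenetic bootstrap can be written in two steps. First, resample alignment columns with replacement to build pseudo-replicate alignments of the original length. Second, after a tree has been inferred for each replicate, report the percentage of replicate trees that contain a given clade. Tree inference itself is outside the sketch. The replicate trees are therefore represented by hypothetical sets of clades, and the function names are illustrative.

```python
"""Non-parametric bootstrap over alignment columns (illustrative sketch)."""
import random
from typing import Dict, FrozenSet, List, Sequence, Set

Alignment = Dict[str, str]  # taxon -> aligned sequence
Clade = FrozenSet[str]      # a clade as the set of taxon names it contains


def bootstrap_replicate(alignment: Alignment, rng: random.Random) -> Alignment:
    """Draw alignment columns with replacement; the replicate keeps the same length."""
    length = len(next(iter(alignment.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    # The same column indices are used for every taxon, so columns stay intact.
    return {taxon: "".join(seq[i] for i in columns) for taxon, seq in alignment.items()}


def clade_support(clade: Clade, replicate_clades: Sequence[Set[Clade]]) -> float:
    """Percentage of replicate trees (each given as a set of clades) containing `clade`."""
    if not replicate_clades:
        raise ValueError("no replicate trees supplied")
    hits = sum(1 for clades in replicate_clades if clade in clades)
    return 100.0 * hits / len(replicate_clades)


if __name__ == "__main__":
    rng = random.Random(42)
    toy = {"A": "MKVLG", "B": "MKVIG", "C": "MRAIG", "D": "MRALS"}  # hypothetical
    replicates: List[Alignment] = [bootstrap_replicate(toy, rng) for _ in range(1000)]
    print(replicates[0])
    # Each replicate would next go to ML tree inference; these clade sets stand in
    # for four inferred trees.
    trees = [{frozenset({"A", "B"})}, {frozenset({"A", "C"})},
             {frozenset({"A", "B"})}, set()]
    print(clade_support(frozenset({"A", "B"}), trees))  # 50.0
```

Resampling whole columns, rather than individual residues, is the key design choice. It keeps each site's pattern across taxa intact, so that support values reflect how consistently the sites in the alignment favour a clade.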

various anti-virus systems possessed by prokaryotic hosts (Hendrickson and Poole, 2018). The discovery of the viral factories of viruses such as 201 Φ2-1 demonstrates that uncoupling of transcription from translation is a feature of giant viruses that infect prokaryotes. Prokaryotes existed billions of years before the origin of the eukaryotes. The ability to uncouple transcription from translation therefore potentially has a very long evolutionary history and may have evolved billions of years before the origin of LECA. These discoveries have led others to support the hypothesis that the nucleus is derived from a viral factory (e.g. Forterre and Raoult, 2017).

This study shows that the apparatus for producing capped mRNA appears to be absent from the archaeal relatives of the eukaryotes but is present in the Mimiviridae, which is consistent with the postulates of the VE hypothesis. Phylogenetic analysis of the capping apparatus demonstrates that viral and eukaryotic genes form discrete monophyletic clades. It also shows that both the viral and eukaryotic clades obtained these genes from a common ancestor that existed prior to the appearance of LECA. This is consistent with other studies showing that the common ancestor of the NCLDV viruses was an ancient virus that possessed an mRNA capping apparatus (Iyer et al., 2001; Yutin and Koonin, 2012) and predated the origin of LECA (Guglielmini et al., 2019). This pattern is also consistent with the proposal presented here that the eukaryotic nucleus and the Mimiviridae both descend from FENA, a giant virus that infected the archaeal ancestor of the eukaryotes.

Currently, relatively little is known about archaeal viruses in general, and giant viruses infecting the Asgard archaea are yet to be described. The Lokiarchaeota were discovered very recently (Spang et al., 2015), and the first member of the Asgard archaea was cultured even more recently (Imachi et al., 2020). It is therefore not surprising that no viruses (giant or otherwise) that infect Asgard archaea have been described or studied in detail. Perplexingly, examination of the DNA samples from which the Lokiarchaeota genome was assembled (Spang et al., 2015) revealed the presence of divergent members of the NCLDV group, yet no viruses infecting the archaea present in the sediments were described (Bäckström et al., 2019). This was particularly surprising for two reasons. First, an abundance of prokaryotic hosts, including members of the Asgard archaea, was present. Second, the Loki's Castle sediment samples contained virtually no 18S rRNA gene sequences of eukaryotic origin, implying an absence of eukaryotic hosts for the NCLDV viruses (Bäckström et al., 2019). The authors did not entertain the possibility that the NCLDV viruses identified in the metagenomic analysis infected the archaea present in the sediments. Yet a core proposition of the VE hypothesis is that both the eukaryotic nucleus and the modern NCLDV viruses descend from an NCLDV-like ancestor that infected the archaea, built viral factories, and used mRNA capping to direct translation of viral mRNA in their archaeal hosts.

In the absence of their own translational machinery, all mRNAs produced by viruses must engage cellular ribosomes to ensure translation (Jan et al., 2016). The modern nucleus is thus clearly differentiated from any member of the Mimiviridae by its complete autonomy and its ability to construct a fully functional translational system, including ribosomes. It is therefore noteworthy that some giant bacterial viruses have recently been shown to have acquired significant parts of their host's translational apparatus. This allows the viruses to intercept the host translational systems and redirect them towards selectively translating viral mRNA (Al-Shayeb et al., 2020). For example, the BJP phage 29_15 virus (accession number ERS4026237) is predicted to infect members of the Bacteroides. It is 634,780 bp in length and encodes 63 tRNAs, 15 tRNA synthetases, and bacterial translation factors such as EF-G, RF-1, RP-S1 and Sigma 70 (Al-Shayeb et al., 2020). Like these giant bacterial viruses, members of the NCLDV group such as the original Mimivirus possess numerous genes encoding central protein-translation components (Raoult et al., 2004). More recently discovered members such as the Tupanvirus possess a translation-associated gene set that 'only lacks the ribosome'. This set includes up to 70 tRNAs, 20 tRNA synthetases, 11 factors for all translation steps, and factors related to tRNA/mRNA maturation and ribosome protein modification (Abrahão et al., 2018). Since it appears that the ancestor of the Mimiviridae did not possess all these functions, the appearance of so many translation-related genes in


viruses such as the Klosneuvirus and the Tupanvirus suggests that they acquired these components of the eukaryotic translational machinery via a piecemeal capture process (Schulz et al., 2017). In addition to acquiring genes of the translational apparatus, many NCLDV genomes encode genes involved in central carbon metabolism. These include most of the enzymes for glycolysis, gluconeogenesis, the TCA cycle, and the glyoxylate shunt (Moniruzzaman et al., 2020). This indicates that, within the NCLDV group, selective pressure to increase autonomy can extend to obtaining genes that fundamentally reprogram host metabolism by manipulating intracellular carbon fluxes (Moniruzzaman et al., 2020). Together, these results indicate that NCLDV viruses can acquire a significant number of both translational and metabolic genes as they progress along a trajectory of increasing autonomy. It should also be noted that phylogenetic analysis of the Last Eukaryotic Mitochondrial Ancestor indicates that it possessed only 69 structural genes (Roger et al., 2017). Thus, the bacterial ancestor of the mitochondrion had been transferring genes to the nucleus for a substantial evolutionary period before the origin of LECA. The bacterial genome of the mitochondrial ancestor could be reduced ~30-fold prior to the origin of LECA. Over the same evolutionary period, it therefore seems plausible that an already giant virus, with a propensity to acquire translational and metabolic genes for increased autonomy, could evolve into the completely autonomous eukaryotic nucleus, possessing both a complete translational system and a basic set of metabolic pathways.

Suppose the VE hypothesis is valid and a process for capturing the translational apparatus was operating in the viral ancestor of the nucleus. In that case, the translational apparatus acquired by the virus could only have been captured from cells that existed before LECA evolved. Since the ancestor of the nucleus is proposed to have infected an archaeal host, the acquired translation-related genes would be derived from the archaeal domain and would thus be directed towards assuming control of the host's archaeal translational system. Consistent with this proposal, there is a close relationship between archaeal and eukaryotic ribosomes (Woese et al., 1990). Eukaryotic nuclei are also known to possess a core set of archaeal-related translation initiation factors, including eIF1A, eIF2, eIF2B, eIF4A, eIF5B and eIF6 (Jagus et al., 2012). Alongside these archaeal-related factors there is a core set of eukaryote-specific initiation factors (eIF5, eIF4E, eIF4G, eIF4B, eIF4H and eIF3) (Jagus et al., 2012). With the exception of eIF5, all the eukaryote-specific initiation factors are involved in the 5′-cap-binding and scanning processes required for initiating translation of capped eukaryotic mRNA (Jagus et al., 2012). This supports the proposal that the m7G cap evolved as a strategy for directing host archaeal ribosomes to translate viral transcripts. On this view, the cap was superimposed on the essentially archaeal translational system acquired by the viral ancestor of the nucleus.

The descent of the nucleus from a viral factory provides a plausible resolution to several of the major paradoxes associated with the origin of the nucleus. Suppose the nucleus descends from a viral factory of a giant archaeal virus, and that this viral factory was similar in structure and function to the 201 Φ2-1 and Mimiviridae viral factories (Fig. 3). The VE hypothesis then explains:
- why the nucleus is mainly an information-containing and information-processing compartment;
- why it selectively controls the entry and exit of proteins and nucleic acids;
- why it exports mRNA into the cytoplasm;
- why it contains no functional ribosomes;
- why it possesses linear rather than circular chromosomes;
- why it is positioned in the cell by the tubulin cytoskeleton;
- and, as explored in this paper, why the eukaryotes possess highly evolved, complex machinery for uncoupling transcription from translation that has no prokaryotic precedent.

It also provides a rationale for the neo-functionalisation of RNA polymerases in the eukaryotes, since the viral factory introduces its own RNA polymerase specifically dedicated to transcribing capped viral mRNA destined for translation in the cytoplasm. A viral origin of the nucleus has also been shown to provide a plausible mechanistic model for the origin of mitosis, meiosis and the sexual cycle (Bell, 2006, 2013), a problem described as the queen of evolutionary problems (Bell, 1982). Thus, the origin of the nucleus from a viral factory addresses many of the paradoxes associated with the appearance of a fully formed and functional nucleus in LECA, despite its apparent absence from the archaeal domain, including the Asgard archaea.

It should be noted that the VE hypothesis is not a pure 'endosymbiotic theory'. According to the VE hypothesis, the eukaryotic cell descends from three ancestors: an archaeal ancestor of the eukaryotic cytoplasm, a bacterial ancestor of the mitochondrion, and, as explored in this paper, a viral ancestor of the nucleus. The archaeal ancestor of the cytoplasm may have had a mutually beneficial symbiotic relationship with a bacterium, leading to the origin of the mitochondria. By contrast, the host archaeon gained no benefit from the viral infection. Rather, the archaeal host was enslaved by the virus, and its genome was ultimately destroyed.

However, like the endosymbiotic theories for the origin of eukaryotic mitochondria and chloroplasts, the VE hypothesis deals with complex, irreversible events that are difficult to prove directly (Margulis, 1975). It will instead rely on the accumulation of multiple lines of convergent evidence (McInerney et al., 2014). In the case of the mitochondria, it took nearly 100 years for the consilience of evidence to build up sufficiently for the endosymbiotic origin of the mitochondria to become (almost) universally accepted (McInerney et al., 2014). The VE hypothesis is a more radical concept than endosymbiosis. If it is similarly supported by the accumulation of multiple lines of evidence, it will introduce a major paradigm shift in our understanding of the evolution of complex life on earth. If the VE hypothesis is valid, the eukaryotic cell derives from a consortium of three organisms that became integrated to such an extent that they created an emergent 'super-organism'. The novel structural and genetic features of this emergent 'super-organism' allowed it to escape the limitations of prokaryotic evolution and evolve to the unprecedented levels of organismal complexity observed in the eukaryotic domain.

4. Materials and methods

4.1. Choice of eukaryotic organisms

4.1.1. Eukaryotes
The organisms used in this study were carefully selected to cover all the relevant groupings of eukaryotes whilst limiting the complexity of the phylogenetic analysis. Currently, 5 or 6 eukaryotic supergroups are proposed to cover the vast majority of eukaryotic diversity (Hampl et al., 2009). The present study focussed on 'model' organisms, so that for at least one member of each division there was substantial knowledge of its molecular biology. To represent the Holozoa, Homo sapiens, Mus musculus, Danio rerio and Caenorhabditis elegans were chosen, since each is a model organism and their phylogenetic relationships are well established. To represent the Amoebozoa, Dictyostelium discoideum was chosen since it is a model organism, and Dictyostelium purpureum and Acytostelium subglobosum were chosen as suitably distant relatives. To represent the Fungi, Saccharomyces cerevisiae, Kluyveromyces marxianus and Aspergillus niger were chosen, since all three are model organisms and their phylogenetic relationships are well understood. To represent the Viridiplantae, Arabidopsis lyrata was chosen as a model species and Brassica napus as a relatively close relative, while Ostreococcus tauri was chosen as a distant algal relative of the land plants. To represent the SAR group, focus was placed on the Alveolata, since members such as Plasmodium and Cryptosporidium have been studied in depth at a molecular level. To ensure the robustness of the tree and limit the effects of long-branch attraction, Plasmodium falciparum and Plasmodium vivax were chosen as close relatives. Theileria equi strain WA, Cryptosporidium muris, and Perkinsus marinus were chosen as increasingly distantly related members of the Alveolata. To represent the Excavata, members of the Trypanosoma were chosen, since they are model organisms that have been studied in depth at a molecular level. To ensure robustness of the tree and to minimise the effects of


Table 3
Summary of accession numbers used in this study.
Group Species GTase MTase eIF4E RNAP-II RNAP-III
Eukarya Fungi S. cerevisiae NP_011385 NP_009795 NP_014502 NP_010141 NP_014759
Eukarya Fungi K. marxianus XP_022676394 XP_022674569 XP_022678436 XP_022677581 XP_022678447
Eukarya Fungi A. niger XP_001400555 XP_001394253 XP_001395221 XP_001389676 XP_001393726
Eukarya Holozoa H. sapiens AAH19954 BAA82447 NP_001959 NP_000928 NP_008986
Eukarya Holozoa M. musculus NP_036014 NP_080716 NP_031943 AAB58418 NP_001074716
Eukarya Holozoa D. rerio NP_998032 NP_001038465 NP_001007778 XP_005156282 NP_001263425
Eukarya Holozoa C. elegans NP_001020979 NP_492674 NP_503124 NP_500523 NP_501127
Eukarya Viridiplantae A. lyrata XP_002873017 XP_002894293 XP_020875354 XP_020873010 XP_020884300
Eukarya Viridiplantae B. napus XP_013647283 XP_013640697 AGA20262 XP_013656472 XP_013681133
Eukarya Viridiplantae O. tauri XP_003075327 XP_003081423 XP_022840751 XP_022839775 XP_022840814
Eukarya Amoebozoa D. discoideum XP_636333 XP_642389 XP_647593 XP_641735 XP_642724
Eukarya Amoebozoa D. purpureum XP_003293052 XP_003293647 XP_003293106 XP_003285719 XP_003284018
Eukarya Amoebozoa A. subglobosum XP_012756463 XP_012752660 XP_012756585 XP_012756853 XP_012752065
Eukarya Alveolata P. falciparum 3D7 KNC37820 ETW19449 XP_001351220 XP_001351252 XP_001350009
Eukarya Alveolata P. vivax KMZ83875 SGX75114 XP_001614562 XP_001614530 XP_001614080
Eukarya Alveolata T. equi strain WA XP_004828897 XP_004828862 XP_004829399 XP_004831990 XP_004830926
Eukarya Alveolata P. marinus XP_002774114 XP_002774250 XP_002774365 XP_002767562 XP_002778409
Eukarya Alveolata C. muris RN66 XP_002140608 XP_002139632 XP_002140059 XP_002141559 XP_002142344
Eukarya Excavata T. cruzi cruzi PBJ71163 PBJ71163 PBJ73557 PBJ81421 PBJ72541
Eukarya Excavata T. rangeli RNF00410 RNF00410 RNF02202 RNF07318 RNF04215
Eukarya Excavata L. mexicana XP_003875466 XP_003875466 XP_003876737 XP_003877779 XP_003878621
Eukarya Excavata L. seymouri KPI83387 KPI83387 KPI89876 KPI84927 KPI86235
Eukarya Excavata B. saltans CUG90421 CUG90421 CUF95139 CUI14899 CUI14455
Mimiviridae Klosneuvirinae Catovirus CTV1 ARF09224 ARF09224 ARF09024 ARF09013-20
Mimiviridae Klosneuvirinae Klosneuvirus KNV1 ARF11732 ARF11732 ARF11337 ARF11340-43
Mimiviridae Klosneuvirinae Indivirus ILV1 ARF09638 ARF09638 ARF09452 ARF09455
Mimiviridae Klosneuvirinae Bodo saltans virus ATZ80933 ATZ80933 ATZ80516 ATZ80519
Mimiviridae Mimivirinae Acanthamoeba polyphaga mimivirus AEJ34618 AEJ34618 AKI79272 YP_003987013
Mimiviridae Mimivirinae Moumouvirus australiensis AVL94825 AVL94825 AVL94704 AVL94698 AVL94696
Mimiviridae Mimivirinae Powai lake megavirus ANB50623 ANB50623 ANB50499 ANB50494 ANB50492
Mimiviridae Mimivirinae Tupanvirus deep ocean AUL79325 AUL79325 AUL79602 AUL79608
Mimiviridae Mimivirinae Tupanvirus soda lake AUL78031 AUL78031 AUL78296 AUL78302
Mimiviridae Mimivirinae Acanthamoeba polyphaga moumouvirus YP_007354410 YP_007354410 YP_007354285 YP_007354277
Mimiviridae Mesomimivirinae Chrysochromulina ericina virus YP_009173557 YP_009173557 YP_009173322 YP_009173653
Mimiviridae Mesomimivirinae Phaeocystis globosa virus YP_008052553 YP_008052553 YP_008052407 YP_008052581
Mimiviridae Mesomimivirinae Tetraselmis virus 1 AUF82182 AUF82182 AUF82209 AUF82600
Mimiviridae CroV Cafeteria roenbergensis virus BV-PW1 YP_003969844 YP_003969844 YP_003969852 YP_003970001
Archaea Crenarchaeota Saccharolobus solfataricus WP_009990476
Archaea Crenarchaeota Sulfolobus acidocaldarius WP_011277574
Archaea Asgardarchaea Candidatus Odinarchaeota archaeon LCB_4 OLS17382
Archaea Euryarchaeota Pyrococcus furiosus WP_014835440
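Section 4.2.1 reports that a CTD heptapeptide repeat was readily identified in most of the RNAP-II sequences in Table 3, but it does not say how the repeats were detected. As a hypothetical illustration only, and not the author's method, a simple scan could count heptads close to the canonical CTD consensus YSPTSPS (Tyr-Ser-Pro-Thr-Ser-Pro-Ser). That consensus is general background knowledge and is not stated in the paper. The mismatch tolerance, the greedy non-overlapping scan, and the function names are all assumptions.

```python
"""Count CTD-like heptapeptide repeats in a protein sequence (illustrative sketch)."""

CONSENSUS = "YSPTSPS"  # canonical RNAP-II CTD heptad; background knowledge, not from the paper


def mismatches(window: str, consensus: str = CONSENSUS) -> int:
    """Number of positions at which `window` differs from the consensus heptad."""
    return sum(1 for a, b in zip(window, consensus) if a != b)


def count_heptads(sequence: str, max_mismatches: int = 1) -> int:
    """Greedily count non-overlapping heptads within `max_mismatches` of the consensus.

    The mismatch tolerance is a hypothetical parameter; real CTDs also contain
    degenerate repeats that a strict scan would miss.
    """
    seq = sequence.upper()
    k = len(CONSENSUS)
    count, i = 0, 0
    while i + k <= len(seq):
        if mismatches(seq[i:i + k]) <= max_mismatches:
            count += 1
            i += k  # jump past the accepted heptad
        else:
            i += 1  # slide by one residue and try again
    return count


if __name__ == "__main__":
    toy_tail = "GSAP" + "YSPTSPS" * 3 + "YSPTSPN" + "YGPSE"  # hypothetical C terminus
    print(count_heptads(toy_tail), count_heptads(toy_tail, max_mismatches=0))  # 4 3
```

Allowing one mismatch lets the scan pick up near-consensus heptads, such as the final YSPTSPN in the toy sequence. A strict scan (`max_mismatches=0`) counts only exact repeats.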

long-branch attraction, Trypanosoma cruzi cruzi and Trypanosoma rangeli were chosen as close relatives. Leishmania mexicana, Leptomonas seymouri and Bodo saltans were chosen as increasingly distantly related members of the Excavata. The organisms listed above include members of all 5 or 6 major clades. In addition, complete genomes are available for each of the organisms listed, ensuring that all the phylogenetic trees included exactly the same organisms.

4.1.2. Mimiviridae
Only members of the Mimiviridae containing clear homologues of RNAP, GTase, MTase and eIF4E were chosen for analysis. Based on the phylogenetic analysis of Claverie and Abergel (2018), the following viruses were chosen to represent three informal groupings of the Mimiviridae. Mesomimivirinae: Tetraselmis virus, Chrysochromulina ericina virus and Phaeocystis globosa virus. Klosneuvirinae: Klosneuvirus, Catovirus, Indivirus and Bodo saltans virus. Megavirinae: Acanthamoeba polyphaga mimivirus, Powai lake megavirus, Moumouvirus australiensis, Acanthamoeba polyphaga moumouvirus, Tupanvirus deep ocean and Tupanvirus soda lake. Cafeteria roenbergensis virus (CroV) is basal to the Klosneuvirinae and Megavirinae, and no other close relatives of it appear to be available yet.

4.2. Choice of sequences

4.2.1. RNAP subunits
Of the three RNA polymerases in eukaryotes, RNAP-II is the one intimately associated with the synthesis of capped mRNA. The largest RNAP-II subunit possesses a carboxy-terminal domain (CTD) consisting of a heptapeptide repeat region that is involved in mRNA processing, including capping, splicing and polyadenylation (McCracken et al., 1997). Homologues of RPO21, the largest subunit of S. cerevisiae RNAP-II, were identified in all the eukaryotic organisms examined. With the exception of the members of the Excavata and Perkinsus marinus, a CTD heptapeptide repeat region was readily identified in all RNAP-II subunits used in the phylogenetic analysis. The heptapeptide repeat is absent from the Excavata studied, but Trypanosoma RNAP-II genes are known to possess a non-canonical C-terminal extension (Smith et al., 1989). The well-characterised Trypanosoma RNAP-II was therefore used to identify RNAP-II homologues in the Excavata clade. In the Mimiviridae, only one homologue of the largest subunit of RNAP-II was detected.

4.2.2. GTase and MTase
Three enzymatic functions are universally required to produce capped mRNA (Kyrieleis et al., 2014). However, only the GTase and MTase are monophyletic in eukaryotes, with the TPase apparently originating from two independent sources (Ramanathan et al., 2016; Kyrieleis et al., 2014). In S. cerevisiae and most other unicellular eukaryotes, such as the Alveolata, all three functions are encoded by separate genes. In both Holozoa and Viridiplantae, the TPase and GTase are encoded in the same polypeptide. In Excavata, two capping complexes are present (Takagi et al., 2007). Of these, the gene encoding both the GTase and the MTase in the same polypeptide is essential for growth and for adding the m7G cap, and it was thus chosen for phylogenetic analysis (Takagi et al., 2007). In the Mimiviridae, all three functions are present in

the same polypeptide.

4.2.3. eIF4E
Although the yeast Saccharomyces cerevisiae has only one eIF4E gene, the core role of eIF4E in protein translation has meant that several paralogous eIF4E genes encoding distinctly featured proteins have evolved in higher eukaryotes. In addition to regular translation initiation, these paralogues are involved in the preferential translation of specific mRNAs or are tissue- and/or developmental-stage-specific. For example, eight such genes have been found in Drosophila and five in Caenorhabditis (Frydryskova et al., 2016). In humans, which also have multiple paralogues, the three isoforms of eIF4E1 bring mRNAs to the ribosome via an interaction with the scaffold protein eIF4G (Frydryskova et al., 2016). Human eIF4E1 was therefore used as the query for BLAST searches of Holozoa, and the hits with the highest BLAST scores were taken for phylogenetic analysis. In Arabidopsis, EIF4E1 is expressed in all tissues except the cells of the root specialization zone, whereas At.EIF4E2 mRNA is particularly abundant in floral organs and young developing tissues (Rodriguez et al., 1998). The Arabidopsis EIF4E1 gene was therefore used in BLAST searches of plants, and the genes with the highest similarity were taken for phylogenetic analysis. Where molecular knowledge was insufficient for such rational sequence selection, the homologue most similar to the Saccharomyces gene was identified. Provided that it possessed regions equivalent to the structurally important regions that bind the m7G cap (Marcotrigiano et al., 1997), this gene was then used to identify the closest homologues within the supergroup. Except for the Tupanviruses, each member of the Mimiviridae was found to encode only one eIF4E homologue. In the Tupanviruses, two eIF4E homologues were identified. For both Tupanviruses, only the homologue most similar to the Mimivirus homologue was included in the phylogenetic analysis.

4.3. Phylogenetic analysis
Homology searches were carried out using the BLASTp and PSI-BLAST algorithms (Altschul et al., 1997). MEGA7 (Kumar et al., 2016) was used for all phylogenetic analyses. The accession numbers for all genes used in this study are shown in Table 3. Unless otherwise stated, all program parameters for homology searching and domain identification were left at their defaults. Protein alignments were performed using MUSCLE. Once an alignment had been completed for all organisms, it was trimmed and used for tree construction. Evolutionary histories were inferred by the Maximum Likelihood method based on the JTT matrix-based model (Jones et al., 1992). All bootstrap consensus trees were inferred from 1000 replicates and are taken to represent the evolutionary history of the taxa analyzed (Felsenstein, 1985). Branches corresponding to partitions reproduced in fewer than 51% of bootstrap replicates were collapsed. The percentage of replicate trees in which the associated taxa clustered together in the bootstrap test (1000 replicates) is shown next to the branches (Felsenstein, 1985). Initial trees for the heuristic search were obtained automatically by applying the Neighbor-Joining and BioNJ algorithms to a matrix of pairwise distances estimated using the JTT model. The topology with the superior log-likelihood value was then selected.

Appendix A. Supplementary data

Supplementary material related to this article can be found in the online version at doi:https://doi.org/10.1016/j.virusres.2020.198168.

References

Abrahão, J., Silva, L., Silva, L.S., Khalil, J.Y.B., Rodrigues, R., Arantes, T., Assis, F., Boratto, P., Andrade, M., Kroon, E.G., Ribeiro, B., Bergier, I., Seligmann, H., Ghigo, E., Colson, P., Levasseur, A., Kroemer, G., Raoult, D., La Scola, B., 2018. Tailed giant Tupanvirus possesses the most complete translational apparatus of the known virosphere. Nat. Commun. 9, 749. https://doi.org/10.1038/s41467-018-03168-1.
Al-Shayeb, B., et al., 2020. Clades of huge phage from across Earth’s ecosystems. Nature 578 (7795), 425–431. https://doi.org/10.1038/s41586-020-2007-4.
Altschul, S.F., Madden, T.L., Schäffer, A.A., Zhang, J., Zhang, Z., Miller, W., Lipman, D.J., 1997. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. https://doi.org/10.1093/nar/25.17.3389.
Bäckström, D., Yutin, N., Jørgensen, S.L., Dharamshi, J., Homa, F., Zaremba-Niedzwiedzka, K., Spang, A., Wolf, Y.I., Koonin, E.V., Ettema, T.J.G., 2019. Virus genomes from deep sea sediments expand the ocean megavirome and support independent origins of viral gigantism. mBio 10 (2), e02497-18. https://doi.org/10.1128/mBio.02497-18.
Battistuzzi, F.U., Feijao, A., Hedges, S.B., 2004. A genomic timescale of prokaryote evolution: insights into the origin of methanogenesis, phototrophy, and the colonization of land. BMC Evol. Biol. 4, 44. https://doi.org/10.1186/1471-2148-4-44.
Bell, G., 1982. The Masterpiece of Nature: The Evolution and Genetics of Sexuality. Croom Helm, London, p. 19.
Bell, P.J., 2001. Viral eukaryogenesis: was the ancestor of the nucleus a complex DNA virus? J. Mol. Evol. 53 (3), 251–256. https://doi.org/10.1007/s002390010215.
Bell, P.J., 2006. Sex and the eukaryotic cell cycle is consistent with a viral ancestry for the eukaryotic nucleus. J. Theor. Biol. 243 (1), 54–63. https://doi.org/10.1016/j.jtbi.2006.05.015.
Bell, P.J., 2013. Meiosis: its origin according to the viral eukaryogenesis theory. In: Bernstein, C., Bernstein, M. (Eds.), Meiosis. IntechOpen, pp. 77–99.
Benelli, D., Londei, P., 2011. Translation initiation in Archaea: conserved and domain-specific features. Biochem. Soc. Trans. 39 (1), 89–93. https://doi.org/10.1042/BST0390089.
Bengtson, S., Rasmussen, B., Ivarsson, M., Muhling, J., Broman, C., Marone, F., Stampanoni, M., Bekker, A., 2017. Fungus-like mycelial fossils in 2.4-billion-year-old vesicular basalt. Nat. Ecol. Evol. 1 (6), 141. https://doi.org/10.1038/s41559-017-0141.
Ceyssens, P.J., Minakhin, L., Van den Bossche, A., Yakunina, M., Klimuk, E., Blasdel, B., De Smet, J., Noben, J.P., Bläsi, U., Severinov, K., Lavigne, R., 2014. Development of giant bacteriophage ϕKZ is independent of the host transcription apparatus. J. Virol. (18), 10501–10510. https://doi.org/10.1128/JVI.01347-14.
Chaikeeratisak, V., Nguyen, K., Egan, M.E., Erb, M.L., Vavilina, A., Pogliano, J., 2017a. The phage nucleus and tubulin spindle are conserved among large Pseudomonas phages. Cell Rep. 20 (7), 1563–1571. https://doi.org/10.1016/j.celrep.2017.07.064.
Chaikeeratisak, V., Nguyen, K., Khanna, K., Brilot, A.F., Erb, M.L., Coker, J.K., Vavilina, A., Newton, G.L., Buschauer, R., Pogliano, K., Villa, E., Agard, D.A., Pogliano, J., 2017b. Assembly of a nucleus-like structure during viral replication in bacteria. Science 355 (6321), 194–197. https://doi.org/10.1126/science.aal2130.
Chen, L.-X., Anantharaman, K., Shaiber, A., Eren, A.M., Banfield, J.F., 2020. Accurate and complete genomes from metagenomes. Genome Res. 30 (3), 315–333. https://doi.org/10.1101/gr.258640.119.
Claverie, J.M., Abergel, C., 2018. Mimiviridae: an expanding family of highly diverse large dsDNA viruses infecting a wide phylogenetic range of aquatic eukaryotes. Viruses 10 (9), 506. https://doi.org/10.3390/v10090506.
Da Cunha, V., Gaia, M., Nasir, A., Forterre, P., 2018. Asgard archaea do not close the debate about the universal tree of life topology. PLoS Genet. 14 (3), e1007215. https://doi.org/10.1371/journal.pgen.1007215.
El Albani, A., Bengtson, S., Canfield, D.E., Bekker, A., Macchiarelli, R., Mazurier, A., Hammarlund, E.U., Boulvais, P., Dupuy, J.-J., Fontaine, C., Fürsich, F.T., Gauthier-Lafaye, G., Janvier, P., Javaux, E., Ossa, F.O., Pierson-Wickmann, A.-C., Riboulleau, A., Sardini, P., Vachard, D., Whitehouse, M., Meunier, A., 2010. Large colonial organisms with coordinated growth in oxygenated environments 2.1 Gyr ago. Nature 466 (7302), 100–104. https://doi.org/10.1038/nature09166.
Eme, L., Spang, A., Lombard, J., Stairs, C.W., Ettema, T.J.G., 2017. Archaea and the origin of eukaryotes. Nat. Rev. Microbiol. 15 (12), 711–723. https://doi.org/10.1038/nrmicro.2017.133.
Felsenstein, J., 1985. Confidence limits on phylogenies: an approach using the bootstrap. Evolution 39, 783–791. https://doi.org/10.1111/j.1558-5646.1985.tb00420.x.
Forterre, P., Prangishvili, D., 2009. The great billion year war between ribosome- and capsid-encoding organisms as a major source of evolutionary novelties. Ann. N. Y. Acad. Sci. 1178, 65–77. https://doi.org/10.1111/j.1749-6632.2009.04993.x.
Forterre, P., Raoult, D., 2017. The transformation of a bacterium into a nucleated virocell reminds the viral eukaryogenesis hypothesis. Virologie 21 (4), 28–30. https://doi.org/10.1684/vir.2017.0700.
Fridmann-Sirkis, Y., Milrot, E., Mutsafi, Y., Ben-Dor, S., Levin, Y., Savidor, A., Kartvelishvily, E., Minsky, A., 2016. Efficiency in complexity: composition and dynamic nature of Mimivirus replication factories. J. Virol. 90 (21), 10039–10047. https://doi.org/10.1128/JVI.01319-16.
Frydryskova, K., Masek, T., Borcin, K., Mrvova, S., Venturi, V., Pospisek, M., 2016. Distinct recruitment of human eIF4E isoforms to processing bodies and stress granules. BMC Mol. Biol. 17 (1), 21. https://doi.org/10.1186/s12867-016-0072-x.
Guglielmini, J., Woo, A.C., Krupovic, M., Forterre, P., Gaia, M., 2019. Diversification of giant and large eukaryotic dsDNA viruses predated the origin of modern eukaryotes. Proc. Natl. Acad. Sci. U. S. A. 116 (39), 19585–19592. https://doi.org/10.1073/pnas.1912006116.
Hampl, V., Hug, L., Leigh, J.W., Dacks, J.B., Lang, B.F., Simpson, A.G., Roger, A.J., 2009. Phylogenomic analyses support the monophyly of Excavata and resolve relationships among eukaryotic “supergroups”. Proc. Natl. Acad. Sci. U. S. A. 106 (10), 3859–3864. https://doi.org/10.1073/pnas.0807880106.
Hendrickson, H.L., Poole, A.M., 2018. Manifold routes to a nucleus. Front. Microbiol. 9, 2604. https://doi.org/10.3389/fmicb.2018.02604.
Hernández, G., Proud, C.G., Preiss, T., Parsyan, A., 2012. On the diversification of the translation apparatus across eukaryotes. Comp. Funct. Genomics 2012, 256848. https://doi.org/10.1155/2012/256848.
Imachi, H., Nobu, M.K., Nakahara, N., Morono, Y., Ogawara, M., Takaki, Y., Takano, Y., Uematsu, K., Ikuta, T., Ito, M., Matsui, Y., Miyazaki, M., Murata, K., Saito, Y., Sakai, S., Song, C., Tasumi, E., Yamanaka, Y., Yamaguchi, T., Kamagata, Y., Tamaki, H., Takai, K., 2020. Isolation of an archaeon at the prokaryote-eukaryote interface. Nature. https://doi.org/10.1038/s41586-019-1916-6.
Iranzo, J., Puigbò, P., Lobkovsky, A.E., Wolf, Y.I., Koonin, E.V., 2016. Inevitability of genetic parasites. Genome Biol. Evol. 8 (9), 2856–2869. https://doi.org/10.1093/gbe/evw193.
Iyer, L.M., Aravind, L., Koonin, E.V., 2001. Common origin of four diverse families of large eukaryotic DNA viruses. J. Virol. 75 (23), 11720–11734. https://doi.org/10.1128/JVI.75.23.11720-11734.2001.
Jagus, R., Bachvaroff, T.R., Joshi, B., Place, A.R., 2012. Diversity of eukaryotic translational initiation factor eIF4E in protists. Comp. Funct. Genomics 2012, 134839. https://doi.org/10.1155/2012/134839.
Jan, E., Mohr, I., Walsh, D., 2016. A cap-to-tail guide to mRNA translation strategies in virus-infected cells. Annu. Rev. Virol. 3 (1), 283–307. https://doi.org/10.1146/annurev-virology-100114-055014.
Jones, D.T., Taylor, W.R., Thornton, J.M., 1992. The rapid generation of mutation data matrices from protein sequences. Comput. Appl. Biosci. 8, 275–282. https://doi.org/10.1093/bioinformatics/8.3.275.
Kabachinski, G., Schwartz, T.U., 2015. The nuclear pore complex–structure and function at a glance. J. Cell Sci. 128 (3), 423–429. https://doi.org/10.1242/jcs.083246.
Katahira, J., 2015. Nuclear export of messenger RNA. Genes 6 (2), 163–184. https://doi.org/10.3390/genes6020163.
Knoll, A.H., 2015. Paleobiological perspectives on early microbial evolution. Cold Spring Harb. Perspect. Biol. 7 (7), a018093. https://doi.org/10.1101/cshperspect.a018093.
Koonin, E.V., 2015. Archaeal ancestors of eukaryotes: not so elusive any more. BMC Biol. 13, 84. https://doi.org/10.1186/s12915-015-0194-5.
Koonin, E.V., Dolja, V.V., Krupovic, M., 2015. Origins and evolution of viruses of eukaryotes: the ultimate modularity. Virology 479–480, 2–25. https://doi.org/10.1016/j.virol.2015.02.039.
Koonin, E.V., Wolf, Y.I., Katsnelson, M.I., 2017. Inevitability of the emergence and persistence of genetic parasites caused by evolutionary instability of parasite-free states. Biol. Direct 12 (1), 31. https://doi.org/10.1186/s13062-017-0202-5.
Kumar, S., Stecher, G., Tamura, K., 2016. MEGA7: molecular evolutionary genetics analysis version 7.0 for bigger datasets. Mol. Biol. Evol. 33, 1870–1874. https://doi.org/10.1093/molbev/msw054.
Kyrieleis, O.J., Chang, J., de la Peña, M., Shuman, S., Cusack, S., 2014. Crystal structure of vaccinia virus mRNA capping enzyme provides insights into the mechanism and evolution of the capping apparatus. Structure 22 (3), 452–465. https://doi.org/10.1016/j.str.2013.12.014.
Lang, B.F., Gray, M.W., Burger, G., 1999. Mitochondrial genome evolution and the origin of eukaryotes. Annu. Rev. Genet. 33, 351–397. https://doi.org/10.1146/annurev.genet.33.1.351.
Marcotrigiano, J., Gingras, A.C., Sonenberg, N., Burley, S.K., 1997. Co-crystal structure of the messenger RNA 5’ cap-binding protein (eIF4E) bound to 7-methyl-GDP. Cell 89 (6), 951–961. https://doi.org/10.1016/s0092-8674(00)80280-9.
Margulis, L., 1975. Symbiotic theory of the origin of eukaryotic organelles; criteria for proof. Symp. Soc. Exp. Biol. 29, 21–38.
Martin, W., 1999. A briefly argued case that mitochondria and plastids are descendants of endosymbionts, but that the nuclear compartment is not. Proc. Biol. Sci. 266 (1426), 1387. https://doi.org/10.1098/rspb.1999.0792.
Martin, W., 2005. Archaebacteria (Archaea) and the origin of the eukaryotic nucleus. Curr. Opin. Microbiol. (6), 630–637. https://doi.org/10.1016/j.mib.2005.10.004.
McCracken, S., Fong, N., Yankulov, K., Ballantyne, S., Pan, G., Greenblatt, J., Patterson, S.D., Wickens, M., Bentley, D.L., 1997. The C-terminal domain of RNA polymerase II couples mRNA processing to transcription. Nature 385 (6614), 357–361. https://doi.org/10.1038/385357a0.
McInerney, J.O., O’Connell, M.J., Pisani, D., 2014. The hybrid nature of the Eukaryota and a consilient view of life on Earth. Nat. Rev. Microbiol. 12 (6), 449–455. https://doi.org/10.1038/nrmicro3271.
Mendoza, S.D., Nieweglowska, E.S., Govindarajan, S., Leon, L.M., Berry, J.D., Tiwari, A., Chaikeeratisak, V., Pogliano, J., Agard, D.A., Bondy-Denomy, J., 2020. A bacteriophage nucleus-like compartment shields DNA from CRISPR nucleases. Nature 577 (7789), 244–248. https://doi.org/10.1038/s41586-019-1786-y.
Moniruzzaman, M., Martinez-Gutierrez, C.A., Weinheimer, A.R., Aylward, F.O., 2020. Dynamic genome evolution and complex virocell metabolism of globally-distributed giant viruses. Nat. Commun. 11 (1), 1710. https://doi.org/10.1038/s41467-020-15507-2.
Mutsafi, Y., Zauberman, N., Sabanay, I., Minsky, A., 2010. Vaccinia-like cytoplasmic replication of the giant Mimivirus. Proc. Natl. Acad. Sci. U. S. A. 107 (13), 5978–5982. https://doi.org/10.1073/pnas.0912737107.
Neumann, N., Lundin, D., Poole, A.M., 2010. Comparative genomic evidence for a complete nuclear pore complex in the last eukaryotic common ancestor. PLoS One 5 (10), e13241. https://doi.org/10.1371/journal.pone.0013241.
Okamura, M., Inose, H., Masuda, S., 2015. RNA export through the NPC in eukaryotes. Genes (Basel) 6 (1), 124–149. https://doi.org/10.3390/genes6010124.
Parfrey, L.W., Lahr, D.J., Knoll, A.H., Katz, L.A., 2011. Estimating the timing of early eukaryotic diversification with multigene molecular clocks. Proc. Natl. Acad. Sci. U. S. A. 108 (33), 13624–13629. https://doi.org/10.1073/pnas.1110633108.
Philippe, N., Legendre, M., Doutre, G., Couté, Y., Poirot, O., Lescot, M., Arslan, D., Seltzer, V., Bertaux, L., Bruley, C., Garin, J., Claverie, J.-M., Abergel, C., 2013. Pandoraviruses: amoeba viruses with genomes up to 2.5 Mb reaching that of parasitic eukaryotes. Science 341 (6143), 281–286. https://doi.org/10.1126/science.1239181.
Ramanathan, A., Robb, G.B., Chan, S.H., 2016. mRNA capping: biological functions and applications. Nucleic Acids Res. 44 (16), 7511–7526. https://doi.org/10.1093/nar/gkw551.
Raoult, D., Audic, S., Robert, C., Abergel, C., Renesto, P., Ogata, H., La Scola, B., Suzan, M., Claverie, J.M., 2004. The 1.2-megabase genome sequence of Mimivirus. Science 306 (5700), 1344–1350. https://doi.org/10.1126/science.1101485.
Ribeiro, S., Golding, G.B., 1998. The mosaic nature of the eukaryotic nucleus. Mol. Biol. Evol. 15 (7), 779–788. https://doi.org/10.1093/oxfordjournals.molbev.a025983.
Rivera, M.C., Lake, J.A., 1992. Evidence that eukaryotes and eocyte prokaryotes are immediate relatives. Science 257, 74–76. https://doi.org/10.1126/science.1621096.
Rodriguez, C.M., Freire, M.A., Camilleri, C., Robaglia, C., 1998. The Arabidopsis thaliana cDNAs coding for eIF4E and eIF(iso)4E are not functionally equivalent for yeast complementation and are differentially expressed during plant development. Plant J. (4), 465–473. https://doi.org/10.1046/j.1365-313x.1998.00047.x.
Roger, A.J., Muñoz-Gómez, S.A., Kamikawa, R., 2017. The origin and diversification of mitochondria. Curr. Biol. 27 (21), R1177–R1192. https://doi.org/10.1016/j.cub.2017.09.015.
Sapp, J., 2005. The prokaryote-eukaryote dichotomy: meanings and mythology. Microbiol. Mol. Biol. Rev. 69 (2), 292–305. https://doi.org/10.1128/MMBR.69.2.292-305.2005.
Schulz, F., Yutin, N., Ivanova, N.N., Ortega, D.R., Lee, T.K., Vierheilig, J., Daims, H., Horn, M., Wagner, M., Jensen, G.J., Kyrpides, N.C., Koonin, E.V., Woyke, T., 2017. Giant viruses with an expanded complement of translation system components. Science 356 (6333), 82–85. https://doi.org/10.1126/science.aal4657.
Sentenac, A., 1985. Eukaryotic RNA polymerases. Crit. Rev. Biochem. Mol. Biol. 18 (1), 31–90. https://doi.org/10.3109/10409238509082539.
Shuman, S., Schwer, B., 1995. RNA capping enzyme and DNA ligase: a superfamily of covalent nucleotidyl transferases. Mol. Microbiol. 17 (3), 405–410. https://doi.org/10.1111/j.1365-2958.1995.mmi_17030405.x.
Smith, J.L., Levin, J.R., Ingles, C.J., Agabian, N., 1989. In trypanosomes the homolog of the largest subunit of RNA polymerase II is encoded by two genes and has a highly unusual C-terminal domain structure. Cell 56 (5), 815–827. https://doi.org/10.1016/0092-8674(89)90686-7.
Spang, A., Saw, J.H., Jørgensen, S.L., Zaremba-Niedzwiedzka, K., Martijn, J., Lind, A.E., van Eijk, R., Schleper, C., Guy, L., Ettema, T.J.G., 2015. Complex archaea that bridge the gap between prokaryotes and eukaryotes. Nature 521, 173–179. https://doi.org/10.1038/nature14447.
Spang, A., Eme, L., Saw, J.H., Caceres, E.F., Zaremba-Niedzwiedzka, K., Lombard, J., Guy, L., Ettema, T.J.G., 2018. Asgard archaea are the closest prokaryotic relatives of eukaryotes. PLoS Genet. 14 (3), e1007080. https://doi.org/10.1371/journal.pgen.1007080.
Stanier, R.Y., van Niel, C.B., 1962. The concept of a bacterium. Arch. Microbiol. 42, 17–35.
Starr, D.A., 2009. A nuclear-envelope bridge positions nuclei and moves chromosomes. J. Cell Sci. 122 (Pt 5), 577–586. https://doi.org/10.1242/jcs.037622.
Takagi, Y., Sindkar, S., Ekonomidis, D., Hall, M.P., Ho, C.K., 2007. Trypanosoma brucei encodes a bifunctional capping enzyme essential for cap 4 formation on the spliced leader RNA. J. Biol. Chem. 282 (22), 15995–16005. https://doi.org/10.1074/jbc.M701569200.
Takemura, M., 2001. Poxviruses and the origin of the eukaryotic nucleus. J. Mol. Evol. 52 (5), 419–425. https://doi.org/10.1007/s002390010171.
Werner, F., 2007. Structure and function of archaeal RNA polymerases. Mol. Microbiol. 65 (6), 1395–1404. https://doi.org/10.1111/j.1365-2958.2007.05876.x.
Woese, C.R., Kandler, O., Wheelis, M.L., 1990. Towards a natural system of organisms: proposal for the domains Archaea, Bacteria, and Eucarya. Proc. Natl. Acad. Sci. U. S. A. 87 (12), 4576–4579. https://doi.org/10.1073/pnas.87.12.4576.
Woo, C., Gaia, M., Guglielmini, J., Da Cunha, V., Forterre, P., 2019. Evolution of the PRD1-adenovirus lineage: a viral tree of life incongruent with the cellular universal tree of life. bioRxiv. https://doi.org/10.1101/741942.
Yutin, N., Koonin, E.V., 2012. Hidden evolutionary complexity of Nucleo-Cytoplasmic Large DNA viruses of eukaryotes. Virol. J. 9, 161. https://doi.org/10.1186/1743-422X-9-161.
Zauberman, N., Mutsafi, Y., Halevy, D.B., Shimoni, E., Klein, E., Xiao, C., Sun, S., Minsky, A., 2008. Distinct DNA exit and packaging portals in the virus Acanthamoeba polyphaga mimivirus. PLoS Biol. 6 (5), e114. https://doi.org/10.1371/journal.pbio.0060114.