doi:10.1016/j.jmb.2008.02.018 J. Mol. Biol. (2008) 378, 468–480

Available online at www.sciencedirect.com

Genome Comparison and Proteomic Characterization of Thermus thermophilus Bacteriophages P23-45 and P74-26: Siphoviruses with Triplex-forming Sequences and the Longest Known Tails
Leonid Minakhin 1 ⁎, Manisha Goel 2 , Zhanna Berdygulova 1 ,
Erlan Ramanculov 3 , Laurence Florens 2 , Galina Glazko 4 ,
Valeri N. Karamychev 5 , Alexei I. Slesarev 5 , Sergei A. Kozyavkin 5 ,
Igor Khromov 6 , Hans-W. Ackermann 7 , Michael Washburn 2 ,
Arcady Mushegian 2,8 and Konstantin Severinov 1,6,9 ⁎

1 Waksman Institute for Microbiology, 190 Frelinghuysen Road, Piscataway, NJ 08854, USA
2 Stowers Institute for Medical Research, Kansas City, MO 64110, USA
3 National Center for Biotechnology of Republic of Kazakhstan, Valikhanova 43, Astana 010000 Kazakhstan
4 Department of Biostatistics and Computational Biology, University of Rochester Medical Center, Rochester, NY 14642, USA
5 Fidelity Systems, Inc., Gaithersburg, MD 20879, USA
6 Institute of Molecular Genetics, Russian Academy of Sciences, Moscow, 123182 Russia
7 Félix d'Herelle Reference Center for Bacterial Viruses, Faculty of Medicine, Laval University, Quebec, Qc, Canada G1K 7P4
8 Department of Microbiology, Kansas University Medical Center, Kansas City KS 66160, USA
9 Department of Molecular Biology and Biochemistry, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA

Received 3 December 2007; received in revised form 24 January 2008; accepted 11 February 2008
Available online 15 February 2008
Edited by M. Gottesman

The genomes of two closely related lytic Thermus thermophilus siphoviruses with exceptionally long (∼800 nm) tails, bacteriophages P23-45 and P74-26, were sequenced completely. The P23-45 genome consists of 84,201 bp with 117 putative open reading frames (ORFs), and the P74-26 genome has 83,319 bp and 116 putative ORFs. The two genomes are 92% identical with 113 ORFs shared. Only 25% of phage gene product functions can be predicted from similarities to proteins and protein domains with known functions. The structural genes of P23-45, most of which have no similarity to sequences from public databases, were identified by mass spectrometric analysis of virions. An unusual feature of the P23-45 and P74-26 genomes is the presence, in their largest intergenic regions, of long polypurine-polypyrimidine (R-Y) sequences with mirror repeat symmetry. Such sequences, abundant in eukaryotic genomes but rare in prokaryotes, are known to form stable triple helices that block replication and transcription and induce genetic instability. Comparative analysis of the two phage genomes shows that the area around the triplex-forming elements is enriched in mutational variations. In vitro, phage R-Y sequences form triplexes and block DNA synthesis by Taq DNA polymerase in an orientation-dependent manner, suggesting that they may play a regulatory role during P23-45 and P74-26 development.

© 2008 Elsevier Ltd. All rights reserved.

Keywords: Thermus thermophilus; thermophages; virion proteomics; bioinformatics; triplex-forming sequence

*Corresponding authors. E-mail addresses: minakhin@waksman.rutgers.edu; severik@waksman.rutgers.edu.

Abbreviations used: ORF, open reading frame; MudPIT, multidimensional protein identification technology; TMP, tail tape-measure protein; NSAF, normalized spectral abundance factor; MR, mirror repeat.

0022-2836/$ - see front matter © 2008 Elsevier Ltd. All rights reserved.
Siphoviruses with Triplex-forming Sequences 469

Introduction

Bacteriophages are the most abundant and diverse form of life on Earth and may exert major influence over the microbial world.1,2 Comparative viral genomics has already provided important insights into diversity and evolution of phage genomes and the functions of various phage genes. To date, more than 400 phage genomes have been sequenced completely (NCBI, National Center for Biotechnology Information – July 2007). However, phages that infect thermophilic eubacteria have remained mostly unexplored. Genomes of only a few of such phages have been sequenced completely (NC_004735; NC_004462).3 Available genomic data indicate that phages infecting thermophilic bacteria may be substantially different from other known phages. The lack of similarity complicates functional annotations of thermophilic phage genomes. Very little is known about strategies used by these phages to regulate their gene expression as well as gene expression of their hosts during the infection process. The only phage infecting thermophilic bacteria that has been characterized at the molecular level is ϕYS40, a myovirus that infects Thermus thermophilus.3,4 The functional analysis of the ϕYS40 genome and its gene expression strategy has revealed a wealth of new data about transcription and translation regulation in Thermus,5 indicating that further studies of phages infecting thermophilic bacteria are warranted.

A collection of phages infecting Thermus was reported recently.6 Here, these phages will be referred to as thermophages. Among the 115 thermophage isolates in the collection, four major morphological types of bacteriophages were observed. However, no molecular characterization of these phages was undertaken. The thermophage collection contained 11 siphoviruses (phages with long non-contractile tails). While siphoviruses form the most abundant,7 and, in terms of genomic sequences (NCBI, July 2007) and functionally annotated proteins,8 best-characterized morphological group of phages, no genome of phages from the Siphoviridae family that infect the Thermus group of bacteria has been reported. Two thermophages, P23-45 and P74-26, stand out from other siphoviruses because of their extremely long (∼800 nm) tails. These phages were isolated from hot springs in geographically close, yet distinct (∼30 km apart) areas of the Kamchatka peninsula in the Far East, the Dolina Geyser valley (P23-45) and the Uzon valley (P74-26).6 Limited restriction digest patterns of their DNA indicated that the two phages are closely related.6 To learn more about siphoviruses infecting Thermus and to gain evolutionary insight through comparative genomic analysis, we determined the complete genome sequences of P23-45 and P74-26, and analyzed the two genomes.

Results

Comparison of the P23-45 and P74-26 genomes

The sequences of the P23-45 and P74-26 genomes (84,201 bp and 83,319 bp, respectively) were determined using Fimer technology and assembled using the phredPhrap package (see Materials and Methods). The sequences of the genomes appeared to be almost identical (92.2% identity in global alignment using the Stretcher program from the EMBOSS package).9,10 The C + G content for both phages (57.8%) is significantly lower than that of their host, T. thermophilus (69.4%), but is still in stark contrast to ϕYS40, a Thermus myophage that is A + T-rich (a C + G content of 32.6%).3 While both P23-45 and P74-26 genomes encode putative DNA methylases (open reading frame 14 (ORF14), see below), they are susceptible to several methylation-sensitive restriction endonucleases (data not shown), suggesting that phage DNA is not extensively methylated. No phage DNA ends were identified during genome sequencing and assembly stages.

A total of 117 ORFs longer than 30 codons were predicted in the P23-45 genome using the GeneMark.hmm 2.0/VIOLIN program,11 and 116 ORFs were predicted in the P74-26 genome. The non-coding regions were scanned for the presence of additional genes by searching GenBank, GenPept, and the database of unfinished microbial genomes

at NCBI but no additional ORF was found. A search by the tRNAscan-SE program did not reveal any tRNA-encoding genes.12

In each phage, almost half of the genes are transcribed in the same direction (leftward here) and form a single cluster at the left arm of the genome; the remaining genes are transcribed in the rightward direction and also form a cluster at the right arm of the genome. Also, one ORF from the left arm, ORF5, is transcribed in the rightward direction. The P23-45 genome lacks ORFs corresponding to P74-26 ORFs 60, 72, and 73. Conversely, the P74-26 genome has no equivalents corresponding to P23-45 ORFs 16, 22, 59, and 67 (Supplementary Data Table 1 and Fig. 1). None of these ORFs codes for proteins with known functions and/or structural motifs, except for P74-26 ORF60, which encodes a protein with WD40-like repeats (Supplementary Data Table 1). The products of these ORFs must be dispensable for phage development, though they may contribute to different host ranges exhibited by the two thermophages.6

Similar to other bacteriophage genomes, the coding density of the P23-45 and P74-26 genomes is high. Potentially coding sequences comprise ∼96% of their genomes. The length of predicted P23-45 ORFs varies between 30 and 5002 codons, and that of P74-26 between 33 and 5006 codons. A 102 bp long ORF79 (ORF78 in P74-26) is located within ORF78 (ORF77 in P74-26). ORF79 does not have a good SD sequence and may not be a real ORF. Other general features of the two genomes are summarized in Table 1. The longest non-coding regions are located between ORF68 and ORF69 (P23-45) and ORF65 and ORF66 (P74-26), and contain unusual sequences that potentially can form DNA triplexes (see below).

Comparative sequence analysis of P23-45 and P74-26 ORFs

P23-45 and P74-26 share 113 ORFs. The average rates of non-synonymous and synonymous nucleotide substitutions over the 113 genes are low (K̂a = 0.039 ± 0.09 and K̂s = 0.177 ± 0.29, for non-synonymous and synonymous nucleotide substitutions, respectively). There are 13 orthologs that have Ka > 0.1; two of them have more than 50% difference at the amino acid level (ORFs 74 and 71 in P23-45 and ORFs 75 and 74 in P74-26, correspondingly).

Fig. 1. (a) A representation of the P23-45 and P74-26 phage genomes. Predicted ORFs are indicated by arrows. The end
points of the genomes are selected to separate large groups of genes transcribed from opposite DNA strands. The
direction of the arrows indicates the direction of transcription. Green, magenta, blue and yellow show functions assigned
from amino acid similarity with proteins or protein domains in the database; red shows ORFs that are absent from the
other genome (see Supplementary Data Table 1 for details). The numbers in parentheses next to the predicted gene
products represent their numbers for P23-45 shown in Supplementary Data Table 1. (b) Overview of global P23-45 and
P74-26 genomic alignment. White, identical regions; blue, regions with substitutions; red, deletions.

Almost all ORFs with Ka > 0.1 have no homologs in public databases. Interestingly, nine out of these 13 divergent ORFs are clustered in the genome (ORFs 62, 65, 66, 68, 70, and 72–75 in P23-45). This region of the genome also contains a long intergenic spacer with triplex-forming sequences as well as the largest number of indels between the two genomes (Fig. 1, see also below).

In order to study codon usage variations between the phages and the host, we calculated the average relative synonymous codon usage (RSCU) in P23-45, P74-26, and T. thermophilus. RSCU is defined as the ratio of observed codon frequency to codon frequency expected if all the synonymous codons for each amino acid were used with equal frequency.13 RSCU values would be close to 1 in the absence of codon usage bias. It is known that codon usage patterns in some phages are phage- but not host-specific,14,15 a property that was proposed to be correlated with the presence of phage-encoded DNA polymerases.14 Though P23-45 and P74-26 code for their own DNA polymerase (ORF11 in both genomes, see below), the RSCU values for both phages are similar to those of T. thermophilus (Supplementary Data Table 2).

The P23-45 and P74-26 genomes encode four and three ORFs, respectively, that are absent from the other genome (Supplementary Data Table 1). Recently acquired ORFs are expected to have atypical codon usage and compositional bias.16 We calculated individual Nc values (effective number of codons) for 117 P23-45 ORFs and 116 P74-26 ORFs, using CodonW software† (data not shown). In extremely biased ORFs, Nc can approach 20, while in unbiased ORFs it will approach 61. The average Nc was about 46.1 ± 7 in both genomes. Among four orphan genes of the P23-45 phage and three orphans of P74-26, only ORF67 of P23-45 exhibited an extreme Nc value of 31.6 (Nc values for other orphans were close to 50). This may be an indication that most of the orphan ORFs were generated through the loss of their counterparts in another genome rather than by gene acquisition.

† www.molbiol.ox.ac.uk/cu

Table 1. General features of the P23-45 and P74-26 genomes

Feature                             P23-45      P74-26
Total number of predicted ORFs      117         116
Smallest ORF length (codons)        30          33
Longest ORF length (codons)         5002        5006
Longest non-coding region (bp)      847         887
Number of overlaps between ORFs     47          41
Length of overlaps (bp)             1–46        1–44
Start codon usage                   AUG (96)    AUG (93)
                                    GUG (13)    GUG (13)
                                    UUG (8)     UUG (10)
Stop codon usage                    TGA (36)    TGA (37)
                                    TAG (40)    TAG (40)
                                    TAA (41)    TAA (39)

Predicted P23-45 and P74-26 proteins

The deduced protein sequences of predicted ORFs were compared to proteins in the non-redundant NCBI database using the PSI-BLAST program with a slightly relaxed cut-off for profile inclusion (-h parameter set at 0.01). Because of the high degree of similarity between the two phages, the analysis below is focused on phage P23-45 (see Supplementary Data Table 1 for comparison with P74-26, where corresponding genes of both phages are listed).

About 25% of P23-45 gene products exhibit sequence similarity to proteins of known function or contain predicted conserved domains, which is somewhat lower than the average number for the large sample of phages in public databases.8 A similar situation is found in the case of unrelated Thermus phage ϕYS40 that we analyzed earlier.3 Three predicted ORFs (103, 105, and 106) with unknown functions have significant sequence similarities with hypothetical proteins from a small (19,603 bp genome) T. aquaticus phage IN93 (NC_004462) and one deduced P23-45 protein, the product of ORF112, has an ORF from ϕYS40 as its BLAST best match (see Supplementary Data Table 1). The result suggests that diverse Thermus phages have access to a common gene pool, a situation that has been observed for other phages.17–19 Since three deduced P23-45 gene products are most similar to proteins from mesophilic phages (Supplementary Data Table 1), the gene pool accessible to thermophages is not limited to thermophilic organisms. Six more ORFs have proteins from other thermophilic bacteria as their best BLAST matches, including two gene products from T. thermophilus. Thus, at least ten of 34 functionally annotated P23-45 ORFs (∼30%) have proteins from thermophilic organisms as their nearest database neighbors, which is much larger than the corresponding number for the ϕYS40 phage (<10%).3

The predicted P23-45 ORFs show limited gene order conservation; for example, ORFs 103, 105, and 106 match three adjacent ORFs (22, 23, and 24) of phage IN93, while ORFs 108 and 111 show similarities with adjacent IN93 ORFs 9 and 10. Such conservation of gene order is consistent with modular phage evolution.19,20

Proteins involved in DNA metabolism

Like many other large bacteriophages, P23-45 and P74-26 encode a group of proteins that appear to be involved in nucleotide metabolism, including ribonucleoside-triphosphate reductase (ORF13), thymidylate synthase (ORF24), dNMP kinase (ORF28), dUTP diphosphatase (ORF29), and dCTP deaminase (ORF32). Remarkably, these proteins share stronger sequence similarity with homologs from bacteria (both Gram-positive and Gram-negative) and Archaea rather than with phage proteins. The corresponding genes are clustered in the left arm of the genome (Fig. 1 and Supplementary Data Table 1) and expression of some, or perhaps all of them, may be co-regulated.
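The RSCU definition given earlier in this section (observed codon frequency divided by the frequency expected if all synonymous codons for an amino acid were used equally) can be sketched as follows. This is an illustration only, not the procedure or software used by the authors; it assumes an in-frame, upper-case DNA coding sequence and the standard amino acid assignments, and the function name and the toy sequence are hypothetical.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (stops shown as '*').
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def rscu(cds):
    """Relative synonymous codon usage for one in-frame coding sequence.

    Returns {codon: RSCU} for every sense codon whose amino acid occurs
    at least once in the sequence. A value of 1.0 means the codon is used
    exactly as often as expected under equal synonymous usage.
    """
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    counts = Counter(cds[i:i + 3] for i in range(0, usable, 3))

    # Group codons into synonymous families by encoded amino acid.
    families = {}
    for codon, aa in CODON_TABLE.items():
        families.setdefault(aa, []).append(codon)

    values = {}
    for aa, codons in families.items():
        total = sum(counts[c] for c in codons)
        if aa == "*" or total == 0:
            continue  # skip stop codons and amino acids that never occur
        expected = total / len(codons)
        for c in codons:
            values[c] = counts[c] / expected
    return values


# Toy sequence (hypothetical): four alanine codons, GCT x3 and GCC x1.
print(rscu("GCTGCTGCTGCC"))
```

Averaging such values over all ORFs of a genome, as the text describes, would give one RSCU per codon for each phage and for the host, which could then be compared codon by codon.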

Proteins involved in DNA replication, recombination, and transcription

P23-45 and P74-26 encode a set of proteins required for DNA replication; namely, a type A DNA polymerase (ORF11), a putative small subunit of eukaryotic-type DNA primase (ORF40), and replicative helicase DnaB (ORF46). They also encode at least three recombination proteins; an integrase (ORF5), a RecB family exonuclease (ORF9), and a protein homologous to Holliday junction resolvase of the archaeal type (ORF44). The ORF5 integrase is related to site-specific tyrosine recombinases XerC and XerD, which are common in eubacteria and are involved in chromosome segregation as well as in integration/excision of temperate phage genomes into and out of host chromosomes.21 This may be an indication that P23-45 and P74-26 are able to enter a lysogenic state.

Replication, repair, and recombination genes are scattered over the left arm in both genomes (Fig. 1 and see Supplementary Data Table 1). No ORF coding for a DNA ligase, a large subunit of DNA primase, single-strand DNA-binding protein, or RNase H was found, suggesting that the corresponding functions are provided by the host. Notably, T. thermophilus encodes only bacterial DnaG-type DNA primase. It is possible that phages use the host enzyme to prime the replication of their genomes, while a phage-encoded small subunit of eukaryotic-type DNA primase may have another role in phage development.

P23-45 and P74-26 do not encode an RNA polymerase or any recognizable σ factors and must, therefore, rely on host RNA polymerase to transcribe all of their genes. Interestingly, no phage-encoded transcription factor could be predicted, similar to the case of ϕYS40, where only the transcription regulator ORF18 was predicted.3

Structural proteins and protein composition of P23-45 virions

Analysis of amino acid sequences of predicted proteins did not reveal similarities with known bacteriophage structural proteins from public databases, except for a putative portal protein ORF86, which is homologous to a putative protein in Corynebacterium (a likely remnant of an integrated prophage, with a putative terminase gene next to it; data not shown). To identify protein composition of phage virions, P23-45 particles (Fig. 2a) were purified by two-step centrifugation in a CsCl gradient and subjected to SDS-PAGE (Fig. 2b). Three major bands of ∼17 kDa, ∼40 kDa, and ∼50 kDa, as well as one band migrating significantly slower than the largest molecular mass marker (250 kDa), were observed (Fig. 2b).

Fig. 2. (a) Electron micrograph of a P23-45 virion showing its extremely long flexible tail and icosahedral capsid. The scale bar represents 100 nm. (b) Protein composition of phage P23-45 in SDS-PAGE. The three major structural proteins, gp88, 89 and 94, and the tail tape-measure protein, gp96, are indicated. Lane M, protein molecular mass marker.

The preparation of P23-45 virions was analyzed by multidimensional protein identification technology (MudPIT), a shotgun proteomics approach in which proteolytic peptides of the protein complex under study (in our case, phage virions) are generated, loaded onto triphasic microcapillary HPLC columns, eluted over several chromatography steps, and analyzed directly by tandem mass spectrometry.22,23 The results are presented in Fig. 3. Peptides from 16 predicted phage proteins were found in infectious virion preparation (shown in red in Supplementary Data Table 1). The virion sample was very pure, as shown by the fact that low levels of only three T. thermophilus proteins were detected. The structural proteins of the phage included gp86, a putative portal protein, six proteins with homology to conserved hypothetical proteins and/or conserved domains from different prokaryotes and phages (gp87, gp89, gp94, gp96, gp103, and gp105), and nine additional proteins without any significant similarity to sequences in public databases. Almost all P23-45 virion proteins (14 of the 16) are encoded by adjacent ORFs 86–105 clustered in the right arm of the genome. These ORFs must form the cluster of phage late genes. The cluster of structural genes appears to be discontinuous, since the products of genes 90, 92, 95, 98, 102, and 104 were not detected by MudPIT (even after lowering the selection criteria for spectrum/peptide matches). The corresponding proteins may be present in the virions in very small amounts that prevent their detection by MudPIT, or may not be part of the virion. One ORF in the cluster, ORF96, encodes a protein of 5002 amino acid residues (the corresponding protein in P74-26, the

Fig. 3. MudPIT analysis of P23-45 lysates. Normalized spectral abundance factor (NSAF) values are shown for P23-45 structural proteins detected in two replicate runs (white and gray bars). The proteins gp88, gp89, gp94 and gp96 are also shown in Fig. 2b.
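The NSAF values plotted in Fig. 3 are defined in Materials and Methods, which is outside this section. As a minimal sketch, assuming the commonly used definition (each protein's spectral count divided by its length, then normalized by the sum of that ratio over all detected proteins), the calculation might look like the following. The function name, protein labels, and numbers are hypothetical and are not measurements from this study.

```python
def nsaf(spectral_counts, lengths):
    """Normalized spectral abundance factor per protein.

    spectral_counts: {protein: number of spectra matched to it}
    lengths:         {protein: length in amino acid residues}
    Assumed definition: (SpC / L) for each protein, divided by the sum of
    (SpC / L) over all proteins, so the returned values sum to 1.
    """
    saf = {p: spectral_counts[p] / lengths[p] for p in spectral_counts}
    total = sum(saf.values())
    return {p: value / total for p, value in saf.items()}


# Toy input (hypothetical): equal lengths, a 1:3 ratio of spectral counts.
toy_counts = {"protA": 10, "protB": 30}
toy_lengths = {"protA": 100, "protB": 100}
print(nsaf(toy_counts, toy_lengths))
```

Dividing by length before normalizing is what makes ratios of NSAF values usable as rough stoichiometry estimates, since longer proteins otherwise yield more peptides and more spectra per copy.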

product of ORF95, is 5006 amino acid residues long). To our knowledge, these proteins are among the longest known bacteriophage or bacterial proteins; gp96 may correspond to the high-molecular mass band seen on the SDS-PAGE gel shown in Fig. 2b.

Both phages have extremely long tails (over 800 nm;6 Fig. 2a). The N-terminal and C-terminal parts of gp96 are reminiscent of fibrillar regions present in tail tape-measure proteins (TMPs), a diverse group of structural proteins that determine the length of phage tails. The two fibrillar regions of gp96 are separated by a metallopeptidase-like domain (Fig. 4a). Some known TMPs, for example, one from Staphylococcus aureus phage phiNM3 (NC_008617), also have a metallopeptidase domain in a similar location. We therefore hypothesize that gp96 is a TMP. Though the primary sequences of TMPs are poorly conserved, these proteins have similar secondary structures and location of their genes on the genome is conserved between different phages.19,24 TMPs are long and their lengths are correlated linearly with the lengths of cognate virion tails.25,26 TMPs are also largely α-helical. The analysis of gp96 secondary structure by a metaserver27 that combines prediction of secondary structure by several algorithms, suggests that 65.7% of gp96 amino acids are within α-helices, 5.8% are involved in extended strands, and 25.7% are involved in random coils

Fig. 4. P23-45 putative tape measure protein gp96. (a) PSI-BLAST search for putative conserved domains in gp96
revealed a domain with strong similarity to proteins of the metallopeptidase family (shown as red and blue segments
between amino acid residues 3000 and 3200). (b) gp96 protein secondary structure predicted by a HNN (Hierarchical
Neural Network) method available at NPS@ server. Blue, amino acid residues involved in α-helices; red, amino acid
residues involved in extended strands; purple, amino acid residues involved in random coils.

(Fig. 4b). These numbers are close to predicted values for a well-characterized TMP gpH from phage λ (64.3%, 6.2%, and 27.1%, respectively). Using the equation given by Katsura and Hendrix,25 the length of the tail controlled by gp96 should be ∼700 nm.6 The ∼15% discrepancy of the calculated (700 nm) and the observed (800 nm) numbers for P23-45 tail length may be due to the presence of extended strands in gp96, since the calculation assumes a TMP to be completely α-helical.

Spectral counting derived from MudPIT allows one to make inferences on the relative abundance of proteins in mixtures.28,29 Three P23-45 virion proteins, gp88, gp89, and gp94, were detected with the highest normalized spectral abundance factor (NSAF, see Materials and Methods) values (Fig. 3). The calculated molecular mass of these proteins (16.6 kDa, 46.6 kDa, and 37.9 kDa, respectively) corresponds well to the apparent molecular mass of three major bands observed after SDS-PAGE separation of proteins from phage virions (Fig. 2b). According to the NSAF values, an approximate stoichiometry of 1 gp88 to 3 gp89 to 5 gp94 in phage virions can be estimated. Other virion proteins were detected in lower abundances and must correspond to minor components of phage virions.

Unusual triplex-forming sequences in the longest non-coding regions of both phage genomes

An intriguing feature of the P23-45 and P74-26 genomes is the presence of homopurine-homopyrimidine (R-Y) mirror repeats (MRs) between ORFs 68 and 69 (P23-45) and ORFs 65 and 66 (P74-26) (Figs. 1 and 5a). Each genome contains two non-identical sets of tandemly located and partially overlapping repeats, MRI and MRII. MRI consists of two 25 nt R-Y tracts with perfect axis symmetry; MRII consists of two 33 nt R-Y tracts with nearly perfect axis symmetry (one mismatch). MRI and MRII overlap by 3 nt (Fig. 5a). MRI and MRII of P23-45 and P74-26 are identical.

Homopurine-homopyrimidine MRs are relatively abundant in eukaryotes, but are found only rarely in prokaryotic genomes.30,31 To our knowledge, the MRs of P23-45 and P74-26 are the longest observed to date in bacteria or their viruses. The R-Y sequences organized in an MR have special structural properties and may adopt unusual triplex DNA conformations in vitro32,33 and in vivo34 under appropriate conditions. Two different types of DNA triplexes are known. At physiological pH and in the presence of bivalent cations, a triplex can be built from a pyrimidine strand and a hairpin formed by a purine strand (the YR.R conformation).33 Under acidic conditions, a triplex can be built from a purine strand and a hairpin formed by a pyrimidine strand (the YR.Y+ conformation).32 Under our experimental conditions (pH 9.0 at 25 °C and the presence of Mg2+ in the sequencing mixture, see Materials and Methods), the YR.R conformation appears to be more favorable than the alternative YR.Y+ conformation.

Triplexes affect DNA synthesis in vitro,34–37 and in vivo.38–40 We used P23-45 phage DNA as a template for a thermal cycle sequencing reaction catalyzed by Taq DNA polymerase using oligonucleotides that annealed at either side of MRs as primers (the 3′ ends of the primers were directed towards the repeats, Fig. 5a, sequencing primers D and R). As can be seen from the data presented in Fig. 5b, the MRI-II region acted as a unidirectional replication block: the arrest was observed only when Taq polymerase approached the center of the repeat in the purine-containing template strand at 72 °C (Fig. 5a). Similar results were obtained when the MRI-II region was cloned into an E. coli plasmid: robust DNA polymerization blockage was evident in both supercoiled (Fig. 5a and b) and linearized (data not shown) plasmid templates, and the sites of blockage coincided mostly with those observed in phage DNA fragments. We believe that polymerization arrests in the middle of MR occur through the formation of a triplex in which a purine template strand and a portion of a growing pyrimidine strand are involved (the YR.R conformation, Figs. 5b and 6b). While we did not observe any arrest when a pyrimidine strand served as a template for Taq DNA polymerase (Figs. 5a and 6a), strong DNA polymerization arrests were observed at both purine and pyrimidine denatured templates at 37 °C, when Sequenase (modified T7 DNA polymerase) was used (data not shown). A similar behavior was observed earlier for triplex-forming DNA sequences.34,35 We conclude that the MRI-II sequences form triplexes, and that these triplexes affect DNA synthesis in an orientation-dependent manner. In order to test whether the MRI and MRII can act separately or function as a single element, the MRI region was cloned into the same plasmid. During the polymerization reaction, multiple stop sites were present at the center of this mirror repeat, though the block was less efficient (compare polymerization efficiency on the MRI-MRII and MRI templates in plasmid DNA in Fig. 5b). We conclude that both MRI and MRII can form triplexes and block replication in vitro.

The comparison of P23-45 and P74-26 genome sequences revealed increased levels of point mutations, deletions, and insertions around the triplex-forming sequences (Fig. 1b). Notably, four of seven ORFs that are absent from one of the two phage genomes are located close to MRs, at a distance of less than 4 kb. Recently, it has been shown that triplexes formed by R-Y tracts cause genomic instabilities: point mutations, amplifications, deletions and translocations.41–43 Thus, it seems plausible that phage MRs could be responsible for increased variability of genomic regions that surround them.

Discussion

Here, we report a comparative study of complete genomes of two T. thermophilus dsDNA phages with siphoviral morphology, P23-45 and P74-26, and proteomic characterization for one of them (P23-

Fig. 5. Triplex-forming mirror repeats MRI and MRII located in the longest non-coding region of the P23-45 genome.
(a) Schematic representation of a section of the P23-45 genome containing mirror repeats. Sequences of partially
overlapping regions MRI and MRII are shown in upper case letters and marked by horizontal arrows. Replication
blockage sites for different templates are marked by boxes: white, phage DNA; black, plasmid-encoded MRI-MRII; gray,
plasmid-encoded MRI. (b) DNA polymerization on the triplex-forming templates MRI and MRI-II of P23-45. The blockage
of DNA polymerization reaction by Taq DNA polymerase depends on template nucleotide content, polypurine (Pu) or
polypyrimidine (Py). D23 and R23, phage DNA-specific direct and reverse sequencing primers, respectively; D and R,
universal phage T7 promoter and M13 reverse sequencing primers, respectively; vertical bars indicate positions of MRs I
and II on the sequencing gels.
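The two properties that define the repeats in Fig. 5, a homopurine or homopyrimidine strand and mirror symmetry (the same strand reads identically in both directions, unlike an inverted repeat, which involves the reverse complement), can be checked with a short sketch. The helper names and the toy sequences below are hypothetical; the actual MRI and MRII sequences are not reproduced here.

```python
def is_ry_tract(seq):
    """True if the strand is purely purine (A/G) or purely pyrimidine (C/T)."""
    letters = set(seq.upper())
    return bool(letters) and (letters <= set("AG") or letters <= set("CT"))


def mirror_mismatches(seq):
    """Number of symmetric position pairs that break mirror symmetry.

    Position i is compared with position n-1-i on the SAME strand.
    Each broken pair is seen twice in the full sweep, hence the halving.
    A return value of 0 means a perfect mirror repeat.
    """
    seq = seq.upper()
    differing = sum(1 for a, b in zip(seq, reversed(seq)) if a != b)
    return differing // 2


# Toy sequences (hypothetical, not phage sequences).
perfect = "AAGGAGGAA"    # homopurine, perfect mirror symmetry
imperfect = "AAGGAGAAA"  # homopurine, one mismatched pair
print(is_ry_tract(perfect), mirror_mismatches(perfect))
print(is_ry_tract(imperfect), mirror_mismatches(imperfect))
```

Counting mismatched pairs rather than returning a yes/no answer allows near-perfect repeats, such as the one-mismatch case described for MRII, to be reported alongside perfect ones.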

Fig. 6. Models of DNA polymerization reaction and arrest at polypurine and polypyrimidine DNA templates. (a) DNA polymerization is successful when a pyrimidine strand serves as a template for Taq DNA polymerase. (b) When a purine strand serves as a template for Taq DNA polymerase, DNA polymerization is blocked in the middle of MR through the formation of a triplex in which a purine template strand and a portion of a growing pyrimidine strand are involved. |, Watson–Crick hydrogen bonds; *, reverse Hoogsteen hydrogen bonds; →, 3′-end of growing DNA chain.

45). These thermophages were isolated from geographically close but distinct regions of the Kamchatka peninsula.6 As suggested by similar physical properties and close, though not identical, restriction digest patterns,6 the P23-45 and P74-26 genomes are very similar, showing 92% overall nucleotide identity and ∼95% common ORFs.

P23-45 and P74-26 nucleotide sequence data were used to check the possible presence of these phages and their close relatives in several distinct hot water sources with different temperature and pH conditions in the Uzon and Dolina Geyser valleys. PCR analysis with pairs of primers specific to ORF11, encoding a putative DNA polymerase, and ORF85 (ORF84 in P74-26), encoding a putative terminase large subunit, revealed the presence of this group of phages in every water sample collected from the Kamchatka hot springs. In contrast, similar PCR analysis with primers specific to several genes of ϕYS40 revealed the absence of this phage in the samples (Z.B. and L.M., unpublished results).

The high level of similarity and the ubiquitous presence of long-tailed Thermus phages suggest that viral (and by extension, bacterial) populations of hot springs in the Uzon and Dolina Geyser valleys may be experiencing genetic exchange, either through common underground aquifers or through air. Additional comparative studies of microbial diversity in these locations should shed light on this issue.

As in T. thermophilus phage ϕYS40,3 25% of predicted P23-45 and P74-26 proteins show sequence similarity to proteins from a broad phylogenetic range of microorganisms, including Gram-negative and Gram-positive bacteria and Archaea, as well as from bacteriophages infecting diverse bacteria, such as the Thermus, Bacillus, and Pseudomonas groups.

P23-45 and P74-26 encode exceptionally long TMPs (ORF96 of 5002 codons and ORF95 of 5006 codons, respectively), which are likely responsible for the extreme tail lengths of both phages.6 Though phage-like particles with 2800 nm long tails have been reported in ocean water,2 they have not been characterized. Thus, at present P23-45 and P74-26 have the longest tails amongst the characterized phages. The advantages of such extremely long tails are not clear. In addition to determining tail length, TMPs were shown to be important for stability and assembly of phage tails.24,44 Putative P23-45 and P74-26 TMPs contain regions of 80–120 amino acids with strong similarity to endopeptidases of the M23/M37 metallopeptidase family. This peptidase domain may be part of the membrane-puncturing device.45 TMPs are thought to be partially ejected from the tail into the bacterial cytoplasm ahead of phage DNA.46 Alternatively, this region of TMP may be involved in proteolytic processes necessary for tail assembly.

An unusual feature of the P23-45 and P74-26 genomes is the presence of long polypurine-polypyrimidine (R-Y) MRs in their longest non-coding regions. Such triplex-forming sequences, extremely rare in prokaryotes but abundant in non-coding regions, regulatory elements, and introns of eukaryotic genomes,30,31,47 are known to form stable triple helices in vitro and in vivo that block replication and transcription.32,33,37,48,49 Since the coding density of phage genomes is known to be very high, long non-coding regions may be expected to play a regulatory role. We therefore hypothesize that in both phages MRs are involved in control of replication or gene expression. While the exact molecular mechanisms of this control remain to be investigated, several hypotheses appear to be plausible. The triplex-forming sequences are located between two convergently transcribed genes in the orientation that inhibits transcription of the downstream gene from an upstream promoter (L.M., unpublished results). Thus, MRs may attenuate transcription of downstream genes either by causing transcription termination or arrest/pausing. MRs may also downregulate replication of phage DNA in a transcription-dependent manner due to a collision between the replication fork and an RNA polymerase stopped at the triplex-forming sequence.49 Finally, as revealed by the in vitro behavior of the MRs, they may provide a unidirectional block for phage DNA replication, which may be important for generation of unit-length copies of the genome. The biological role of these unusual sequences and the necessity of tandem MRs for phage development are being investigated in our laboratory.

Recently, it has been revealed that triplex-forming sequences are frequently found near mutation hotspots and at breakpoints of genetic rearrangements in both E. coli and mammalian cell cultures.41,42,50 Moreover, several human diseases and disorders are found to be associated with mutations and rearrangements promoted by triplex-forming sequences.50–52 The results of comparative analysis of the two thermophage genomes presented here are consistent with a similar role for these triplex-forming sequences. Thus, MRs may contribute to genetic instability and genome plasticity of P23-45 and P74-26, and may therefore have an important role in the evolution of these thermophages and their relatives.

Experimental Procedures

Bacterial growth and phage infection

Bacteriophages P23-45 and P74-26 were generously provided by Dr Michael Slater, Promega Corporation (Madison, WI). To isolate individual P23-45 or P74-26 plaques, 150 μl of freshly grown Thermus thermophilus HB8 culture (A600∼0.4) in the TB medium (0.8% (w/v) Tryptone, 0.4% (w/v) yeast extract, 0.3% (w/v) NaCl, 1 mM MgCl2, 0.5 mM CaCl2) was combined with 100 μl dilutions of phage stock, incubated for 10 min at 65 °C, plated in soft TB agar (0.75%), and incubated overnight at 65 °C. To prepare a phage stock suspension, an individual plaque was picked up and subjected to two more rounds of plaque purification. To prepare the phage lysate, a single plaque was resuspended in a small volume of the TB medium and mixed with 0.1 ml of freshly grown HB8 culture. The mixture was incubated for 10 min at 65 °C to

allow phage adsorption, 20 ml of fresh TB medium was added and the culture was incubated on a rotary shaker at 65 °C until complete lysis occurred (usually after 6–10 h). Cell debris was removed from the lysate by centrifugation at 6000g for 10 min. The resultant phage stock (∼10⁹ pfu/ml) was saturated with chloroform and stored at 4 °C. The P23-45 and P74-26 stocks were used to prepare larger volumes of phage lysates using a scale-up of the procedure described above.

Purification of P23-45 virions

P23-45 virion purification was done as described.3 Briefly, DNase I and RNase A (Sigma-Aldrich, St. Louis, MO; final concentrations of 3 units/ml and 1 μg/ml, respectively) were added to P23-45 lysate and the mixture was incubated for 30 min at 30 °C. Solid NaCl was added to a final concentration of 1 M and dissolved by swirling. The lysed culture was left on ice for 1 h and centrifuged at 11,000g for 10 min at 4 °C. To precipitate P23-45, PEG 8000 was added to the supernatant to a final concentration of 10% (w/v) followed by 1–4 h incubation on ice. Precipitated P23-45 particles were recovered by centrifugation at 11,000g for 10 min at 4 °C. The phage pellet was resuspended in 2 ml of SM buffer (100 mM NaCl, 1 mM MgSO4, 50 mM Tris–HCl (pH 7.5), 2% (w/v) gelatin). PEG 8000 and cell debris were extracted from the phage suspension by adding an equal volume of chloroform and centrifuging at 3000g for 15 min at 4 °C. Solid CsCl was added to the aqueous phase (at 0.5 g per milliliter of phage suspension), which contained the phage particles, and dissolved by gentle mixing. CsCl step gradients (three steps of 1.45 g/ml, 1.50 g/ml, and 1.70 g/ml density) were prepared in Beckman SW41 polypropylene centrifuge tubes and centrifuged at 22,000 rpm for 2 h at 4 °C and at 38,000 rpm for 24 h at 4 °C (Beckman SW50.1 rotor, Beckman Coulter, Fullerton, CA). Purified bacteriophage suspension was dialyzed twice at 4 °C overnight against a 1000-fold volume of 10 mM NaCl, 50 mM Tris–HCl (pH 8.0), 10 mM MgCl2. Purified virions were mixed with loading SDS-buffer, boiled for 5 min, and loaded onto a Tris–HCl 4%–20% (w/v) gradient polyacrylamide gel containing SDS (BioRad, Hercules, CA). The gel was stained with Coomassie brilliant blue R-250 (BioRad). The approximate molecular mass of each protein band was determined by comparison to a protein standard (Precision Plus Protein, BioRad).

Electron microscopy

CsCl density-gradient purified viruses were washed once in 0.1 M ammonium acetate (pH 7.0), using a Beckman (Palo Alto, CA) J2-21 centrifuge and a JA-18.1 fixed-angle rotor operated for 60 min at 25,000g. Phage particles were deposited on carbon-coated copper grids, stained with 2% (w/v) potassium phosphotungstate (pH 7.0) or uranyl acetate (pH 4.5) and examined in a Philips EM 300 electron microscope. Magnification was monitored using T4 phage tails, which measure 113 nm in the extended state.6

Extraction of phage DNA and genome sequencing

P23-45 and P74-26 phage DNA were extracted with a Lambda Midi Kit (Qiagen, Valencia, CA) following the procedure recommended by the manufacturer.

Initial sequence data were obtained using mini shotgun libraries of P23-45 and P74-26 DNA. For that, purified phage DNA was digested with HindIII, and the resulting fragments were ligated into HindIII-linearized pTZ19R (Fermentas, Glen Burnie, MD). The recombinant clones were subjected to sequence analysis and specific primers were designed on the basis of the sequencing results obtained. Several rounds of sequencing reactions were performed directly on phage DNA using ThermoFidelase and Fimer technology.53,54 Trace assembly was done with the phredPhrap package‡.11 The final round of sequencing resulted in one pseudocircular contig. All the restriction enzymes, T4 DNA ligase, and T4 polynucleotide kinase used in this work were purchased from New England Biolabs (Ipswich, MA).

Molecular cloning and DNA polymerization reaction

DNA fragments of 113 bp and 257 bp from the P23-45 genome, comprising MRI and both MRI and MRII sequences, respectively, were amplified with primers containing engineered BamHI and PstI sites. The PCR fragments were digested with the appropriate restriction enzymes and ligated into BamHI-PstI-digested Bluescript SK(-) (Stratagene, La Jolla, CA). The resultant plasmids, pBsktriplex1 and pBsktriplex1-2, were used as templates for sequencing reactions with ³²P end-labeled T7 promoter direct and M13 reverse universal sequencing primers. P23-45 genome DNA was also used as a template for sequencing reactions with ³²P (PerkinElmer, Waltham, MA) end-labeled D and R phage DNA-specific sequencing primers. Sequencing reactions were carried out with the fmol DNA Cycle Sequencing Kit (Promega Corp.) and with the Sequenase version 2.0 DNA sequencing kit (USB Corporation, Cleveland, OH) according to the manufacturers' procedures. The reaction products were resolved on 6% (w/v) sequencing gels and visualized using a PhosphorImager Storm840 (Molecular Dynamics, GE Healthcare, Piscataway, NJ). The sequences of the phage DNA-specific primers are available from the authors upon request.

Sequence analysis

ORFs of the P23-45 and P74-26 genomes were predicted using the GeneMark server§. The PSI-BLAST program55 was used to detect the homologs of the P23-45 and P74-26 genes in DNA and protein databases, with profile inclusion cutoff E-value in PSI-BLAST (-h parameter) set at 0.01. Both options for low-complexity filtering (-F parameter) and composition-based statistics (-t parameter) were sometimes adjusted for better detection of sequence similarities. HHpred comparison to the CDD database at NCBI was also used to predict structural and functional protein similarities.55

tRNA genes were searched by using the tRNAscan-SE program.12 Searches for the presence of transmembrane helices and coiled coil regions were done with the aid of the SEALS package.56

Pairwise global alignment of the two phage genomes was generated using the Stretcher program from the EMBOSS package.9,10 Alignment visualization and calculation of the percentage of identical bases in aligned genomes were done using the Base-By-Base (BBB) global alignment editor.57 Orthologous genes were aligned using CLUSTAL W.58 The SeqinR package for the R statistical computing environment∥ was used for the analysis of synonymous and non-synonymous substitutions and RSCU biases.59 Protein secondary structures were predicted using the Web server NPS@ (Network Protein Sequence Analysis)¶.27

MudPIT

P23-45 lysate was purified by double sedimentation in CsCl gradients and had a phage titer of 2 × 10⁹ pfu/ml. Subsequent steps were done as described.3 Briefly, the phage lysate was treated for 30 min at 37 °C with 0.1 U of benzonase (Sigma-Aldrich), then precipitated in 20% trichloroacetic acid, 100 mM Tris–HCl (pH 8.5) overnight at 4 °C. The dried protein pellet was denatured, reduced, alkylated and digested with endoproteinase LysC (Roche Applied Science, Indianapolis, IN) and trypsin (Promega Corp.). The peptide mixture was split in half and each aliquot was pressure-loaded onto a triphasic microcapillary column, installed in-line with a Quaternary Agilent 1100 series HPLC pump coupled to a Deca-XP ion trap tandem mass spectrometer (ThermoElectron, San José, CA) and analyzed via ten-step chromatography.23

The MS/MS datasets were searched using SEQUEST60 against a database of 117 P23-45 predicted gene products, combined with 2238 protein sequences from T. thermophilus HB8 (including chromosome and large plasmids), as well as usual contaminants such as human keratins, IgGs, and proteases. Additionally, to estimate background correlations, each sequence in the database was randomized (keeping the same amino acid composition and length) and the resulting “shuffled” sequences were concatenated to the “forward” sequences and searched at the same time. The total number of non-redundant sequences searched was 5038.

The DTASelect/CONTRAST program61 was used to select spectra/peptide matches with a normalized difference in cross-correlation score (DeltCn) of at least 0.08, a minimum cross-correlation score (XCorr) of 1.8 for singly, 2.0 for doubly, and 3.0 for triply charged spectra, a maximum Sp rank of 10, and a minimal length of 7 amino acids. Combining both replicate runs, proteins were identified by at least two peptides or one peptide with two independent spectra. Only one peptide matching shuffled protein sequences passed this set of criteria, leading to a false discovery rate of 0.09% for both replicate runs. Spectral counts are considered to be a good estimation of absolute protein abundance.62 To account for the fact that larger proteins tend to contribute more peptides/spectra, spectral counts were divided by protein length, defining a spectral abundance factor;63 these values were normalized against the sum of all values for each run, allowing us to compare protein levels across different runs using the NSAF value.28,29

GenBank accession codes

The coordinates of the new sequences have been deposited at GenBank with accession codes EU100883 and EU100884 for P23-45 and P74-26, respectively.

‡ http://www.phrap.com
§ http://opal.biology.gatech.edu/GeneMark/heuristic_hmm2.cgi
∥ http://pbil.univ-lyon1.fr/software/SeqinR/seqinr_home.php
¶ http://pbil.ibcp.fr/NPSA

Acknowledgements

Bacteriophages P23-45 and P74-26 were generously provided by Dr Michael Slater from Promega Corporation. We are grateful to Dr S. Mirkin (Tufts University, Medford, MA) for critical reading of the manuscript and helpful suggestions. This work was supported by NIH grant RO1 GM59295 (to K.S.), and partially by NIH R21 grant AI074769-01 (to L.M.) and by National Center for Biotechnology of the Republic of Kazakhstan grant, Russian Foundation for Basic Research grant 07-04-00366 and Russian Academy of Sciences Presidium Molecular and Cellular Biology program grant (to K.S.).

Supplementary Data

Supplementary data associated with this article can be found, in the online version, at doi:10.1016/j.jmb.2008.02.018

References

1. Hendrix, R. W., Smith, M. C., Burns, R. N., Ford, M. E. & Hatfull, G. F. (1999). Evolutionary relationships among diverse bacteriophages and prophages: all the world's a phage. Proc. Natl Acad. Sci. USA, 96, 2192–2197.
2. Wommack, K. E. & Colwell, R. R. (2000). Virioplankton: viruses in aquatic ecosystems. Microbiol. Mol. Biol. Rev. 64, 69–114.
3. Naryshkina, T., Liu, J., Florens, L., Swanson, S. K., Pavlov, A. R., Pavlova, N. et al. (2006). Thermus thermophilus bacteriophage phiYS40 genome and proteomic characterization of virions. J. Mol. Biol. 364, 667–677.
4. Sakaki, Y. & Oshima, T. (1975). Isolation and characterization of a bacteriophage infectious to an extreme thermophile, Thermus thermophilus HB8. J. Virol. 15, 1449–1453.
5. Sevostyanova, A., Djordjevic, M., Kuznedelov, K., Naryshkina, T., Gelfand, M. S., Severinov, K. & Minakhin, L. (2007). Temporal regulation of viral transcription during development of Thermus thermophilus bacteriophage phiYS40. J. Mol. Biol. 366, 420–435.
6. Yu, M. X., Slater, M. R. & Ackermann, H. W. (2006). Isolation and characterization of Thermus bacteriophages. Arch. Virol. 151, 663–679.
7. Ackermann, H. W. (2007). 5500 Phages examined in the electron microscope. Arch. Virol. 152, 227–243.
8. Liu, J., Glazko, G. & Mushegian, A. (2006). Protein repertoire of double-stranded DNA bacteriophages. Virus Res. 117, 68–80.
9. Myers, E. W. & Miller, W. (1988). Optimal alignments in linear space. Comput. Appl. Biosci. 4, 11–17.
10. Rice, P., Longden, I. & Bleasby, A. (2000). EMBOSS: The European molecular biology open software suite. Trends Genet. 16, 276–277.
11. Besemer, J. & Borodovsky, M. (1999). Heuristic approach to deriving models for gene finding. Nucleic Acids Res. 27, 3911–3920.
12. Lowe, T. M. & Eddy, S. R. (1997). tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res. 25, 955–964.
13. Sharp, P. M. & Li, W. H. (1987). The codon Adaptation Index–a measure of directional synonymous codon
usage bias, and its potential applications. Nucleic Acids Res. 15, 1281–1295.
14. Kunisawa, T., Kanaya, S. & Kutter, E. (1998). Comparison of synonymous codon distribution patterns of bacteriophage and host genomes. DNA Res. 5, 319–326.
15. Sau, K., Gupta, S. K., Sau, S. & Ghosh, T. C. (2005). Synonymous codon usage bias in 16 Staphylococcus aureus phages: implication in phage therapy. Virus Res. 113, 123–131.
16. Lawrence, J. G. (2001). Catalyzing bacterial speciation: correlating lateral transfer with genetic headroom. Syst. Biol. 50, 479–496.
17. Ravin, V., Ravin, N., Casjens, S., Ford, M. E., Hatfull, G. F. & Hendrix, R. W. (2000). Genomic sequence and analysis of the atypical temperate bacteriophage N15. J. Mol. Biol. 299, 53–73.
18. Yuzenkova, J., Nechaev, S., Berlin, J., Rogulja, D., Kuznedelov, K., Inman, R. et al. (2003). Genome of Xanthomonas oryzae bacteriophage Xp10: an odd T-odd phage. J. Mol. Biol. 330, 735–748.
19. Pedulla, M. L., Ford, M. E., Houtz, J. M., Karthikeyan, T., Wadsworth, C., Lewis, J. A. et al. (2003). Origins of highly mosaic mycobacteriophage genomes. Cell, 113, 171–182.
20. Susskind, M. M. & Botstein, D. (1978). Molecular genetics of bacteriophage P22. Microbiol. Rev. 42, 385–413.
21. Huber, K. E. & Waldor, M. K. (2002). Filamentous phage integration requires the host recombinases XerC and XerD. Nature, 417, 656–659.
22. Washburn, M. P., Wolters, D. & Yates, J. R., 3rd (2001). Large-scale analysis of the yeast proteome by multidimensional protein identification technology. Nature Biotechnol. 19, 242–247.
23. Florens, L. & Washburn, M. P. (2006). Proteomic analysis by multidimensional protein identification technology. Methods Mol. Biol. 328, 159–175.
24. Pedersen, M., Ostergaard, S., Bresciani, J. & Vogensen, F. K. (2000). Mutational analysis of two structural genes of the temperate lactococcal bacteriophage TP901-1 involved in tail length determination and baseplate assembly. Virology, 276, 315–328.
25. Katsura, I. & Hendrix, R. W. (1984). Length determination in bacteriophage lambda tails. Cell, 39, 691–698.
26. Katsura, I. (1987). Determination of bacteriophage lambda tail length by a protein ruler. Nature, 327, 73–75.
27. Combet, C., Blanchet, C., Geourjon, C. & Deleage, G. (2000). NPS@: network protein sequence analysis. Trends Biochem. Sci. 25, 147–150.
28. Paoletti, A. C., Parmely, T. J., Tomomori-Sato, C., Sato, S., Zhu, D., Conaway, R. C. et al. (2006). Quantitative proteomic analysis of distinct mammalian Mediator complexes using normalized spectral abundance factors. Proc. Natl Acad. Sci. USA, 103, 18928–18933.
29. Zybailov, B., Mosley, A. L., Sardiu, M. E., Coleman, M. K., Florens, L. & Washburn, M. P. (2006). Statistical analysis of membrane proteome expression changes in Saccharomyces cerevisiae. J. Proteome Res. 5, 2339–2347.
30. Schroth, G. P. & Ho, P. S. (1995). Occurrence of potential cruciform and H-DNA forming sequences in genomic DNA. Nucleic Acids Res. 23, 1977–1983.
31. Cox, R. & Mirkin, S. M. (1997). Characteristic enrichment of DNA repeats in different genomes. Proc. Natl Acad. Sci. USA, 94, 5237–5242.
32. Mirkin, S. M., Lyamichev, V. I., Drushlyak, K. N., Dobrynin, V. N., Filippov, S. A. & Frank-Kamenetskii, M. D. (1987). DNA H form requires a homopurine-homopyrimidine mirror repeat. Nature, 330, 495–497.
33. Kohwi, Y. & Kohwi-Shigematsu, T. (1988). Magnesium ion-dependent triple-helix structure formed by homopurine-homopyrimidine sequences in supercoiled plasmid DNA. Proc. Natl Acad. Sci. USA, 85, 3781–3785.
34. Baran, N., Lapidot, A. & Manor, H. (1991). Formation of DNA triplexes accounts for arrests of DNA synthesis at d(TC)n and d(GA)n tracts. Proc. Natl Acad. Sci. USA, 88, 507–511.
35. Dayn, A., Samadashwily, G. M. & Mirkin, S. M. (1992). Intramolecular DNA triplexes: unusual sequence requirements and influence on DNA polymerization. Proc. Natl Acad. Sci. USA, 89, 11406–11410.
36. Samadashwily, G. M., Dayn, A. & Mirkin, S. M. (1993). Suicidal nucleotide sequences for DNA polymerization. EMBO J. 12, 4975–4983.
37. Krasilnikov, A. S., Panyutin, I. G., Samadashwily, G. M., Cox, R., Lazurkin, Y. S. & Mirkin, S. M. (1997). Mechanisms of triplex-caused polymerization arrest. Nucleic Acids Res. 25, 1339–1346.
38. Ohshima, K., Montermini, L., Wells, R. D. & Pandolfo, M. (1998). Inhibitory effects of expanded GAA.TTC triplet repeats from intron I of the Friedreich ataxia gene on transcription and replication in vivo. J. Biol. Chem. 273, 14588–14595.
39. Krasilnikova, M. M. & Mirkin, S. M. (2004). Replication stalling at Friedreich's ataxia (GAA)n repeats in vivo. Mol. Cell Biol. 24, 2286–2295.
40. Pollard, L. M., Sharma, R., Gomez, M., Shah, S., Delatycki, M. B., Pianese, L. et al. (2004). Replication-mediated instability of the GAA triplet repeat mutation in Friedreich ataxia. Nucleic Acids Res. 32, 5962–5971.
41. Faruqi, A. F., Datta, H. J., Carroll, D., Seidman, M. M. & Glazer, P. M. (2000). Triple-helix formation induces recombination in mammalian cells via a nucleotide excision repair-dependent pathway. Mol. Cell Biol. 20, 990–1000.
42. Vasquez, K. M., Narayanan, L. & Glazer, P. M. (2000). Specific mutations induced by triplex-forming oligonucleotides in mice. Science, 290, 530–533.
43. Wang, G. & Vasquez, K. M. (2006). Non-B DNA structure-induced genetic instability. Mutat. Res. 598, 103–119.
44. Abuladze, N. K., Gingery, M., Tsai, J. & Eiserling, F. A. (1994). Tail length determination in bacteriophage T4. Virology, 199, 301–310.
45. Piuri, M. & Hatfull, G. F. (2006). A peptidoglycan hydrolase motif within the mycobacteriophage TM4 tape measure protein promotes efficient infection of stationary phase cells. Mol. Microbiol. 62, 1569–1585.
46. Roessner, C. A. & Ihler, G. M. (1984). Proteinase sensitivity of bacteriophage lambda tail proteins gpJ and pH in complexes with the lambda receptor. J. Bacteriol. 157, 165–170.
47. Bacolla, A., Collins, J. R., Gold, B., Chuzhanova, N., Yi, M., Stephens, R. M. et al. (2006). Long homopurine*homopyrimidine sequences are characteristic of genes expressed in brain and the pseudoautosomal region. Nucleic Acids Res. 34, 2663–2675.
48. Grabczyk, E. & Fishman, M. C. (1995). A long purine-pyrimidine homopolymer acts as a transcriptional diode. J. Biol. Chem. 270, 1791–1797.
49. Krasilnikova, M. M., Samadashwily, G. M., Krasilnikov, A. S. & Mirkin, S. M. (1998). Transcription through a simple DNA repeat blocks replication elongation. EMBO J. 17, 5095–5102.
50. Bacolla, A., Jaworski, A., Larson, J. E., Jakupciak, J. P., Chuzhanova, N., Abeysinghe, S. S. et al. (2004). Breakpoints of gross deletions coincide with non-B DNA conformations. Proc. Natl Acad. Sci. USA, 101, 14162–14167.
51. Campuzano, V., Montermini, L., Molto, M. D., Pianese, L., Cossee, M., Cavalcanti, F. et al. (1996). Friedreich's ataxia: autosomal recessive disease caused by an intronic GAA triplet repeat expansion. Science, 271, 1423–1427.
52. Mirkin, S. M. (2006). DNA structures, repeat expansions and human hereditary disorders. Curr. Opin. Struct. Biol. 16, 351–358.
53. Polushin, N., Malykh, A., Morocho, A. M., Slesarev, A. & Kozyavkin, S. (2005). High-throughput production of optimized primers (fimers) for whole-genome direct sequencing. Methods Mol. Biol. 288, 291–304.
54. Ewing, B., Hillier, L., Wendl, M. C. & Green, P. (1998). Base-calling of automated sequencer traces using phred. I. Accuracy assessment. Genome Res. 8, 175–185.
55. Soding, J., Biegert, A. & Lupas, A. N. (2005). The HHpred interactive server for protein homology detection and structure prediction. Nucleic Acids Res. 33, W244–W248.
56. Walker, D. R. & Koonin, E. V. (1997). SEALS: a system for easy analysis of lots of sequences. Proc. Int. Conf. Intell. Syst. Mol. Biol. 5, 333–339.
57. Brodie, R., Smith, A. J., Roper, R. L., Tcherepanov, V. & Upton, C. (2004). Base-By-Base: single nucleotide-level analysis of whole viral genome alignments. BMC Bioinformatics, 5, 96.
58. Thompson, J. D., Higgins, D. G. & Gibson, T. J. (1994). CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 22, 4673–4680.
59. Ihaka, R. & Gentleman, R. (1996). R: a language for data analysis and graphics. J. Comput. Graph. Stat. 5, 299–314.
60. Eng, J., McCormack, A. L. & Yates, J. R., 3rd (1994). An approach to correlate tandem mass spectral data of peptides with amino acid sequences in a protein database. J. Am. Mass Spectrom. 5, 976–989.
61. Tabb, D. L., McDonald, W. H. & Yates, J. R., 3rd (2002). DTASelect and Contrast: tools for assembling and comparing protein identifications from shotgun proteomics. J. Proteome Res. 1, 21–26.
62. Liu, H., Sadygov, R. G. & Yates, J. R., 3rd (2004). A model for random sampling and estimation of relative protein abundance in shotgun proteomics. Anal. Chem. 76, 4193–4201.
63. Powell, D. W., Weaver, C. M., Jennings, J. L., McAfee, K. J., He, Y., Weil, P. A. & Link, A. J. (2004). Cluster analysis of mass spectrometry data reveals a novel component of SAGA. Mol. Cell Biol. 24, 7249–7259.
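
The spectral-count normalization described in the MudPIT section can be sketched in a few lines. This is a minimal illustration assuming only what the text states (spectral counts divided by protein length, then normalized to the per-run sum); the function name `nsaf`, the dictionary layout, and the protein names and counts are hypothetical and are not data from this study.

```python
def nsaf(spectral_counts, lengths):
    """Compute normalized spectral abundance factors for one run.

    spectral_counts: {protein: number of spectra matched in this run}
    lengths:         {protein: protein length in amino acids}
    Returns {protein: NSAF}; the values sum to 1 within the run.
    """
    saf = {}
    for protein, count in spectral_counts.items():
        length = lengths[protein]
        if length <= 0:
            raise ValueError(f"non-positive length for {protein}")
        # Spectral abundance factor: spectra per residue.
        saf[protein] = count / length
    total = sum(saf.values())
    if total == 0:
        raise ValueError("no spectra detected in this run")
    # Normalizing by the run total makes runs comparable.
    return {protein: value / total for protein, value in saf.items()}


if __name__ == "__main__":
    # Hypothetical counts and lengths, for illustration only.
    counts = {"protein_A": 120, "protein_B": 30, "protein_C": 30}
    sizes = {"protein_A": 400, "protein_B": 100, "protein_C": 300}
    for name, value in nsaf(counts, sizes).items():
        print(name, round(value, 3))
```

In this toy input, protein_A has four times the spectra of protein_B but is also four times longer, so both receive the same NSAF; that length correction is the point of the normalization.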
