Thymine in E Coli Coding Frames


International Journal of Advanced Bioinformatics

Research Article Open access

Comparison and analysis of XTX in coding frames of DNA sequences of Escherichia coli K12 genome
C. Paruvathavarthini1*, H. Karthick1 and E. Rajasekaran2

1 Department of Bioinformatics, PRIST University, Thanjavur – 614 904
2 Department of Biotechnology, Periyar Maniyammai University, Thanjavur – 613 403

* Corresponding author e-mail: ccmcvarthini@yahoo.com

Received: 16 June 2009; Accepted: 04 August 2009; Published: 00 ---- 2009

ABSTRACT

Computer-based genome-wide screening of the DNA sequences of Escherichia coli strain K12 revealed the importance of XTX codons (where X = A, T, G or C), which code for amino acids with large hydrophobic groups. We analyzed the nucleotide sequences of Escherichia coli K12 and determined the fraction of XTX in all six frames. Frames 1, 3 and 5 (nucleotide) have a larger amount of XTX than frames 2, 4 and 6. Our investigation found that the DNA sequences have 30% XTX in coding frames.

Keywords: frame analysis; hydrophobic residues.
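
As a brief illustration of the XTX notation used above, the following Python sketch lists the 16 codons with thymine in the second position together with the amino acids they encode under the standard genetic code. The names XTX_CODONS and is_xtx are hypothetical and chosen only for this example; they do not come from the authors' programs.

```python
# The 16 codons of the standard genetic code that have T in the middle
# position (XTX), with the amino acid each one encodes.
XTX_CODONS = {
    "TTT": "Phe", "TTC": "Phe", "TTA": "Leu", "TTG": "Leu",
    "CTT": "Leu", "CTC": "Leu", "CTA": "Leu", "CTG": "Leu",
    "ATT": "Ile", "ATC": "Ile", "ATA": "Ile", "ATG": "Met",
    "GTT": "Val", "GTC": "Val", "GTA": "Val", "GTG": "Val",
}


def is_xtx(codon: str) -> bool:
    """Return True when a three-letter codon has T in its second position."""
    return len(codon) == 3 and codon[1].upper() == "T"


# The distinct residues encoded by XTX codons are all hydrophobic.
residues = sorted(set(XTX_CODONS.values()))
print(residues)  # ['Ile', 'Leu', 'Met', 'Phe', 'Val']
```

Every XTX codon therefore encodes phenylalanine, leucine, isoleucine, methionine or valine, which is consistent with the description of XTX as coding for large hydrophobic residues.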

INTRODUCTION

The increasing genome sequence data of microorganisms has provided the basis for a comprehensive understanding of organisms at the molecular level. Besides sequence data, a large number of experimental and computational resources are required for genome-scale analyses. Escherichia coli K-12 has been one of the best characterized organisms in molecular biology [1]. The field of genomics has been expanding at a rapid pace since the annotated Escherichia coli K-12 genome was published in 1997 [2], with the current number of published genomes exceeding 66 and with another 364 on their way according to the Genomes OnLine Database (GOLD) [3]. Deciphering the functions encoded by all gene products of the genomes is the next big challenge in the field. Function attributions through experimental, biochemical and genetic analyses and through bioinformatics studies are continuing, and microarray technology is shedding additional light on the functions associated with the gene products of the organism in question. The wealth of biological information on E. coli is still increasing [4] and is contributing to a better understanding of this organism as well as of functions encoded in other organisms. The E. coli K-12 chromosome is currently represented by 4,401 genes encoding 116 RNAs and 4,285 proteins [5].

This study is based on an analysis of the distribution of nucleotide sequences of Escherichia coli K12. The coding sequences (CDSs) are represented by modules, which are protein elements of at least 100 amino acids with biological activity and an independent evolutionary history [6]. The complete set of nucleotide sequences was collected from the FTP site of the NCBI (National Center for Biotechnology Information) [7]. The sequences were saved in FASTA format and then analyzed by means of home-made C programs, and the results were saved in MS Excel. In the worksheet, the average, probable and fraction values of thymine (T) were calculated, and graphs were plotted from those values.

METHODOLOGY

Only the potential protein-coding regions of the genome sequences (FASTA format) of Escherichia coli K12 were analyzed in this study. They were downloaded from the National Center for Biotechnology Information (NCBI) [7]. Trinucleotide analysis of all the nucleotide sequences of the organism was performed using a C program. This program finds the fraction of XTX in the nucleotide sequences of Escherichia coli K12 in all six frames. The frequency of XTX occurrences and the average and probable values were calculated, and the graphs were plotted using MS Excel. All those values are compared and discussed.

RESULTS AND DISCUSSION

Codon usage was defined as the number of times (frequency) a codon is translated per unit time in the cell of an organism. This is a definition of real-time codon usage, but it is hard to measure in vivo. Three different methods have been used to estimate codon usage in E. coli and other organisms, including measuring the average frequencies of codons in the sequenced protein-coding genes of an organism. All of these methods gave approximately the same results as regards the hierarchy of “most used” and “least used” codons within each synonymous codon family. Therefore, it is reasonable to use the averaged codon frequency of the sequenced protein-coding reading frames of an organism to roughly represent the real-time codon usage, although this may overestimate the usage of infrequently or rarely used codons and underestimate that of frequently used codons, because different reading frames are translated a different number of times in the organism at a given time [8]. Besides, this is what codon usage has generally meant to many scientists in the past and at present.

The thymine in the second position of the codons is considered as frame 1. Nucleotides 1 and 3 of the codons are considered as frames 3 and 2, respectively. The complementary nucleotides of nucleotides 1, 2 and 3 of the codons are considered as frames 6, 4 and 5, respectively [11]. The distributions for frames 1, 2, 5 and 6 look like normal distributions, whereas frame 3 is shifted towards the left and frame 4 towards the right compared to the other frames. Frame 1 has a very high fraction of XTX and frame 4 a very low fraction. The maximum percentage of sequences in frames 1, 2, 3, 4, 5 and 6 is 9% at 0.29, 9% at 0.26, 5% at 0.29, 11% at 0.21, 9% at 0.28 and 9% at 0.21 fraction of XTX, respectively (Figure 1).

The average value decreases proportionally from frame 1 to frame 2, and the average value of frame 4 is high compared to frames 5 and 6. Frame 3, however, has a very low average XTX value compared to all other frames (Figure 2).

Frames 2, 3, 5 and 6 are similar to one another when compared to frames 1 and 4, but frame 4 has a very high probable value compared to all other frames (Figure 3). The probable value decreases proportionally across frames 1, 2 and 3, increases suddenly in frame 4 and decreases again in frames 5 and 6.

The usage of codons and nucleotide combinations varies along genes, and systematic variation causes gradients in usage. For triplets, changes in codon preference may occur due to speed or accuracy. No relationship was found between the major/minor status of a codon and its gradient of usage. In a comparison between values for translation speeds and gradients of usage, 6 out of 9 codons with positive gradients had a faster speed of translation than their synonyms. Due to the limited translation speed data, no significance tests can be made [9].

Genes can be classified as essential or nonessential based on their indispensability for a living organism. Analysis of evolutionary conservation, protein length distribution and amino acid usage between essential and nonessential genes in Escherichia coli K12 demonstrated that essential genes are relatively preserved throughout the bacterial kingdom when compared to nonessential genes. Furthermore, results show that essential genes, compared to nonessential genes, have a significantly higher proportion of large (>534 amino acids) and small proteins (<139 amino acids) relative to medium-sized proteins. The pattern of amino acid usage shows a similar trend for essential and nonessential genes, although some notable exceptions are observed [10]. These findings help to clarify our understanding of the evolutionary mechanisms of essential and nonessential genes, relevant to the study of mutagenesis and possibly allowing prediction of gene properties in other poorly understood organisms. The final conclusion is that the amount of thymine is reduced during evolution, which leads to the production of diseased proteins that are unable to function normally [11].

CONCLUSION

The data presented in this paper strongly suggest that machine learning theories, especially supervised methods, could provide the best initial approaches to characterizing and assigning gene function in functional genomics. The overall observation suggests that the thymine in DNA plays an important role in producing stable and active proteins. In this study, the fraction of thymine in all six frames of the DNA sequences, and the distribution of DNA sequences based on the fraction of thymine, were analyzed for E. coli K12. The variation in the fraction of thymine in frames 1, 2, 5 and 6 is similar compared to frames 3 and 4. E. coli K12 shows a higher number of sequences having the probable amount of thymine in frame 4. The probable and average fractions of thymine are higher in frame 4 and lower in frame 3 compared to the other frames. From the overall observation, frame 3 follows the normal trend because it is a coding sequence. This study will help in understanding coding and non-coding DNA sequences, and it may also play a major role in disease control. From the overall study we found that the coding sequence must have the standard ratio of 30% XTX. In all cases, frame 3 has the least amount of thymine.

REFERENCES

[1] Tomoya Baba, Hsuan-Cheng Huan, Kirill Datsenko, Barry L. Wanner and Hirotada Mori (2007). The Applications of Systematic In-Frame, Single-Gene Knockout Mutant Collection of Escherichia coli K-12, Microbial Gene Essentiality: Protocols and Bioinformatics, 416:183-194.

[2] Blattner FR, Plunkett G, Bloch CA, Perna NT, Burland V, Riley M, Collado-Vides J, Glasner JD, Rode CK and Mayhew GF (1997). The complete genome sequence of Escherichia coli K-12, Science, 277:1453-1474.

[3] GOLD: Genomes OnLine Database homepage http://igweb.integratedgenomics.com/GOLD/

[4] Riley M and Serres MH (2000). Interim report on genomics of Escherichia coli, Annu Rev Microbiol, 54:341-411.

[5] Margrethe H Serres, Shuba Gopal, Laila A Nahum, Ping Liang, Terry Gaasterland and Monica Riley (2001). A functional update of the Escherichia coli K-12 genome, Genome Biology, 2: research0035.1-0035.7, doi:10.1186/gb-2001-2-9-research0035.

[6] http://genomebiology.com/2001/2/9/research/0035

[7] ftp://ftp.ncbi.nlm.nih.gov/genomes

[8] Zhang SP, Zubay G and Goldman E (1991). Low-usage codons in Escherichia coli, yeast, fruit fly and primates, Gene, 105: 61-72.

[9] Sean D. Hooper and Otto G. Berg (2000). Gradients in nucleotide and codon usage along Escherichia coli genes, Nucleic Acids Research, 28:No. 18, 3517-3523.

[10] Xiaodong Gong, Shaohua Fan, Amy Bilderbeck, Mingkun Li, Hongxia Pang and Shiheng Tao (2007). Comparative analysis of essential genes and nonessential genes in Escherichia coli K12, Molecular Genetics and Genomics, 279: No. 1, 87-94.

[11] Perumal Anandagopu, Sivakumar Suhanya, Veerasamy Jayaraj and Ekambaram Rajasekaran (2008). Role of thymine in protein coding frames of mRNA sequences, Bioinformation, 2(7): 304-307.

Paruvathavarthini et al., International Journal Of Advanced Bioinformatics 0(0):00-00, (2009)


[Figure 1 plot. Y-axis: fraction of XTX (0 to 0.35); x-axis: no. of sequences (0 to 25); series: F1, F2, F3, F4, F5 and F6]

Figure 1: Distribution of XTX in six frames
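
The per-sequence fractions behind a distribution such as Figure 1 can be sketched in Python as follows. This is not the authors' C program. The mapping of frames to codon positions is one reading of the frame definitions given in the Results and Discussion, in which frames 1 to 3 count T at codon positions 2, 3 and 1, and frames 4 to 6 count T on the complementary strand (that is, A on the given strand) at positions 2, 3 and 1. The function names xtx_fractions and percent_distribution, the two-decimal binning and the example sequence are assumptions made for illustration only.

```python
from collections import Counter
from typing import Dict, Iterable

# Frame -> (0-based codon position, base counted on the given strand).
# ASSUMPTION: this mapping is our reading of the paper's frame definitions.
# Frames 4-6 refer to the complementary strand, where T pairs with A here.
FRAME_RULES = {
    1: (1, "T"),  # codon position 2
    2: (2, "T"),  # codon position 3
    3: (0, "T"),  # codon position 1
    4: (1, "A"),  # complement of codon position 2
    5: (2, "A"),  # complement of codon position 3
    6: (0, "A"),  # complement of codon position 1
}


def xtx_fractions(cds: str) -> Dict[int, float]:
    """Fraction of triplets with a central T in each of the six frames."""
    seq = cds.upper()
    n_codons = len(seq) // 3
    if n_codons == 0:
        raise ValueError("sequence is shorter than one codon")
    seq = seq[: n_codons * 3]  # ignore an incomplete trailing codon
    fractions = {}
    for frame, (pos, base) in FRAME_RULES.items():
        hits = sum(1 for i in range(pos, len(seq), 3) if seq[i] == base)
        fractions[frame] = hits / n_codons
    return fractions


def percent_distribution(values: Iterable[float], ndigits: int = 2) -> Dict[float, float]:
    """Percentage of sequences falling at each rounded fraction value."""
    values = list(values)
    counts = Counter(round(v, ndigits) for v in values)
    return {v: 100.0 * n / len(values) for v, n in sorted(counts.items())}


# Hypothetical four-codon sequence: ATG CTT AAA GTT
print(xtx_fractions("ATGCTTAAAGTT"))
# Hypothetical frame 1 fractions from four sequences
print(percent_distribution([0.29, 0.291, 0.21, 0.3]))
```

Applying xtx_fractions to every coding sequence and passing the values for one frame to percent_distribution gives the percentage of sequences at each fraction, which is the kind of quantity plotted for each frame in Figure 1.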

[Figure 2 plot: average value of XTX in all frames. Y-axis: average (0 to 0.25); x-axis: frames]

Figure 2: This figure shows the average value of XTX in six frames
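
The average value per frame could be obtained from per-sequence fractions as in the Python sketch below. The paper states only that averages were calculated in MS Excel, so taking the arithmetic mean across sequences is an assumption. The function name average_by_frame and the sample values are hypothetical and are not taken from the E. coli data.

```python
from statistics import mean
from typing import Dict, List


def average_by_frame(per_sequence: List[Dict[int, float]]) -> Dict[int, float]:
    """Arithmetic mean of the XTX fraction in each frame across sequences."""
    if not per_sequence:
        raise ValueError("no sequences supplied")
    frames = sorted(per_sequence[0])
    return {f: mean(seq[f] for seq in per_sequence) for f in frames}


# Made-up fractions for two sequences (frame number -> fraction of XTX).
sample = [
    {1: 0.75, 2: 0.5, 3: 0.0, 4: 0.25, 5: 0.25, 6: 0.5},
    {1: 0.25, 2: 0.5, 3: 0.5, 4: 0.25, 5: 0.75, 6: 0.0},
]
print(average_by_frame(sample))
```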



[Figure 3 plot: probable value of XTX in all frames. Y-axis: probable (0 to 0.45); x-axis: frames]

Figure 3: This figure shows the probable value of XTX in six frames

© 2009 Paruvathavarthini et al.; licensee mmpublication. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and sources are credited.
