
Signal Processing Techniques for the Analysis of Human Genome Associated with Cancer Cells


S. Barman (Mandal) (a), S. Saha (b), A. Mandal (c), M. Roy (d)

(a), (b), (c) Institute of Radiophysics and Electronics, University of Calcutta, 92, A.P.C. Road, Kolkata-700009
(d) Principal, The Calcutta Technical School, Kolkata

barman_s@email.com (a), sumansaha.in@gmail.com (b), atin.me@gmail.com (c), dipamani_ccp@rediffmail.com (d)

Abstract- Digital signal processing (DSP) methods are becoming increasingly important in gene identification, biological sequence analysis and disease diagnosis. DNA encodes 20 amino acids, which are the basic building blocks that make up proteins. According to medical researchers, some amino acids are closely related to cancer. In the present paper the authors present a comparative spectral analysis of the amino acid chains of various cancer and non-cancer cells using digital signal processing.

Keywords- Genomics, Amino acids, Protein, DNA, Power spectral density

I. INTRODUCTION

In recent years, researchers from various fields have concentrated their research on biology. Research related to biology has entered the field of information extraction and data analysis. Two major research areas of genomics are DNA sequence analysis and disease diagnosis. DNA sequence analysis technology has been developing fast for decades, with attempts being made to reveal hidden features present in the protein coding regions of DNA, whereas disease diagnosis is generally used to find any abnormality present in DNA, because almost all human genetic diseases, such as cancer and developmental abnormalities, are characterized by the presence of genetic variation. It is understood that the 20 types of amino acids encoded in DNA play an important role in the study of cancer. Controlled Amino Acid Therapy (CAAT) is an efficient medical treatment used to impair the development of cancer cells. The authors have studied the statistics of the amino acids for cancer and non-cancer cells from various human genomes and have applied digital signal processing techniques for spectral analysis of these amino acid chains. This paper is structured as follows: Introduction, Prior Work, Overview of DNA, PSD Analysis, Discussion, Results and Conclusion.

II. PRIOR WORK

DNA encodes 20 amino acids. Amino acids are the chemical units or building blocks of the body that make up proteins. Protein substances make up the muscles, tendons, organs, glands, nails and hair. Growth, repair and maintenance of all cells are dependent upon them. Next to water, protein makes up the greatest part of our body weight. Amino acids that must be obtained from the diet are called “Essential Amino Acids”; other amino acids that the body can manufacture from other sources are called “Nonessential Amino Acids”. The basic requirement to impair the development of cancer cells is controlling these amino acids.

The CAAT protocol is used in cancer treatment to reduce certain amino acids in the daily diet of the cancer patient [2].

Dr. Otto [2] discovered that all cancer cells produce an inordinate amount of lactic acid. Dr. Lee et al. [2] showed that a deficiency of glucose can cause cancer cells to commit suicide. Arginine can modulate the growth of breast cancer [6]. George C. Rock et al. stated that Aspartic Acid (D), Glutamic Acid (E), Glycine (G), Serine (S), Alanine (A) and Cysteine (C) are generated through the synthesis of glucose [5].

Considering all the above and the literature surveyed, it is understood that amino acids are closely related to cancer. The authors present a comparative study of the power spectral density of various amino acids in the present article.

III. OVERVIEW OF DNA

Genomic Signal Processing is primarily the processing of DNA sequences, RNA sequences and proteins [1]. DNA (Deoxyribonucleic Acid) has a double helix structure and contains the genetic information of living organisms. DNA is made up of a number of units called nucleotides, each consisting of a pentose sugar, phosphoric acid and a nitrogen base: Adenine (A), Guanine (G), Cytosine (C) or Thymine (T) [1,3]. The nucleotide base A is always linked with T, and C is always linked with G, by hydrogen bonding. The RNA (Ribonucleic Acid) molecule is closely related to DNA. It is also made up of four bases, but Uracil (U) is present instead of Thymine, and U always pairs with A.

A DNA sequence can be separated into two regions, genes and intergenic spaces. Genes contain the information for generation of proteins. Each gene is responsible for the production of a different protein. Genes have two types of sub-regions called exons and introns. The gene is first copied into a single-stranded chain called the messenger RNA or mRNA molecule. The introns are then removed from the mRNA by a process called splicing. The spliced mRNA is divided into groups of three adjacent bases. Each triplet is called a codon.
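As a minimal sketch of the grouping into codons described above, the following Python snippet splits a coding sequence into consecutive triplets and translates them into one-letter amino acid codes. The gene string is the example shown in Figure 1; the names `CODON_TABLE`, `split_codons` and `translate` are hypothetical, and the table deliberately contains only the codons that occur in that example (taken from the standard genetic code), so it is an illustration rather than a complete translator.

```python
# Partial codon table (assumption: only the codons used in the Figure 1 example).
CODON_TABLE = {
    "ATG": "M", "GAA": "E", "GTG": "V", "GCA": "A",
    "ATC": "I", "CTG": "L", "AAT": "N", "TTA": "L",
    "TAG": "*",  # "*" marks a stop codon
}


def split_codons(sequence):
    """Divide a coding sequence into consecutive, non-overlapping triplets."""
    usable = len(sequence) - len(sequence) % 3  # ignore a trailing partial codon
    return [sequence[i:i + 3] for i in range(0, usable, 3)]


def translate(sequence):
    """Map each codon to its amino acid and stop at the first stop codon."""
    protein = []
    for codon in split_codons(sequence):
        amino_acid = CODON_TABLE[codon]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


gene = "ATGGAAGTGGCAATGATCCTGAATTTATAG"  # example gene from Figure 1
codons = split_codons(gene)
protein = translate(gene)
print(codons)
print(protein)
```

Note that this sketch also emits the Methionine (M) coded by the start codon ATG, whereas Figure 1 labels that triplet only as the start codon.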

IEMCON 2011 ORGANISED IN COLLABORATION WITH IEEE,INDIA 570


There are 64 possible codons. Each codon instructs the cell machinery to synthesize an amino acid. The codon sequence therefore uniquely identifies an amino acid sequence which defines a protein. This mapping is called the genetic code and is shown in Figure 1.

          Start                                        Stop
          codon                                        codon
5' End     ATG  GAA  GTG  GCA  ATG  ATC  CTG  AAT  TTA  TAG    3' End   (Gene)
                 E    V    A    M    I    L    N    L                   (Protein)

          (GAA is the codon for Glu (E).)

Figure 1. An example showing mapping of codons to amino acids [3]

There are 64 possible codons but only 20 amino acids, so the mapping is many-to-one. The chemical bond between the amino acids is a covalent peptide bond. The list of 20 amino acids is shown in Table I.

TABLE I. LIST OF TWENTY AMINO ACIDS AND CODONS [3]

No.  Code  Abbr.  Name              Codons
1    A     Ala    Alanine           GCA, GCC, GCG, GCT
2    C     Cys    Cysteine (has S)  TGC, TGT
3    D     Asp    Aspartic Acid     GAC, GAT
4    E     Glu    Glutamic Acid     GAA, GAG
5    F     Phe    Phenylalanine     TTC, TTT
6    G     Gly    Glycine           GGA, GGC, GGG, GGT
7    H     His    Histidine         CAC, CAT
8    I     Ile    Isoleucine        ATA, ATC, ATT
9    K     Lys    Lysine            AAA, AAG
10   L     Leu    Leucine           TTA, TTG, CTA, CTC, CTG, CTT
11   M     Met    Methionine        ATG
12   N     Asn    Asparagine        AAC, AAT
13   P     Pro    Proline           CCA, CCC, CCG, CCT
14   Q     Gln    Glutamine         CAA, CAG
15   R     Arg    Arginine          AGA, AGG, CGA, CGC, CGG, CGT
16   S     Ser    Serine            AGC, AGT, TCA, TCC, TCG, TCT
17   T     Thr    Threonine         ACA, ACC, ACG, ACT
18   V     Val    Valine            GTA, GTC, GTG, GTT
19   W     Trp    Tryptophan        TGG
20   Y     Tyr    Tyrosine          TAC, TAT

Amino acids are the ‘building blocks’ of the body, of which eight are essential and the rest are non-essential. Besides building cells and repairing tissue, amino acids form antibodies to combat invading bacteria and viruses; they are part of the enzyme and hormonal system; they carry oxygen throughout the body and are part of all muscular activity.

According to medical reports, some amino acids are essential for the growth of tumor cells, and restricting or inhibiting them may be beneficial for treating cancer patients.

The amino acid Asparagine (N) is required by lymphoma tumor cells for them to grow.

Restricting the amino acids Tyrosine (Y) and Phenylalanine (F) inhibits the growth of melanoma tumor cells. The authors in this article present a preliminary study of the spectral content distribution of amino acids for various cancer and non-cancer cells of Homo sapiens genes.

IV. POWER SPECTRAL DENSITY OF AMINO ACIDS

The application of Fourier transform techniques has been found to be very useful for both DNA and protein sequences. DSP techniques are applicable only to numerical data, but amino acid sequences consist of the letters A, C, D, E, F, G, H, I, K, L, M, N, P, Q, R, S, T, V, W, Y. Hence a mapping technique is required to convert the alphabetic sequence into a numerical sequence before applying DSP. Here the authors have used a binary mapping technique based on the presence or absence of each amino acid in a protein coding region of a DNA sequence.

For example, consider an amino acid chain of length N obtained from a DNA sequence:

x[n] = M P I G S K E R P T F F E I F K T R C N K A

After mapping, each amino acid s has its own binary sequence x_s[n]:

x_A[n] = 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
x_C[n] = 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0
  .
  .
x_W[n] = 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
x_Y[n] = 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

The DFT of each binary sequence is

\[ X_s[k] = \sum_{n=0}^{N-1} x_s[n] \, e^{-j 2 \pi n k / N} \]

where
k = 0, 1, 2, ..., N-1
n = 0, 1, 2, ..., N-1
s = A, C, D, E, F, G, H, I, K, L, M, N, P, Q, R, S, T, V, W, Y

Then the power spectral content of the sequence for amino acid s is given by

\[ P_s = \sum_{k=0}^{N-1} \left| X_s[k] \right|^2 \]

The plot of P_s for the coding region is investigated as an indicator of cancer or non-cancer cells. Bar plots of the PSD of various cancer and non-cancer cells are depicted in Figure 2.

V. ANALYSIS

The authors calculated and plotted the PSD of the twenty amino acids for various cancer and non-cancer cells of Homo sapiens genes, as listed in Table II.

Bar plots of the amino acids show that the PSD of some amino acids, such as S, I, R, E and N, is very high in cancer cells compared to non-cancer cells [Fig. 2], whereas amino acids V, G and A give low PSD values for cancer cells.

According to various medical research reports, it is understood that restriction of a few amino acids may be



beneficial for inhibiting the growth of cancer cells. The same has been confirmed by using the DSP technique.

TABLE II. CANCER AND NON-CANCER CELLS OF HOMO SAPIENS GENES

CANCER

SL No.  Accession No.  Remark
1       AF309413       Homo sapiens BRCA2 protein gene.
2       AF348515       Homo sapiens mutant early onset breast cancer (BRCA2) gene.
3       AF507076       Homo sapiens IRCHS11B breast and ovarian cancer susceptibility protein (BRCA1) gene.
4       AF507088       Homo sapiens IRCHTP9B breast and ovarian cancer susceptibility protein (BRCA2) gene.
5       AF008216       Homo sapiens candidate tumor suppressor pp32r1 (PP32R1) gene.
6       AF489725       Homo sapiens isolate IRCHF10A breast and ovarian cancer susceptibility protein (BRCA2) gene.
7       AF489726       Homo sapiens isolate IRCHF10B breast and ovarian cancer susceptibility protein (BRCA2) gene.
8       AF489737       Homo sapiens isolate IRCHS13A breast and ovarian cancer susceptibility protein (BRCA2) gene.
9       AL137247       Human DNA sequence of the BRCA2 gene for early onset breast cancer.
10      AY144588       Homo sapiens truncated breast and ovarian cancer susceptibility protein (BRCA1) gene.
11      DQ075361       Homo sapiens truncated breast and ovarian cancer susceptibility protein 1 (BRCA1) gene.
12      NM_000059      Homo sapiens breast cancer 2, early onset (BRCA2).
13      NM_000465      Homo sapiens BRCA1 associated RING domain 1 (BARD1) mRNA.
14      NM_004656      Homo sapiens BRCA1 associated protein-1.
15      NM_007525      Mus musculus BRCA1 associated RING domain 1 (Bard1), mRNA.
16      NM_016567      Homo sapiens BRCA2.
17      NM_024675      Homo sapiens partner and localizer of BRCA2 (PALB2).
18      NM_032043      Homo sapiens BRCA1 interacting protein.
19      NM_078468      Homo sapiens BRCA2 and CDKN1A interacting protein.

NON-CANCER

SL No.  Accession No.  Remark
1       AF007546       Homo sapiens beta-globin gene.
2       AF008216       Homo sapiens candidate tumor suppressor.
3       AF083883       Homo sapiens mutant beta-globin (HBB) gene.
4       AF186607       Homo sapiens haplotype A11a beta-globin (HBB) gene.
5       AF186613       Homo sapiens haplotype B18 beta-globin (HBB) gene.
6       NM_012400      Homo sapiens phospholipase A2, group IID (PLA2G2D), mRNA.

VI. CONCLUSION

In this paper the authors have used DSP techniques to measure the spectral content of the 20 amino acids. Bar plots of the various amino acids show that, for some amino acids such as Serine, Isoleucine, Arginine, Glutamic Acid and Asparagine, PSD values are high in cancer cells compared to non-cancer cells.

According to the literature survey, Arginine can modulate the growth of breast cancer, and a deficiency of glucose can cause cancer cells to commit suicide. Glutamic Acid and Serine are generated through glucose synthesis. High PSD values of Serine and Glutamic Acid can be treated as an indicator of cancer. From the average PSD plots shown in Figure 3, it has been observed that Serine (S) and Arginine (R) show very high PSD values for cancer cells compared to non-cancer cells. Therefore high PSD values of Serine (S) and Arginine (R) are good indicators of cancer.

[Bar chart titled "Comparison of Cancer and Non-Cancer PSD Average Value": average PSD (vertical axis, 0 to 0.016) for each amino acid A, C, D, E, F, G, H, I, K, L, M, N, P, Q, R, S, T, V, W, Y (horizontal axis), with two series, Cancer and Noncancer.]

Figure 3. Comparison of average PSD value between cancer and non-cancer cells

REFERENCES

[1] Dimitris Anastassiou, “Genomic Signal Processing”, IEEE Signal Processing Magazine, pp. 8-20, July 2001.
[2] A.P. John Institute for Cancer Research paper on Controlled Amino Acid Therapy (CAAT) works.
[3] P. P. Vaidyanathan, “Genomics and Proteomics: A Signal Processor's Tour”, IEEE Circuits and Systems Magazine, March 2005.
[4] Daniele Bani, Emanuela Masini, Maria Grazia Bello, Mario Bigazzi, and Tatiana Bani Sacchi, “Relaxin Activates the L-Arginine-Nitric Oxide Pathway in Human Breast Cancer Cells”, Cancer Research 55, 5272-5275, November 15, 1995.
[5] George C. Rock and Kendall W. King, “Amino Acid Synthesis from Glucose-U-14C in Argyrotaenia velutinana (Lepidoptera: Tortricidae) Larvae”, The Journal of Nutrition, 95, 369-373.
[6] Rajan Singh, Shehla Pervin, Ardeshir Karimi, Stephen Cederbaum, and Gautam Chaudhuri, “Arginase Activity in Human Breast Cancer Cell Lines: Nω-Hydroxy-L-arginine Selectively Inhibits Cell Proliferation and Induces Apoptosis in MDA-MB-468 Cells”, Cancer Research 60, 3305-3312, June 15, 2000.
[7] National Centre for Biotechnology Information (NCBI). [Online]. Available: http://www.ncbi.nlm.nih.gov/.



[Bar plots of PSD value (vertical axis) for each amino acid A, C, D, E, F, G, H, I, K, L, M, N, P, Q, R, S, T, V, W, Y (horizontal axis), one panel per accession number. Cancer panels: AY144588, AF507076 (two panels), AF348515, NM_078468, NM_000059, NM_032043. Non-cancer panels: AF083883, AF007546, NM_012400, AF186613, AF186607.]

Figure 2. Bar plots of PSD values of cancer and non-cancer cells
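The per-amino-acid values plotted in Figure 2 follow from the binary mapping, DFT and summed spectral power defined in Section IV. The Python sketch below illustrates that chain of steps on the 22-residue example chain from Section IV. It is an illustration under stated assumptions, not the authors' implementation: the names `indicator_sequence`, `dft` and `spectral_power` are hypothetical, the DFT is evaluated directly from its definition rather than with an FFT, and no normalisation is applied because the paper does not state the scaling used for the plotted values.

```python
import cmath

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 one-letter codes from Section IV


def indicator_sequence(protein, residue):
    """Binary mapping: 1 where `residue` occurs in `protein`, otherwise 0."""
    return [1 if aa == residue else 0 for aa in protein]


def dft(x):
    """Direct N-point DFT: X[k] = sum_n x[n] * exp(-j*2*pi*n*k/N)."""
    n_points = len(x)
    return [
        sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / n_points)
            for n in range(n_points))
        for k in range(n_points)
    ]


def spectral_power(protein):
    """Return {residue: sum over k of |X_s[k]|^2} for all 20 amino acids."""
    power = {}
    for s in AMINO_ACIDS:
        spectrum = dft(indicator_sequence(protein, s))
        power[s] = sum(abs(value) ** 2 for value in spectrum)
    return power


example = "MPIGSKERPTFFEIFKTRCNKA"  # example chain from Section IV
psd = spectral_power(example)
for s in AMINO_ACIDS:
    print(s, round(psd[s], 6))
```

By Parseval's relation, the unnormalised sum over k of |X_s[k]|^2 for a binary sequence equals N times the number of occurrences of amino acid s, which gives a quick sanity check on the output (for instance, F occurs three times in the 22-residue example). The direct DFT costs O(N^2) per amino acid, so an FFT routine would be the natural substitute for long sequences.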

