
Common Gene Network Analysis of Stress and Stroke Using Bioinformatics Tools


Md. Rakibul Islam1, Md. Liton Ahmed1, Bikash Kumar Paul1, 2, *, Touhid Bhuiyan1,
and Kawsar Ahmed2

1 Department of Software Engineering, Daffodil International University (DIU), Ashulia, Savar,
Dhaka-1342, Bangladesh; rakibul35-116@diu.edu.bd; bikash12019@gmail.com

2 Department of Information and Communication Technology, Mawlana Bhashani Science and
Technology University, Santosh, Tangail, Bangladesh

* Correspondence: bikash12019@gmail.com, bikash.swe@diu.edu.bd, bikash.k.paul@ieee.org

Abstract:

Keywords: Bioinformatics, genomics, proteomics, stroke, stress, protein-chemical interaction, gene
regulatory network, protein-protein association, gene co-expression

1. Introduction:
In recent years, bioinformatics has come to be regarded as a blend of computer science and biological
science, and many researchers prefer the term computational biology. This branch of science unites
computer technology and biology into a single discipline [Kumar and Chordia, 2017]. Bioinformatics
combines engineering craft and science: it covers the development of computational methods and the use
of tools and applications to solve biological problems [Goodman, 2002]. It spans many areas of biological
science, notably genomics, genomic sequencing, mapping, proteomics, genetics, and gene expression. To
support a better understanding of biological complexity, many bioinformatics software packages,
techniques, and databases are now readily accessible for evaluating and processing biological data.
Scientists recognized the significance of sequence databases in the 1950s, and the first protein sequence
database was constructed in 1956, only after sequences of blood glucose receptors became available. The
human genome sequence is so large that, if printed, it would fill 200 volumes of 1000 pages each, and
reading it would take 26 years working around the clock. Bioinformatics now faces the task of managing
information on this scale [Jhala et al., 2011].

Stress is the body's nonspecific response to any form of demand. It can be caused by both good and bad
experiences. All living beings are constantly under stress, and pleasant or unpleasant activities that
intensify life can raise stress temporarily. A painful blow and a passionate moment can be equally
stressful [Selye, 1976]. Stress becomes a problem when pressures become excessive, and in some cases it
can lead to anxiety disorders and depression. According to mental health statistics, 74% of people have
felt so stressed that they were overwhelmed or unable to cope [Mental health statistics, 2018]. The
American Institute of Stress reports that 77% of Americans regularly experience physical symptoms and
an estimated 73% of men regularly experience psychological symptoms triggered by stress [The American
Institute of Stress, 2017]. Women are more likely than men to report financial stress. They are also less
content with the state of their finances: 49 percent of women reported being very dissatisfied with their
salary, compared with 40 percent of men [The American Institute of Stress, 2017; Bolger et al., 1989].

Worldwide, stroke is the second leading cause of death and, in most regions, the primary cause of
acquired disability in adults. A stroke occurs when blood flow to the brain is interrupted. Even a brief
interruption of the blood supply can cause problems: after just a few minutes without blood or oxygen,
brain cells begin to die. Currently, an estimated 15 million people worldwide suffer a stroke and 5.8
million people die from one each year. Stroke has already reached epidemic proportions and is
responsible for more deaths every year than AIDS, tuberculosis and malaria [Islam et al., 2019]. The
symptoms of a stroke vary depending on the area of the brain deprived of oxygen. All strokes involve
symptoms that reflect impaired nervous system function. The symptoms typically appear suddenly and
most commonly occur on one side of the body.

Stroke is associated with huge costs, both economically and in terms of human suffering. High levels of
stress in middle-aged and older adults are correlated with a significantly higher risk of stroke
[Boden-Albala and Sacco, 2000]. In the present investigation, we analyze genes related to stress and
stroke. First, we identify the responsible genes common to the two diseases. We then construct a linkage
network, a protein-protein interaction network, and a gene regulatory network for the two associated
diseases. Finally, we build a protein-chemical interaction network from the responsible genes using
computational biology methods.

2. Proposed Methodology:
Bioinformatics is commonly defined as the use of computational methods to analyze and organize
information about biological molecules and macromolecules [Luscombe et al., 2001]. Few previous
studies in computational biology or bioinformatics have addressed stress and stroke together. In this
investigation we work with the two associated diseases and analyze several of their gene-level biological
networks. Several methods are applied to reach this goal, including gene processing, gene filtering, gene
mining, and common gene finding. Each method is described briefly in the subsections below.

2.1 Gene Collection and Filtering: The National Center for Biotechnology Information (NCBI) provides a
massive online resource of biological information and data. NCBI offers a widely available, downloadable
gene database together with essential bioinformatics techniques and utilities, and it has a major impact
on biological research. The Entrez system of NCBI provides search and retrieval operations for most of
these data across 37 distinct databases [Coordinators, 2017]. NCBI also maintains a vast database of the
protein and DNA sequence data that are indispensable for bioinformatics research and exploration. We
collected the genes responsible for stress and stroke from the NCBI Gene database. The collected genes
were then passed on for preprocessing and filtering in the following steps. The collected genes cover all
organisms, but this research concerns stress and stroke in Homo sapiens. The collected genes were
therefore filtered, and only the genes attributed to Homo sapiens were saved for further processing.
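
The species filter described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tab-delimited layout and the column names (tax_id, Org_name, GeneID, Symbol) are assumed for the example and may not match the file actually exported in the study, and the helper name filter_by_organism is hypothetical.

```python
import csv
import io

# Hypothetical tab-delimited gene export. The column names are assumptions
# made for this sketch, not a description of the authors' actual files.
SAMPLE_EXPORT = (
    "tax_id\tOrg_name\tGeneID\tSymbol\n"
    "9606\tHomo sapiens\t7157\tTP53\n"
    "10090\tMus musculus\t22059\tTrp53\n"
    "9606\tHomo sapiens\t7124\tTNF\n"
)


def filter_by_organism(handle, organism="Homo sapiens"):
    """Return the rows whose Org_name column equals the given organism."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [row for row in reader if row["Org_name"] == organism]


human_genes = filter_by_organism(io.StringIO(SAMPLE_EXPORT))
print([row["Symbol"] for row in human_genes])
```

Filtering on the organism name keeps the sketch readable; filtering on the numeric taxonomy identifier would be an equally reasonable choice.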

2.2 Gene Mining and Linkage: Gene mining is one of the most important steps of this study, because any
data error can cause an important gene to be missed and lead to incorrect findings. The genes collected
from NCBI for the chosen diseases are stored in text files that also contain text irrelevant to the present
study. To extract the relevant gene information, text mining in the R language is applied to the text files
[Feinerer, 2013]. The 300 top-weighted genes are retained to avoid overly complex results. After gene
mining, only the gene name, Entrez ID, and gene synonyms are stored in a text file, which makes the data
easier to analyze. The mined gene lists for the two diseases are then merged to find the gene linkage
between them. The gene linkage is the final set of common genes responsible for both stress and stroke.
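
The selection and linkage steps can be illustrated with a short Python sketch. The study itself reports using R for text mining; Python is used here only for illustration. The weights and gene lists below are toy values, and the function names top_weighted and common_genes are hypothetical.

```python
def top_weighted(genes, n=300):
    """Keep the n highest-weighted symbols from (symbol, weight) pairs."""
    ranked = sorted(genes, key=lambda pair: pair[1], reverse=True)
    return [symbol for symbol, _ in ranked[:n]]


def common_genes(first, second):
    """Return the symbols present in both lists, in the order of the first."""
    second_set = set(second)
    return [symbol for symbol in first if symbol in second_set]


# Toy (symbol, weight) pairs; the weights are invented for this example.
stress = [("TP53", 0.9), ("BDNF", 0.8), ("COMT", 0.4), ("FKBP5", 0.1)]
stroke = [("TNF", 0.95), ("TP53", 0.7), ("BDNF", 0.5), ("NOTCH3", 0.2)]

# n=3 stands in for the cutoff of 300 used in the study.
shared = common_genes(top_weighted(stress, n=3), top_weighted(stroke, n=3))
print(shared)
```

Applying the cutoff to each disease separately before intersecting means that a gene must rank highly for both diseases to appear in the linkage set.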

2.3 Protein-protein Interaction and Protein-chemical Interaction: Many studies aimed at understanding
and curing human disease take biological networks as their starting point. A protein–protein interaction
(PPI) involves two or more proteins binding together, usually to perform their biological function. DNA
replication and many other significant molecular processes in a cell are carried out by molecular
machines built from a large number of protein components organized by their PPIs [Andreopoulos, 2013].
A protein-protein interaction network (PPIN) is a collection of PPIs, mostly gathered in online databases.
PPINs can complement other datasets, which helps in understanding how the different parts contribute to
the function of a whole biological system [Bapat, 2010]. The effect of small molecules in biological
systems can be assessed in connection with the function of the targeted biomolecules, which in turn is
largely defined by their interaction partners [Schwikowski et al., 2000; Sharan et al., 2007].
Furthermore, only a fraction of all enzymes are targets of long-established drug compounds, so most
desirable targets are less druggable proteins in the network neighborhood. Many databases provide
proteome-wide protein-chemical relationships alongside protein-protein interaction networks, which are
important for effective compound identification [Szklarczyk et al., 2015]. In this step, the PPIN and the
protein-chemical interaction (PCI) network are constructed with Cytoscape from the common genes of the
two selected diseases. Cytoscape is a popular and trusted open-source tool for bioinformatics research,
and it is used here to build the PPIN [Shannon et al., 2003; Nepusz et al., 2012].

2.4 Gene Regulatory Network: Genes act as a hidden key in guiding the operation of the entire biological
system. The cooperation and interactions of genes create a dynamic network that underlies our rich
biodiversity [Cheng et al., 2011]. A gene regulatory network (GRN) plays a significant role in the
molecular mechanisms underlying fundamental biological processes. A GRN is the collection of molecular
regulators in a cell that interact with each other, directly or indirectly, and with other elements in the
cell [Vijesh et al., 2013]. A GRN has two main elements: nodes and edges. The nodes are the genes, and
the edges are the physical and/or regulatory relationships among the nodes [Walhout, 2011].
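
The node-and-edge view of a GRN maps directly onto an adjacency structure. The following Python sketch assumes directed (regulator, target) pairs; the edges listed are placeholders chosen for illustration and are not results of this study, and the function names are hypothetical.

```python
def build_network(edges):
    """Build an adjacency map from directed (regulator, target) pairs."""
    graph = {}
    for regulator, target in edges:
        graph.setdefault(regulator, set()).add(target)
        # Register the target as a node even if it regulates nothing.
        graph.setdefault(target, set())
    return graph


def network_size(graph):
    """Return (number of nodes, number of directed edges)."""
    return len(graph), sum(len(targets) for targets in graph.values())


# Illustrative edges only; they are not taken from the networks in this paper.
example_edges = [
    ("RELA", "IL6"),
    ("NFKB1", "IL6"),
    ("STAT3", "VEGFA"),
    ("HIF1A", "VEGFA"),
]
grn = build_network(example_edges)
print(network_size(grn))
```

Storing targets in a set means that a repeated edge is counted once, which matches the usual way node and edge totals are reported for such networks.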

3. Results and Discussion: Stress is the body's way of responding to demands or pressures, and in many
situations stress is a healthy response that helps us cope with life’s challenges. Stroke, on the other
hand, is a serious, life-threatening medical condition that occurs when blood flow to a part of the brain
is cut off. The main purpose of this research is to construct the gene regulatory network, together with
the PPI and PCI networks, for stress and stroke using bioinformatics tools. To accomplish this, a series of
consecutive processes was carried out as described in the proposed methodology. In this section, the
results of each process are presented and discussed.

3.1 Gene Filtering and Linkage:


First, the relevant genes for the selected diseases were collected without any preprocessing. Before
preprocessing and filtering, 684 responsible genes were counted for stress and 1598 for stroke. After
preprocessing and filtering for Homo sapiens, 621 genes remained for stress and 758 for stroke. The
Homo sapiens genes were then sorted by weight. Gene linkage identified 128 genes common to the two
selected diseases. These 128 common genes were then mined using data mining methods. After mining,
only 73 genes were found to be connected to other genes. These 73 genes are TP53,
TNF, APOE, VEGFA, IL6, MTHFR, TGFB1, ESR1, ACE, IL10, HIF1A, APP, MMP9, HLA-DRB1,
ADIPOQ, ABCB1, NFKB1, CRP, BDNF, STAT3, CDKN2A, PTGS2, IL1B, VDR, NOS3, TLR4,
COMT, CXCL8, PPARG, SLC6A4, TERT, IGF1, GSTM1, MAPT, LEP, IFNG, BRCA2, JAK2, MMP2,
MAPK1, SERPINE1, GSTT1, PON1, CCL2, BIRC5, NPPB, GSTP1, SOD1, CTLA4, TLR2, HFE,
CXCL12, NOTCH1, XRCC1, MIR21, CYP2C19, FAS, ADRB2, ICAM1, HMOX1, ESR2, RELA,
HMGB1, IL18, SPP1, IL4, AGT, VWF, APOA1, GHRL, AGER, ITGB3 and CYP1A1.
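
One way to read the reduction from 128 common genes to 73 connected genes is as the removal of genes that have no interaction with any other gene in the set. The Python sketch below illustrates that interpretation only; it is an assumption about the step, not the authors' documented procedure, and the interaction pairs and the name connected_genes are hypothetical.

```python
def connected_genes(genes, interactions):
    """Keep genes that interact with at least one other gene in the list."""
    gene_set = set(genes)
    linked = set()
    for first, second in interactions:
        # Ignore self-interactions and partners outside the gene list.
        if first != second and first in gene_set and second in gene_set:
            linked.update((first, second))
    return [gene for gene in genes if gene in linked]


# Toy input: the pairs are invented, and XYZ1 is a placeholder symbol.
candidates = ["TP53", "TNF", "IL6", "GHRL"]
pairs = [("TNF", "IL6"), ("TP53", "TP53"), ("GHRL", "XYZ1")]
print(connected_genes(candidates, pairs))
```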

3.2 Protein-protein association and Minimum Network:


Mapping the functional connectivity within a proteome is a challenging problem. Functional relationships
between proteins are essential elements of any system-level understanding of cellular function. Protein
networks serve many practical purposes, such as evaluating high-throughput functional genomics data,
filtering, and providing an intuitive visual scaffold for annotating the functional, structural, and
evolutionary properties of proteins [Bader et al., 2008; Devos and Russell, 2007; Hu et al., 2007; Kohler
et al., 2008; Jensen et al., 2008]. In Fig. 1, a protein-protein association network (PPAN) is created with
the STRING database from the 73 common responsible genes of stress and stroke [Jensen et al., 2008].
Fig. 2 shows a minimum network of the same PPAN generated with BisoGenet. BisoGenet is an
application for visualizing and analyzing biomolecular relationships. It can build and display biological
networks in a user-friendly way. One feature of BisoGenet is the ability to represent the coding relations
between genes and their products [Martin et al., 2010].

Figure 1. PPAN generated with the STRING database. Edges represent associations between proteins.
Yellow edges indicate text mining evidence, pink edges indicate experimentally determined interactions,
blue edges indicate interactions collected from curated databases [Buneman, 2009], and black edges
indicate protein co-expression.
Figure 2. A minimum network of the PPAN generated by BisoGenet [Martin et al., 2010].

3.3 Physical interaction and Co-expression network: In systems biology, network analysis has become an
important approach for investigating high-throughput gene expression data and mining gene function
[Liu et al., 2018]. In this research, a top-weighted gene co-expression network (GCN) algorithm was
applied to build gene co-expression networks for the two selected diseases. GCN and physical interaction
modules were created for the seventy-three genes responsible for stress and stroke. A GCN can be used
to associate genes of unknown function with biological processes, to prioritize candidate disease genes,
or to discern transcriptional regulatory programs [van Dam et al., 2017]. Genetic physical interaction is
the phenomenon in which the effects of one gene are modified by one or several other genes. Fig. 3 and
Fig. 4 display the gene co-expression network and the genetic physical interaction network of the
common genes of the two associated diseases.
Figure 3. Gene co-expression network of the 73 genes generated by GeneMANIA [Warde-Farley et al.,
2010]. The co-expression network is an undirected graph in which nodes represent genes. Two genes are
connected by an edge if a significant co-expression relationship is found between them.
Figure 4. Physical interaction network of the common genes of stroke and stress generated by
GeneMANIA [Warde-Farley et al., 2010]. This network combines 62.65% physical interaction, 32.51%
co-localization, 10.78% pathway, 5.98% predicted, and 2.49% shared protein domains.
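
The rule in the Figure 3 caption, where an edge joins two genes with a notable co-expression relationship, can be sketched with a plain correlation threshold. This is a simplified stand-in and not GeneMANIA's method: the use of the Pearson coefficient, the 0.9 cutoff, and the expression values are all assumptions made for the example.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant profile carries no co-expression signal
    return cov / sqrt(var_x * var_y)


def coexpression_edges(expression, threshold=0.9):
    """Return undirected edges between genes with |r| at or above threshold."""
    edges = []
    for gene_a, gene_b in combinations(sorted(expression), 2):
        if abs(pearson(expression[gene_a], expression[gene_b])) >= threshold:
            edges.append((gene_a, gene_b))
    return edges


# Invented expression profiles across four hypothetical samples.
expression = {
    "IL6": [1.0, 2.0, 3.0, 4.0],
    "TNF": [2.0, 4.1, 6.0, 8.2],
    "APOE": [5.0, 1.0, 4.0, 2.0],
}
print(coexpression_edges(expression))
```

Because each unordered pair is visited once, the resulting edge list describes an undirected graph, consistent with the caption.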

3.4 Protein-protein interaction network and Protein-chemical Interaction: The growth of the protein
interactome field is driven by a wide variety of techniques for discovering PPIs and PCIs [Vyncke et al.,
2019]. PPIs and PCIs are fundamental to all biological processes. PPIN and PCI networks are typically
used to show the interactions between proteins and to identify the common pathways among the listed
genes [Xia et al., 2015]. NetworkAnalyst is a comprehensive web-based tool designed to support both
common and complex meta-analyses of gene expression data through an intuitive web interface [Habib et
al., 2017]. Using NetworkAnalyst, simple interaction format (SIF) files were generated for the responsible
genes. The PPI and PCI networks were then built from the SIF files with Cytoscape. Cytoscape is an
open-source bioinformatics software environment for the large-scale integration of interaction network
data [Shannon et al., 2003; Smoot et al., 2010]. The PPI and PCI networks illustrate the interactions
among genes and proteins and show how these interactions connect them. Fig. 5 and Fig. 6 show the PPI
and PCI networks, respectively, for the 73 common responsible genes. The PPI network in Fig. 5 has a
total of 6985 edges, whereas the PCI network has 17579 edges.
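
A SIF file lists one interaction per line as a source node, a relationship type, and one or more target nodes. The Python sketch below writes and reads such lines to show how an edge count like those reported above could be obtained. The interactions are invented, the relationship label "pp" is used only as an example, and the reader splits on any whitespace, which is a simplification that would not handle node names containing spaces.

```python
import io


def write_sif(interactions, handle):
    """Write (source, relation, target) triples as tab-delimited SIF lines."""
    for source, relation, target in interactions:
        handle.write(f"{source}\t{relation}\t{target}\n")


def read_sif(handle):
    """Parse SIF lines into (source, relation, target) triples."""
    edges = []
    for line in handle:
        fields = line.split()
        if len(fields) < 3:
            continue  # blank line or an isolated node without interactions
        source, relation, *targets = fields
        edges.extend((source, relation, target) for target in targets)
    return edges


# Invented interactions for illustration; not taken from the study's files.
example = [("TNF", "pp", "IL6"), ("TP53", "pp", "CDKN2A")]
buffer = io.StringIO()
write_sif(example, buffer)
buffer.seek(0)
print(len(read_sif(buffer)))
```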

Figure 5. Generic protein-protein interaction network generated with Cytoscape [Shannon et al., 2003].
The network has 4290 nodes and 6985 edges. The 4290 nodes represent individual genes, and the 6985
edges represent the protein-protein relationships between them.
Figure 6. Protein-chemical interaction network for stroke and stress generated with NetworkAnalyst
[Smoot et al., 2010]. The network holds 17579 chemical names for 4365 proteins. The network shows the
basic formulas of drugs for the selected diseases.

3.5 Gene Regulatory Network (GRN): Biological processes are often represented as complex networks,
that is, sets of interactions or relations between different biological entities. Each biological entity
interacts with other biological entities. At present, PPI and GRN tools play a fundamental role in
understanding complex biological processes, for example in the search for protein functions and related
networks. The conformational flexibility of intrinsically disordered proteins (IDPs) allows them to
recognize and interact with various partners, which may increase the speed of interaction [Choura,
2018]. Here, the GRN is used to explore the connections among 5572 genes. Cytoscape is a trusted tool
for exploring GRNs. In this study, we use Cytoscape to illustrate the regulatory network of the 73
common genes [Shannon et al., 2003; Islam et al., 2019]. In Fig. 7, the GRN shows that 9297 connections
link the 5572 genes in this network.
Figure 7. Gene regulatory network for the two selected diseases. The network contains 9287 connections
between 5572 genes. Every gene has its own connections with other genes. This network was created
with Cytoscape [Shannon et al., 2003].

4. Conclusions: Bioinformatics has made remarkable progress building on traditional genomics. The
evolution of bioinformatics methods and tools is opening a new field of analysis and making formidable
tasks much easier than before. This study focused on the similarity between the two selected diseases;
identifying their common genes made it easier to collect SIF data from online tools. The main
contribution of the study is to present and describe the PPI, PCI, and GRN networks of the responsible
genes. These findings on the genes shared by the two associated diseases will be helpful for investigating
the diseases as well as for drug design. The study shows how much genetic similarity exists between
stress and stroke. Future work is to design a common drug for the two selected diseases.

Abbreviations:
PPA = Protein-protein Association
PPAN = Protein-protein Association Network
PPI = Protein-protein Interaction
PPIN = Protein-protein Interaction Network
PCI = Protein-chemical Interaction
GRN = Gene Regulatory Network
GCN = Gene Co-expression Network
SIF = Simple Interaction Format
IDP = Intrinsically Disordered Protein

Acknowledgments: The authors are very grateful to those who participated in this research work.
Conflicts of Interest: The authors declare that they have no competing interests.

References:
Andreopoulos, W. and Labudde, D., 2013. Protein-Protein Interaction Networks.

Bader,S., Kuhner,S. and Gavin,A.C. (2008) Interaction networks for systems biology. FEBS Lett.,
582, 1220–1224.

Bapat, S.A., Krishnan, A., Ghanate, A.D., Kusumbe, A.P. and Kalra, R.S., 2010. Gene expression:
protein interaction systems network modeling identifies transformation-associated molecules and
pathways in ovarian cancer. Cancer research, 70(12), pp.4809-4819.

Boden-Albala, B. and Sacco, R.L., 2000. Lifestyle factors and stroke risk: exercise, alcohol, diet,
obesity, smoking, drug use, and stress. Current atherosclerosis reports, 2(2), pp.160-166.

Bolger, N., DeLongis, A., Kessler, R.C. and Schilling, E.A., 1989. Effects of daily stress on negative
mood. Journal of personality and social psychology, 57(5), p.808

Buneman P. (2009) Curated Databases. In: Agosti M., Borbinha J., Kapidakis S., Papatheodorou C.,
Tsakonas G. (eds) Research and Advanced Technology for Digital Libraries. ECDL 2009. Lecture
Notes in Computer Science, vol 5714. Springer, Berlin, Heidelberg

Cheng C, Yan K-K, Hwang W, Qian J, Bhardwaj N, Rozowsky J, et al. (2011) Construction and
Analysis of an Integrated Regulatory Network Derived from High-Throughput Sequencing Data.
PLoS Comput Biol 7(11): e1002190. https://doi.org/10.1371/journal.pcbi.1002190

Choura, M., 2018. Disorder and interactions: What can dehydrins in cereals tell us
anymore?. Network Biology, 8(4), p.137.

Coordinators, N.R., 2017. Database resources of the national center for biotechnology
information. Nucleic acids research, 45(Database issue), p.D12.

Devos,D. and Russell,R.B. (2007) A more complete, complexed and structured interactome. Curr.
Opin. Struct. Biol., 17, 370–377.

Feinerer, I., 2013. Introduction to the tm Package Text Mining in R. Available online:
http://cran.r-project.org/web/packages/tm/vignettes/tm.pdf.

Goodman, N., 2002. Biological data becomes computer literate: new advances in
bioinformatics. Current opinion in biotechnology, 13(1), pp.68-71.

Habib, N., 2017. Drug design and analysis for bipolar disorder and associated diseases: A
bioinformatics approach. Network Biology, 7(2), p.41.

Hu,Z., Mellor,J., Wu,J., Kanehisa,M., Stuart,J.M. and DeLisi,C. (2007) Towards zoomable
multidimensional maps of the cell. Nat. Biotechnol., 25, 547–554.

Islam M.R., Ahmed M.L., Kumar Paul B., Asaduzzaman S., Ahmed K. (2019) Common Gene
Regulatory Network for Anxiety Disorder Using Cytoscape: Detection and Analysis. In: Rojas I.,
Valenzuela O., Rojas F., Ortuño F. (eds) Bioinformatics and Biomedical Engineering. IWBBIO 2019.
Lecture Notes in Computer Science, vol 11466. Springer, Cham

Jensen, L.J., Kuhn, M., Stark, M., Chaffron, S., Creevey, C., Muller, J., Doerks, T., Julien, P., Roth, A.,
Simonovic, M. and Bork, P., 2008. STRING 8—a global view on proteins and their functional
interactions in 630 organisms. Nucleic acids research, 37(suppl_1), pp.D412-D416.

Jhala, M.K., Joshi, C.G., Purohit, T.J., Patel, N.P. and Sarvaiya, J.G., 2011. Role of bioinformatics in
biotechnology. Information Technology Centre, GAU, Anand. Available at
http://openmed.nic.in/1383/01/Role_of_Bioinformatics_in_Biotechnology.pdf (5 February 2011).

Kohler,S., Bauer,S., Horn,D. and Robinson,P.N. (2008) Walking the interactome for prioritization of
candidate disease genes. Am. J. Hum. Genet., 82, 949–958.

Kumar A, Chordia N. Role of Bioinformatics in Biotechnology. Res Rev Biosci. 2017;12(1):116

Liu, W., Li, L., Long, X., You, W., Zhong, Y., Wang, M., Tao, H., Lin, S. and He, H., 2018.
Construction and Analysis of Gene Co-Expression Networks in Escherichia coli. Cells, 7(3), p.19.

Luscombe, N.M., Greenbaum, D. and Gerstein, M., 2001. What is bioinformatics? A proposed
definition and overview of the field. Methods of information in medicine, 40(04), pp.346-358.

Martin, A., Ochagavia, M.E., Rabasa, L.C., Miranda, J., Fernandez-de-Cossio, J. and Bringas, R., 2010.
BisoGenet: a new tool for gene network building, visualization and analysis. BMC
bioinformatics, 11(1), p.91

Mental health statistics: stress, 2018
[https://www.mentalhealth.org.uk/statistics/mental-health-statistics-stress]

Nepusz, T., Yu, H. and Paccanaro, A., 2012. Detecting overlapping protein complexes in protein-
protein interaction networks. Nature methods, 9(5), p.471.

Selye H. (1976) Stress without Distress. In: Serban G. (eds) Psychopathology of Human
Adaptation. Springer, Boston, MA

Shannon, P., Markiel, A., Ozier, O., Baliga, N.S., Wang, J.T., Ramage, D., Amin, N., Schwikowski, B.
and Ideker, T., 2003. Cytoscape: a software environment for integrated models of biomolecular
interaction networks. Genome research, 13(11), pp.2498-2504.

Sharan R., Ulitsky I., Shamir R. Network-based prediction of protein function. Mol. Syst. Biol.
2007;3:88.

Smoot, M.E., Ono, K., Ruscheinski, J., Wang, P.L. and Ideker, T., 2010. Cytoscape 2.8: new features
for data integration and network visualization. Bioinformatics, 27(3), pp.431-432.

Szklarczyk, D., Santos, A., von Mering, C., Jensen, L.J., Bork, P. and Kuhn, M., 2015. STITCH 5:
augmenting protein–chemical interaction networks with tissue and affinity data. Nucleic acids
research, 44(D1), pp.D380-D384

The American Institute of Stress, 2017 [https://www.stress.org/daily-life]


Schwikowski B., Uetz P., Fields S. A network of protein–protein interactions in yeast. Nat.
Biotechnol. 2000;18:1257–1261.

van Dam, S., Vosa, U., van der Graaf, A., Franke, L. and de Magalhaes, J.P., 2017. Gene co-expression
analysis for functional classification and gene–disease predictions. Briefings in bioinformatics, 19(4),
pp.575-592.

Vijesh, N., Chakrabarti, S.K. and Sreekumar, J., 2013. Modeling of gene regulatory networks: a
review. Journal of Biomedical Science and Engineering, 6(02), p.223.

Vyncke, L., Masschaele, D., Tavernier, J. and Peelman, F., 2019. Straightforward Protein-Protein
Interaction Interface Mapping via Random Mutagenesis and Mammalian Protein Protein
Interaction Trap (MAPPIT). International journal of molecular sciences, 20(9), p.2058.

Walhout, A.J., 2011. Gene-centered regulatory network mapping. In Methods in cell biology (Vol.
106, pp. 271-288). Academic Press.

Warde-Farley, D., Donaldson, S.L., Comes, O., Zuberi, K., Badrawi, R., Chao, P., Franz, M., Grouios, C.,
Kazi, F., Lopes, C.T. and Maitland, A., 2010. The GeneMANIA prediction server: biological network
integration for gene prioritization and predicting gene function. Nucleic acids research, 38(suppl_2),
pp.W214-W220.

Xia, J., Gill, E.E. and Hancock, R.E., 2015. NetworkAnalyst for statistical, visual and network-based
meta-analysis of gene expression data. Nature protocols, 10(6), p.823.
