TITLE PAGE

Total Pages 5
Subject
Computational Biology, a step towards a better lifestyle.

Prepared By
Ateeb Ahmad Rizwan SP20-BSE-018
Azka Malik SP20-BSE-022
Ahsan Ashraf SP20-BSE-005
Areeba Akbar SP20-BSE-012
Arbab Hussain SP20-BSE-011
Ali Ijaz SP20-BSE-007
Imtanan Mehmood SP20-BSE-039
Huzaifah Goher SP20-BSE-037

Prepared for
Maryam Mehmood

Date of Submission
12/31/2020
Intro
Computational biology is an area of bioinformatics which uses biological data to design algorithms and models to
understand biological systems and relationships. Computational biology studies biological, ecological, behavioral, and
social systems using data-analytical and theoretical methods, mathematical modelling and computational simulation
techniques.

Hypothesis
Computational biology can prove to be a step towards a better lifestyle.

Research Questions
1. Is computational biology really helping to obtain results at lower cost and in less time?
2. Is it really more accessible to people compared to older methods?

Objectives
To elaborate on the benefits we are gaining from computational biology as a field.

Background:
In the second half of the twentieth century there were many accomplishments in the field of molecular biology: for
example, structures of macromolecules were predicted, the chemical nature of DNA was discovered, and the genome
sequences of several organisms were largely or completely determined.

These achievements called for a field that could store and manage such useful and enormous data. This led to the
development of bioinformatics in the 1970s.

The availability of biologically useful data encouraged scientists to simulate physiological systems and functions from
molecular data. This idea led to the first conference on computational biology, which was held in the United States in 2001.

Significance:
Computational biology, as its name suggests, is the use of computing technology to advance the field of medicine. The
significance of our research is to guide and inform people about what computers have done to simplify and ease life. As
time passes, everything is modernizing and growing, whether the effects on humans are positive or negative. Viruses and
bacteria, for instance, must be taken into account because they keep mutating into stronger strains. To keep them under
control, scientists need advanced technology to study them completely. Put simply, without such technology people would
still be dying of a simple cough or fever.
Research and Study:
Computational biology and bioinformatics is an interdisciplinary field that develops and
applies computational methods to analyze large collections of biological data, such as genetic sequences, cell
populations or protein samples, to make new predictions or discover new biology.

Computational biology is a vast field with many branches. In genetics, for example, many genetic or heritable diseases
have recently been overcome through the study of genes. Consider a graph comparing the rate of these illnesses over the
past few years:

The graph shows that, as the years passed, the effect of genetic diseases on living organisms increased year by year,
but in recent years the rate has started to decrease.

Cancer
Traditionally, a pathologist identified cancer by examining a section of the tumor placed on a slide under a microscope,
and the cancer was classified on that basis. This process had several limitations: it was labor-intensive, and different
pathologists could reach different decisions.

At present, advances in computational image analysis and classification have unlocked the potential of using machine
learning-based neural networks to analyze tumors. Besides accurately distinguishing tumor tissue from normal tissue,
convolutional neural networks (CNNs), a method for analyzing images, can also identify shared traits. (Wanner, 2020)
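A CNN is built from a few simple operations repeated many times. The sketch below, in plain Python, shows one convolutional filter sliding over an image patch, followed by a ReLU activation and 2x2 max pooling. The 3x3 edge filter and the 6x6 patch are hypothetical, hand-picked values; a real tumor classifier learns many filters from labelled slide images, and nothing here reproduces the models described by Wanner (2020).

```python
def conv2d(image, kernel):
    """Valid (no padding) 2D cross-correlation, the core operation of a CNN layer."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            acc = 0.0
            for di in range(kh):
                for dj in range(kw):
                    acc += image[i + di][j + dj] * kernel[di][dj]
            row.append(acc)
        out.append(row)
    return out


def relu(feature_map):
    """Keep positive responses, zero out the rest."""
    return [[max(0.0, v) for v in row] for row in feature_map]


def max_pool2x2(feature_map):
    """Non-overlapping 2x2 max pooling (an odd trailing row/column is dropped)."""
    h = len(feature_map) // 2 * 2
    w = len(feature_map[0]) // 2 * 2
    return [
        [max(feature_map[i][j], feature_map[i][j + 1],
             feature_map[i + 1][j], feature_map[i + 1][j + 1])
         for j in range(0, w, 2)]
        for i in range(0, h, 2)
    ]


# Hypothetical 6x6 grayscale patch: dark left half, bright right half.
patch = [[0, 0, 0, 1, 1, 1] for _ in range(6)]
# Hypothetical vertical-edge filter: responds where intensity rises left to right.
edge_kernel = [[-1, 0, 1],
               [-1, 0, 1],
               [-1, 0, 1]]

features = max_pool2x2(relu(conv2d(patch, edge_kernel)))
print(features)  # [[3.0, 3.0], [3.0, 3.0]]
```

The filter fires only around the boundary between the two regions; stacking many learned filters of this kind is what lets a CNN pick out tissue structure.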
Introduction to JAX-CKB™ and CKB BOOST (Variant Relationships)
JAX-CKB™
The JAX Clinical Knowledgebase (JAX-CKB™) is software developed for genomic variant interpretation and for identifying
potential treatment approaches based on evidence from the literature. JAX-CKB™ takes curated data one step further: it
offers category variants, through which users can navigate variant relationships and find evidence or relevant clinical
trials for the variant in question. Category variants are defined on the basis of protein functional effect, including
gain-of-function or loss-of-function.
(Statz, 2020)
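The idea of function-based category variants can be illustrated with a small grouping sketch. Everything here is an assumption for illustration: the `Variant` record, the labels "act mut"/"inact mut"/"mutant", and the example variants are hypothetical, and JAX-CKB™'s real categories and rules may differ.

```python
from dataclasses import dataclass

# Hypothetical mapping from annotated protein effect to a category label.
EFFECT_TO_CATEGORY = {
    "gain of function": "act mut",
    "gain of function - predicted": "act mut",
    "loss of function": "inact mut",
    "loss of function - predicted": "inact mut",
}


@dataclass(frozen=True)
class Variant:
    gene: str
    name: str
    protein_effect: str  # e.g. "gain of function", "loss of function", "unknown"


def functional_categories(variant: Variant) -> list[str]:
    """Category variants a variant belongs to, based on its protein effect."""
    categories = [f"{variant.gene} mutant"]  # assumed gene-level catch-all category
    label = EFFECT_TO_CATEGORY.get(variant.protein_effect)
    if label is not None:
        categories.append(f"{variant.gene} {label}")
    return categories


def group_by_category(variants):
    """Invert membership so a user can navigate from a category to its variants."""
    groups = {}
    for v in variants:
        for category in functional_categories(v):
            groups.setdefault(category, []).append(v.name)
    return groups


# Hypothetical annotations, for illustration only.
variants = [
    Variant("GENEX", "V1", "gain of function"),
    Variant("GENEX", "V2", "loss of function - predicted"),
    Variant("GENEX", "V3", "unknown"),
]
print(group_by_category(variants))
```

A variant with an unknown effect still joins the gene-level category, so evidence attached there remains reachable.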

CKB BOOST
CKB BOOST provides category variants based on position, at the codon or exon level. Variants within CKB BOOST can be
assigned to a category variant centered on position and/or function. Variant relationships within CKB BOOST are explored
through multiple visuals that are linked to tables of efficacy evidence annotations. In JAX-CKB™, each such annotation
briefly describes the results of a clinical or preclinical study, linking the genomic alteration to a tumor type. The list
of efficacy evidence annotations for a variant and its category variant(s) gives users the information needed to identify
the most appropriate therapy or therapies based on evidence level and variant tier coding. CKB BOOST also lets its users
search for potentially relevant clinical trials using category variant relationships. (Statz, 2020)
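Position-based categories and evidence lookup can be sketched the same way: a variant belongs to a category for its exon and one for each affected codon, and evidence attached to any of those categories is pooled and ranked. The category names, the ranking convention (lower level means stronger evidence), the placement of the example variant in exon 11, and the "drug A"/"drug B" evidence rows are all hypothetical placeholders, not JAX-CKB™ data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    gene: str
    name: str
    codons: range  # affected codon (amino-acid) positions
    exon: int      # exon containing the change, supplied by the caller


def positional_categories(v: Variant) -> list[str]:
    """Category variants centered on position: one per exon and one per affected codon."""
    categories = [f"{v.gene} exon {v.exon} mutation"]
    categories += [f"{v.gene} codon {c} mutation" for c in v.codons]
    return categories


def ranked_evidence(v: Variant, evidence: dict) -> list:
    """Pool evidence for the variant itself and for every positional category.

    `evidence` maps a variant or category name to (level, therapy) pairs; a lower
    level is assumed to mean stronger evidence.
    """
    keys = [f"{v.gene} {v.name}"] + positional_categories(v)
    hits = [(level, therapy, key) for key in keys for level, therapy in evidence.get(key, [])]
    return sorted(hits)


# Placeholder evidence table: "drug A" and "drug B" are not real annotations.
EVIDENCE = {
    "KIT W557_K558delinsE": [(1, "drug A")],
    "KIT exon 11 mutation": [(2, "drug A")],
    "KIT codon 557 mutation": [(3, "drug B")],
}

variant = Variant("KIT", "W557_K558delinsE", range(557, 559), exon=11)
for level, therapy, source in ranked_evidence(variant, EVIDENCE):
    print(level, therapy, "via", source)
```

Evidence recorded only at the codon or exon level still reaches the specific variant, which is the practical benefit of category variants.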

Take the KIT variant W557_K558delinsE as an example: the program also presents visual information for it.
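The variant name follows HGVS-style protein notation: W557_K558delinsE means that the residues from tryptophan 557 through lysine 558 are deleted and replaced by a single glutamic acid. The simplified parser below handles only this one-letter "delins" form, not the full HGVS grammar, and reports which codons are affected, the information a position-based category would key on. `parse_delins` and `DelIns` are hypothetical names.

```python
import re
from typing import NamedTuple


class DelIns(NamedTuple):
    first_aa: str
    first_pos: int
    last_aa: str
    last_pos: int
    inserted: str

    @property
    def codons(self) -> range:
        """Codon positions removed by the deletion."""
        return range(self.first_pos, self.last_pos + 1)

    @property
    def net_length_change(self) -> int:
        """Residues inserted minus residues deleted."""
        return len(self.inserted) - len(self.codons)


# One-letter amino acid, position, optional "_" range end, then "delins" + inserted residues.
_DELINS = re.compile(r"^([A-Z])(\d+)(?:_([A-Z])(\d+))?delins([A-Z]+)$")


def parse_delins(name: str) -> DelIns:
    m = _DELINS.match(name)
    if m is None:
        raise ValueError(f"not a one-letter protein delins: {name!r}")
    first_aa, first_pos, last_aa, last_pos, inserted = m.groups()
    if last_aa is None:  # single-residue form, e.g. a hypothetical "X100delinsYZ"
        last_aa, last_pos = first_aa, first_pos
    return DelIns(first_aa, int(first_pos), last_aa, int(last_pos), inserted)


v = parse_delins("W557_K558delinsE")
print(list(v.codons), v.net_length_change)  # [557, 558] -1
```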
Concept of “Dark DNA”
Another important concept that emerged not long ago is “dark DNA”. Some animal genomes
seem to be missing certain genes, ones that appear in other similar species and must be present to keep the animals
alive. These apparently missing genes have been dubbed “dark DNA”, and their existence could change the way we think
about evolution. The discovery of dark DNA is so recent that we are still trying to work out how widespread it is and
whether it benefits the species that possess it. However, its very existence raises some fundamental questions about
genetics and evolution.

New revised Immunotherapy


The benefits of immunotherapy had been hypothesized, and even proven to an extent, in earlier years, but the 2010s saw
the therapy take center stage. The FDA approved immunotherapy treatments for many types of cancer in the 2010s, mostly
thanks to the industry targeting the PD-1/PD-L1 checkpoint proteins. Today, 10 PD-1/L1 “checkpoint inhibitor”
immunotherapy drugs have been approved by the U.S. Food and Drug Administration for 17 different types of cancer.
Immunotherapy, however, is still in its infancy. For example, it works only for some patients; others do not respond to
the treatment at all, and scientists don’t know why. That being said, all signs point to the 2020s being an even bigger
decade for this cancer-busting treatment.
On the NF-Y regulome as in ENCODE (2019)

ChIP-seq datasets
We considered all available ENCODE ChIP-seq datasets from the K562, GM12878 and HeLa-S3 cell lines. Coordinates of
“Optimal IDR thresholded peaks” were retrieved from the ENCODE repository (as of January 31st 2019) in the “bed
narrowPeak” file format. Peaks available only on the hg38 assembly were converted to hg19, resulting in an initial set
of 728 experiments. Since in some cases different experiments were available for the same TF, we filtered this initial
dataset as follows. Duplicate experiments for the same TF in the same cell line were processed as previously
described: first, a total of 277 duplicate experiments performed with antibodies directed against a tagged protein
were removed. We further discarded all experiments with fewer than 10,000 peaks or with fewer than half the peaks
of the other replicates for the same factor. Finally, only experiments whose replicates overlapped by more than 66% were
kept, and the one with the highest number of peaks was used for downstream analyses. TFs with replicate experiments not
satisfying the latter condition were discarded altogether. Filtering resulted in 519 unique experiments with no replicates
in the same cell line.
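The replicate-filtering steps above can be sketched as a small pipeline over experiments grouped by TF and cell line. Several details are assumptions not fixed by the text: tagged-antibody experiments are dropped only when an untagged duplicate exists, "the other replicates" is read as the largest other replicate, and overlap is measured as the share of the smaller peak set that overlaps the other. The `Experiment` record and the accessions are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from itertools import combinations


@dataclass
class Experiment:
    accession: str
    tf: str
    cell_line: str
    tagged_antibody: bool  # antibody raised against a tagged protein
    peaks: list            # (chrom, start, end) tuples, half-open intervals


def overlap_fraction(a, b):
    """Assumed metric: share of the smaller peak set overlapping any peak of the other.

    Brute force for clarity; a real pipeline would use an interval index or bedtools.
    """
    small, large = (a, b) if len(a) <= len(b) else (b, a)
    if not small:
        return 0.0
    by_chrom = defaultdict(list)
    for chrom, start, end in large:
        by_chrom[chrom].append((start, end))
    hits = sum(
        any(start < e2 and s2 < end for s2, e2 in by_chrom[chrom])
        for chrom, start, end in small
    )
    return hits / len(small)


def filter_group(exps, min_peaks=10_000, min_overlap=0.66):
    """Apply the filters to all experiments for one TF in one cell line."""
    # 1. Among duplicates, drop tagged-antibody experiments (if an untagged one exists).
    if len(exps) > 1 and any(not e.tagged_antibody for e in exps):
        exps = [e for e in exps if not e.tagged_antibody]
    # 2. Drop experiments with too few peaks, absolutely or relative to other replicates.
    kept = []
    for e in exps:
        others = [len(o.peaks) for o in exps if o is not e]
        if len(e.peaks) < min_peaks or (others and len(e.peaks) < 0.5 * max(others)):
            continue
        kept.append(e)
    if not kept:
        return None
    # 3. Replicates must overlap by more than min_overlap, otherwise discard the TF here.
    if any(overlap_fraction(a.peaks, b.peaks) <= min_overlap for a, b in combinations(kept, 2)):
        return None
    # Keep the replicate with the highest number of peaks.
    return max(kept, key=lambda e: len(e.peaks))


def filter_experiments(experiments, **thresholds):
    groups = defaultdict(list)
    for e in experiments:
        groups[(e.tf, e.cell_line)].append(e)
    selected = (filter_group(groups[key], **thresholds) for key in sorted(groups))
    return [e for e in selected if e is not None]


# Toy example with a lowered peak threshold (the text uses 10,000).
rep1 = Experiment("E1", "NFYA", "K562", False, [("chr1", i * 100, i * 100 + 50) for i in range(20)])
rep2 = Experiment("E2", "NFYA", "K562", False, [("chr1", i * 100 + 10, i * 100 + 60) for i in range(18)])
tagged = Experiment("E3", "NFYA", "K562", True, [("chr1", 0, 50)] * 30)
print([e.accession for e in filter_experiments([rep1, rep2, tagged], min_peaks=10)])  # ['E1']
```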

Motif enrichment analysis

Motif enrichment analysis was performed with PscanChIP, a tool that, given a set of peak summit coordinates, evaluates
global and local enrichment of TF binding motifs in the genomic regions surrounding the peaks.

Global enrichment estimates the over-representation of TFBS motifs in the provided regions compared to a genomic
background computed on all regions of the genome available for TF binding; a reasonable estimate of the latter can be
obtained from DNase I hypersensitivity. PscanChIP's built-in genomic backgrounds thus include the expected background
score of each matrix, against which the matrix scores within the input regions are compared. This yields, for each
matrix, a p-value expressing the probability of obtaining the same score difference with a set of randomly chosen
genomic regions. A motif with a significant global-enrichment p-value could correspond to the actual binding site of the
TF for which the ChIP-seq experiment was performed (usually the most significant one) or to binding sites of TFs
co-associating with it across the genome.
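PscanChIP's exact statistics are not reproduced here. As a simplified stand-in (an assumption, not the tool's method), the sketch scores each region by its best position weight matrix (PWM) match on either strand and uses a one-sided z-test to ask whether the mean best score in peak regions exceeds that of background regions. The toy CCAAT matrix built by the hypothetical helper `consensus_pwm` is not JASPAR MA0060.1, and the random sequences stand in for real peak and background regions.

```python
import math
import random
from statistics import mean, stdev

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def consensus_pwm(consensus, match_prob=0.97):
    """Toy log-odds PWM built from a consensus string, against a uniform 0.25 background."""
    other = (1 - match_prob) / 3
    return [
        {b: math.log2((match_prob if b == c else other) / 0.25) for b in "ACGT"}
        for c in consensus
    ]


def best_score(seq, pwm):
    """Highest PWM score over every window on both strands of seq."""
    width = len(pwm)
    best = -math.inf
    for strand in (seq, seq.translate(COMPLEMENT)[::-1]):
        for i in range(len(strand) - width + 1):
            best = max(best, sum(pwm[j][strand[i + j]] for j in range(width)))
    return best


def global_enrichment_p(input_seqs, background_seqs, pwm):
    """One-sided z-test: is the mean best score in the input above the background?"""
    fg = [best_score(s, pwm) for s in input_seqs]
    bg = [best_score(s, pwm) for s in background_seqs]
    z = (mean(fg) - mean(bg)) / (stdev(bg) / math.sqrt(len(fg)))
    return 0.5 * math.erfc(z / math.sqrt(2))  # upper-tail normal probability


rng = random.Random(0)


def random_seq(n):
    return "".join(rng.choice("ACGT") for _ in range(n))


background = [random_seq(100) for _ in range(200)]
peaks = [random_seq(48) + "CCAAT" + random_seq(47) for _ in range(50)]  # planted motif
p = global_enrichment_p(peaks, background, consensus_pwm("CCAAT"))
print(p, p < 1e-10)  # compared against the 10^-10 cutoff used in this analysis
```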

Local enrichment instead evaluates the over-representation of TFBS motifs with respect to the genomic regions flanking
those derived from the ChIP-seq. In particular, the higher the probability of finding the motif close to the provided peak
summits, the lower the obtained p-value. A globally enriched motif is usually locally enriched as well. A motif that is
locally but not globally enriched indicates the binding of a factor colocalizing with the one analyzed by ChIP-seq, but only
in a limited subset of regions.
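Local enrichment asks whether motif matches concentrate near the peak summits rather than in the flanks. A simple way to express that idea, used here as a stand-in and not as PscanChIP's statistic, is a binomial test: if matches were spread uniformly over a window of ±`region_half_width` bp, the share falling within ±`central_half_width` bp of the summit would follow a known proportion. The window sizes and offsets below are hypothetical; in practice the offsets would come from a motif scan around each summit.

```python
import math


def binomial_sf(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), summed exactly."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def local_enrichment_p(hit_offsets, region_half_width, central_half_width):
    """Test whether motif hits cluster near peak summits rather than in the flanks.

    hit_offsets: motif-match positions relative to their peak summit (0 = summit),
    all within +/- region_half_width. Under the null, hits are spread uniformly.
    """
    n = len(hit_offsets)
    k = sum(abs(o) <= central_half_width for o in hit_offsets)
    p_central = (2 * central_half_width + 1) / (2 * region_half_width + 1)
    return binomial_sf(k, n, p_central)


clustered = [-10, -7, -3, -1, 0, 1, 2, 4, 5, 8]  # hits piled up at the summit
spread = list(range(-150, 151, 10))              # hits spread evenly across the window
print(local_enrichment_p(clustered, 150, 25))
print(local_enrichment_p(spread, 150, 25))
```

The clustered hits give a very small p-value, while evenly spread hits do not, matching the intuition in the text that proximity to summits lowers the p-value.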

For both measures, enrichment was considered significant when the relative p-value was lower than 10^-10, in order
to keep only the most robust correlations. For experiments on the K562 cell line, the cell-specific background of
PscanChIP was employed, while for the GM12878 and HeLa-S3 cell lines, for which no cell-specific background was available,
enrichment was assessed with respect to the “mixed background” option. Regions were scanned by PscanChIP with the
JASPAR 2020 Redundant matrix collection, and the CCAAT-box matrix employed to evaluate its enrichment was
MA0060.1, as in previous work.

Positional bias analysis

PscanChIP predicts the presence of a positional bias between peak summits and the matrix of the factor (when
available). We considered as positive those scores with p-values lower than 10^-10. Thereafter, each factor positive for
the presence of CCAAT was first checked for the actual co-presence of NF-Y peaks, and then precise distances were
computed using each of the corresponding matrices in the JASPAR 2020 Redundant collection.
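The co-presence check and distance computation can be sketched as a nearest-neighbour search on sorted positions. The text does not specify the exact reference points or cutoffs, so this sketch assumes that sites are given as (chromosome, position) pairs (peak summits or motif centres), that "co-present" means an NF-Y site within a hypothetical `max_dist` of 100 bp, and that distances are binned in 10 bp steps. A positive distance means the nearest NF-Y site lies downstream (higher coordinate) of the factor's site.

```python
import bisect
from collections import Counter


def nearest_signed_distance(pos, sorted_positions):
    """Signed distance from pos to the closest entry of a sorted list (None if empty)."""
    i = bisect.bisect_left(sorted_positions, pos)
    candidates = [sorted_positions[j] for j in (i - 1, i) if 0 <= j < len(sorted_positions)]
    if not candidates:
        return None
    nearest = min(candidates, key=lambda q: abs(q - pos))
    return nearest - pos


def positional_bias(factor_sites, nfy_sites, max_dist=100, bin_size=10):
    """Distances from each factor site to the nearest NF-Y site on the same chromosome.

    Only pairs within max_dist count as co-present; their distances are then binned.
    """
    by_chrom = {}
    for chrom, pos in nfy_sites:
        by_chrom.setdefault(chrom, []).append(pos)
    for positions in by_chrom.values():
        positions.sort()
    distances = []
    for chrom, pos in factor_sites:
        d = nearest_signed_distance(pos, by_chrom.get(chrom, []))
        if d is not None and abs(d) <= max_dist:
            distances.append(d)
    histogram = Counter((d // bin_size) * bin_size for d in distances)
    return distances, histogram


nfy = [("chr1", 1000), ("chr1", 5000), ("chr2", 300)]
factor = [("chr1", 980), ("chr1", 5030), ("chr2", 10000), ("chr3", 50)]
distances, histogram = positional_bias(factor_sites=factor, nfy_sites=nfy)
print(distances, histogram.most_common())
```

A sharp peak in the histogram at a fixed offset would indicate a preferred spacing between the factor and NF-Y, which is what a positional bias means here.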

Results of the PscanChIP analysis


Compressive Algorithms
The age of compressive algorithms has arrived, bringing a completely different paradigm for structuring biological
data. To take advantage of the redundancy inherent in genomic sequence data, Loh, Baym and Berger introduced compressive
genomics, which compresses data in such a way that the desired computation (such as a BLAST search) can be performed
directly on the compressed representation. Compressive genomics is based on the concept of compressive acceleration,
which works in two stages, known as coarse search and fine search.

Coarse search is performed only on the coarse, or representative, subsequences that represent the unique data. Any
representative sequence found within a given threshold of the query is then expanded into all the similar sequences it
represents.

Fine search is then performed over this subset of the original database. The approach provides orders-of-magnitude
runtime improvements to BLAST nucleotide and protein search, and the improvements grow as the databases grow. The CORA
read mapper and caBLAST are examples. (Berger, Daniels, & William, 2016)
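The two-stage coarse/fine search can be sketched on short, equal-length sequences. To keep the example self-contained it swaps BLAST-style alignment for Hamming distance (an assumption): because Hamming distance is a metric, widening the coarse threshold by the cluster radius guarantees, via the triangle inequality, that no true hit is missed. The greedy clustering and all function names are hypothetical and are not the CORA or caBLAST implementations.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def build_clusters(db, radius):
    """Greedy clustering: a sequence joins the first representative within `radius`,
    otherwise it becomes a new representative (the 'compressed' database)."""
    clusters = []  # list of (representative, members)
    for seq in db:
        for rep, members in clusters:
            if hamming(rep, seq) <= radius:
                members.append(seq)
                break
        else:
            clusters.append((seq, [seq]))
    return clusters


def compressive_search(query, clusters, radius, threshold):
    # Coarse search: scan only representatives, with the threshold widened by the
    # cluster radius so that no member within `threshold` of the query is missed.
    candidates = [
        member
        for rep, members in clusters
        if hamming(query, rep) <= threshold + radius
        for member in members
    ]
    # Fine search: exact check over the expanded subset of the original database.
    return [seq for seq in candidates if hamming(query, seq) <= threshold]


db = ["ACGTACGT", "ACGTACGA", "ACGTACCA", "TTTTGGGG", "TTTTGGGC"]
clusters = build_clusters(db, radius=2)
print(len(clusters), compressive_search("ACGTACGG", clusters, radius=2, threshold=1))
```

Only two representatives are compared in the coarse stage instead of all five sequences; with highly redundant databases this gap, and hence the speedup, grows with database size.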

Genome editing
Genome editing uses site-directed nucleases (sequence-specific nucleases; SSNs) to mutate targeted DNA
sequences in an organism. Using SSN systems, scientists can delete, add, or change specific bases at a designated locus.
SSNs cleave DNA at specific sites and leave either a single-strand break (known as a nick) or a double-strand break. Four
main classes of SSNs are used in plant genome editing:

1. Meganucleases
These occur naturally in bacteria and were the first SSNs used for genome editing. They are single proteins that
recognize a DNA sequence at least 12 nucleotides long and cleave that target DNA. This leaves a double-strand break
that can be repaired by non-homologous end joining (NHEJ) or by homology-directed repair (HDR) using a donor molecule.
They have been applied in maize (Zea mays) and tobacco (Nicotiana spp.). The method is difficult and is not widely used.
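Why does the length of a recognition site matter? A rough estimate helps. Assuming a random genome of length G with independent, equally likely bases (real genomes are not random, so treat this only as an order-of-magnitude guide), the expected number of chance matches to one specific n-nucleotide site, counting both strands, is shown below. With a hypothetical G of 10^9 bp, a 12-nucleotide site is expected to occur by chance about a hundred times, while a 20-nucleotide sequence, the guide length used by CRISPR, is expected far less than once:

```latex
\begin{aligned}
\mathbb{E}[N_n] &\approx \frac{2G}{4^{n}} \\
\mathbb{E}[N_{12}] &\approx \frac{2 \times 10^{9}}{4^{12}} = \frac{2 \times 10^{9}}{16\,777\,216} \approx 119 \\
\mathbb{E}[N_{20}] &\approx \frac{2 \times 10^{9}}{4^{20}} = \frac{2 \times 10^{9}}{1\,099\,511\,627\,776} \approx 1.8 \times 10^{-3}
\end{aligned}
```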

2. Zinc finger nucleases (ZFNs)


Proteins containing zinc finger domains bind DNA and are widespread in nature, often functioning as transcription
factors. Zinc finger domains can be used to bind specific DNA sequences; when they are fused to the DNA-cutting nuclease
domain of the FokI protein, the resulting hybrid molecule is a ZFN. ZFNs work in pairs to cut DNA at desired locations.
ZFNs have been used to edit several plant species, and they were the first widely used genome-editing tool.

3. Transcription activator-like effectors nucleases(TALENs)


Transcription activator-like effectors (TALEs) were discovered in the bacterial plant pathogen Xanthomonas and
can be engineered to bind to virtually any DNA sequence. They were revolutionary because they are easy to design for
specific target DNA sequences. In nature, Xanthomonas species secrete TALEs into plant cells to enable pathogenicity.
TALEs bind to promoters of plant genes to suppress the plant’s resistance to the pathogen. TALE binding follows a simple
code, or cipher, that has been exploited to engineer proteins with custom site specificity in any target genome. Like
ZFNs, TALEs can be fused with the nuclease domain of FokI and are then referred to as TALENs. TALENs are likewise used in
pairs to effect targeted mutations. TALENs have been used to edit genomes in several plants, including rice (Oryza spp.),
maize, wheat (Triticum spp.), and soybean (Glycine max).
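The "simple code" is the TALE code: each repeat in a TALE array carries a repeat-variable diresidue (RVD) that selects one base. The sketch below uses the commonly cited pairings NI for A, HD for C, NG for T and NN for G (NN can also bind A) to translate the two half-sites of a target region into RVD arrays for a TALEN pair. The helper names are hypothetical, and the sketch ignores further design rules, such as the T usually found immediately 5' of natural TALE binding sites.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Commonly used RVD for each base; other RVDs exist (e.g. NN also binds A).
RVD_FOR_BASE = {"A": "NI", "C": "HD", "G": "NN", "T": "NG"}


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def tale_rvds(site):
    """Translate a DNA binding site into the ordered RVDs of a TALE repeat array."""
    site = site.upper()
    unknown = set(site) - set(RVD_FOR_BASE)
    if unknown:
        raise ValueError(f"unsupported bases: {sorted(unknown)}")
    return [RVD_FOR_BASE[base] for base in site]


def talen_pair(region, half_site_len):
    """Design a TALEN pair flanking a spacer.

    The left TALEN reads the first half-site on the top strand; the right TALEN reads
    the last half-site on the bottom strand (its reverse complement). The two FokI
    domains dimerize over the spacer between them and cut there.
    """
    region = region.upper()
    left_site = region[:half_site_len]
    right_site = reverse_complement(region[-half_site_len:])
    spacer = region[half_site_len:-half_site_len]
    return tale_rvds(left_site), tale_rvds(right_site), spacer


left, right, spacer = talen_pair("TACGT" + "A" * 15 + "GGCCA", half_site_len=5)
print(left, right, len(spacer))
```

Real TALEN half-sites are much longer than the five bases used here; the short region only keeps the example readable.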

4. Clustered regularly interspaced short palindromic repeats (CRISPR)


CRISPR is the most recent of these genome-editing tools. Bacteria possess CRISPR as an innate defense mechanism
against viruses and plasmids, using RNA guidance to target the invading DNA for cleavage. Initially, the type II
CRISPR/Cas9 system was used for genome editing. In this system, foreign DNA sequences are incorporated between
repeat sequences at the CRISPR locus and then transcribed into an RNA molecule known as crRNA. The crRNA then
hybridizes with a second RNA, the tracrRNA, and the complex binds to the Cas9 nuclease. The crRNA guides the complex
to the target DNA and, in the case of innate immunity in bacteria, binds to the complementary sequence in the target
DNA, which is then cleaved by the Cas9 nuclease. Scientists have dissected the innate CRISPR/Cas9 system and
re-engineered it in such a way that a single RNA, the guide RNA, is needed for Cas9-mediated cleavage of a target
sequence in a genome. Guide RNA design requirements are limited to a unique sequence of about 20 nucleotides in the
genome (to prevent off-target effects) that lies next to a protospacer adjacent motif (PAM) sequence, which is specific
to each CRISPR/Cas system. Newer applications of CRISPR include the use of two unique guide RNAs with a modified nuclease
that “nicks” one strand of the DNA, providing greater specificity for targeted deletions. Genome editing, especially
CRISPR, is changing rapidly. Non-Cas9 endonucleases have also been described recently for CRISPR genome editing.
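The guide-design requirement above (a unique sequence of about 20 nucleotides next to a PAM) can be sketched as a sequence scan. This example assumes the NGG PAM of Streptococcus pyogenes Cas9 (other CRISPR/Cas systems use different PAMs) and checks uniqueness by exact matches only, whereas real design tools also score near-matches as potential off-targets. The function names are hypothetical.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def candidate_guides(genome, guide_len=20):
    """All protospacers of guide_len nt immediately 5' of an NGG PAM, on both strands.

    Returns (strand, start, protospacer); start indexes that strand's sequence
    (for "-" it counts along the reverse complement).
    """
    # Lookahead so that overlapping candidates are all reported.
    pattern = re.compile(rf"(?=([ACGT]{{{guide_len}}})[ACGT]GG)")
    hits = []
    for strand, seq in (("+", genome), ("-", reverse_complement(genome))):
        hits.extend((strand, m.start(), m.group(1)) for m in pattern.finditer(seq))
    return hits


def count_occurrences(seq, sub):
    """Count possibly overlapping exact occurrences of sub in seq."""
    count, i = 0, seq.find(sub)
    while i != -1:
        count += 1
        i = seq.find(sub, i + 1)
    return count


def unique_guides(genome, guide_len=20):
    """Keep guides whose protospacer occurs exactly once across both strands."""
    genome = genome.upper()
    rc = reverse_complement(genome)
    return [
        hit for hit in candidate_guides(genome, guide_len)
        if count_occurrences(genome, hit[2]) + count_occurrences(rc, hit[2]) == 1
    ]


# Toy sequences with a 5-nt "guide" so the example stays readable.
g1 = "AAAAACGTACTGGTTTT"
g2 = g1 + "CGTACAGG"  # repeats the same protospacer in front of a second PAM
print(unique_guides(g1, guide_len=5), unique_guides(g2, guide_len=5))
```

In the second sequence the same protospacer appears twice, so neither copy is accepted: a guide there could cut at both sites.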
Conclusion:
To conclude, for a healthier life with a lower death rate, we cannot decode complicated genes through study alone. To
run tests and reach conclusions, we need sophisticated machinery and technology, new methods, and innovations. This
technology needs proper information to work according to what the user needs, and this information can only be provided
through proper software.
BIBLIOGRAPHY
Berger, B., Daniels, N. M., & William, Y. Y. (2016, August). Computational Biology in the 21st Century: Scaling with
Compressive Algorithms. Communications of the ACM.

Statz, C. (2020, December 11). Revealing Treatment Options with Category Variants. The Jackson Laboratory.

Wanner, M. (2020, December 11). Unlocking Cancer Image Data. The Jackson Laboratory.

https://www.newscientist.com/article/mg23731680-200-dark-dna-the-missing-matter-at-the-heart-of-nature/#ixzz6iDA3nVev

https://www.laboratoryequipment.com/559031-A-Decade-in-Review-Top-10-Discoveries-of-the-2010s/

https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1008488
