Download as pdf or txt
Download as pdf or txt
You are on page 1of 24

Bioinformatics (BM-307)

Dr. Saima Kashif


Course contents
• Review
Molecular biology and genetics. Introduction to Biological Databases
Introduction to bioinformatics; goals; scope and applications; types of characteristics of databases; information
retrievalsystem.

• Sequence Alignment
Why align sequence? evolutionary basis; sequence homology; similarity and identity; database similarity
searching; heuristic approach; FASTA and BLAST; multiple sequence alignment; scoring function; various
algorithm; Markov and Hidden Markov models; protein motif and domain prediction.
• Gene and Promoter Predication
Gene predication in prokaryotes and eukaryotes; promoters and regulatory elements and prediction in
prokaryotes andeukaryotes; predication algorithms.

• Molecular Phylogenetic
Molecular evolution and molecular phylogenetic; gene and species phylogeny; tree representation.

• Structural Bioinformatics
Protein structure basics; protein structure visualization; comparison and classification; secondary and tertiary;
quaternary structures; RNA structure prediction.

• Genomics and Proteomics


Genome mapping; sequencing; assembly annotation and comparison; functional genomics; sequence-based
approaches; micro array bases approaches; protein expression analysis; protein sorting; protein-protein
interaction.
Sr. CLOs Taxonomy Programme
No. level learningoutcome
(PLO)

At the end of the course, the student will be able to:


EXPLAIN concepts and processes of biology,
1 computer science and mathematics in relation to the C2 1
context of cellular and molecular biology
and genomics research

2 EXPLAIN methods that can be used to solve C2 7


environmental issues using microbial genome
applications, gene therapy and biotechnology.

3 ANALYZE biological data for investigation of C4 4


novel genomic regions
Books
Sessional marks distribution
• 1. Midterm test/exam (20 marks)
• 2. Project (bioinformatics based/group wise)
• 3. Assignment (research paper group wise)
Central Dogma of Molecular Biology

replication
GENOTYPE (i.e. Aa) GENE (DNA) ATGCAAGTCCACTGTATTCCA

transcription reverse
MESSENGER (RNA)
tr
UACGUUCAGGUGACAUAAGGG

translation
PROTEIN

PHENOTYPE (pink) TRAIT


What is Bioinfomratics
• Bioinformatics is the science of storing, retrieving and analysing large
amounts of biological information. It is a highly interdisciplinary field
involving many different types of specialists, including biologists,
molecular life scientists, computer scientists and mathematicians.
• The term bioinformatics was coined by Paulien Hogeweg and Ben Hesper
to describe “the study of informatic processes in biotic systems” and it
found early use when the first biological sequence data began to be
shared. Whilst the initial analysis methods are still fundamental to many
large-scale experiments in the molecular life sciences, nowadays
bioinformatics is considered to be a much broader discipline,
encompassing modelling and image analysis in addition to the classical
methods used for comparison of linear sequences or three-dimensional
structures (Figure 1)
A broad overview of the different types of data that fall within the scope of bioinformatics.
Traditionally, bioinformatics was used to describe the science of storing and analysing
biomolecular sequence data, but the term is now used much more broadly, encompassing
computational structural biology, chemical biology and systems biology (both data integration and
the modelling of systems).
What is Bioinformatics?

Mathematics and Computer


Statistics Science

Biology
• The molecular life sciences have
become increasingly data driven by and
reliant on data sharing through open-
access databases (1). This is as true of the
applied sciences as it is of fundamental
research. Furthermore, it is not necessary
to be a bioinformatician to make use of
bioinformatics databases, methods and
tools. However, as the generation of large
data-sets becomes more and more central
to biomedical research, it’s becoming
increasingly necessary for every
molecular life scientist to understand
what can (and, importantly, what cannot)
be achieved using bioinformatics, and to
be able to work with bioinformatics
experts to design, analyse and interpret
their experiments (Figure 2).
Goal of BI
The ultimate goal of bioinformatics is to better understand a living cell and how it functions at the molecular
level.

By analyzing raw molecular sequence and structural data, bioinformatics research can generate new insights and
provide a “global” perspective of the cell.

The reason that the functions of a cell can be better understood by analyzing sequence data is ultimately
because the flow of genetic information is dictated by the “central dogma” of biology in which DNA is
transcribed to RNA, which is translated to proteins.

Cellular functions are mainly performed by proteins whose capabilities are ultimately determined by their
sequences. Therefore, solving functional problems using sequence and sometimes structural approaches has
proved to be a fruitful endeavor
Scope of BI
Bioinformatics consists of two subfields: the development of computational tools and
databases and the application of these tools and databases in generating biological
knowledge to better understand living systems.
These two subfields are complementary to each other.

The tool development includes writing software for sequence, structural, and functional
analysis, as well as the construction and curating of biological databases.

These tools are used in three areas of genomic and molecular biological research: molecular
sequence analysis, molecular structural analysis, and molecular functional analysis. The
analyses of biological data often generate new problems and challenges that in turn spur the
development of new and better computational tools
Primary and secondary databases
Primary databases

In bioinformatics, and indeed in other data intensive research fields, databases are often
categorised as primary or secondary.

Primary databases are populated with experimentally derived data such as nucleotide
sequence, protein sequence or macromolecular structure. Experimental results are submitted
directly into the database by researchers, and the data are essentially archival in nature. Once
given a database accession number, the data in primary databases are never changed: they
form part of the scientific record.
Secondary databases

By contrast, secondary databases comprise data derived from the results of analysing primary data.
They are often referred to as curated databases but this is a bit of a misnomer because primary
databases are also curated to ensure that the data in them is consistent and accurate.

Secondary databases often draw upon information from numerous sources, including other
databases (primary and secondary), controlled vocabularies and the scientific literature. They are
highly curated, often using a complex combination of computational algorithms and manual analysis
and interpretation to derive new knowledge from the public record of science.

Secondary databases have become the molecular biologist’s reference library over the past decade
or so, providing a wealth of (often daunting) information on just about any gene or gene product that
has been investigated by the research community. The potential for mining this information to make
new discoveries is vast. It’s our job in this course to reduce your activation energy to make more of
these resources for your research.
Applications of BI
• It has applications, for example, in knowledge-based drug design, forensic DNA analysis, and agricultural
biotechnology
• Computational studies of protein–ligand interactions provide a rational basis for the rapid identification of
novel leads for synthetic drugs. Knowledge of the three-dimensional structures of proteins allows molecules
to be designed that are capable of binding to the receptor site of a target protein with great affinity and
specificity.
• It is worth mentioning that genomics and bioinformatics are now poised to revolutionize our healthcare
system by developing personalized and customized medicine.
• The high-speed genomic sequencing coupled with sophisticated informatics technology will allow a doctor in
a clinic to quickly sequence a patient’s genome and easily detect potential harmful mutations and to engage
in early diagnosis and effective treatment of diseases.
• Bioinformatics tools are being used in agriculture as well. Plant genome databases and gene expression
profile analyses have played an important role in the development of new crop varieties that have higher
productivity and more resistance to disease.
Limitations
• It is no stretch in analogy that fighting diseases or other biological problems using bioinformatics is like fighting
battles with intelligence.
• Dependent on experimental science
Bioinformatics and experimental biology are independent, but complementary, activities. Bioinformatics depends on
experimental science to produce raw data for analysis. It, in turn, provides useful interpretation of experimental data
and important leads for further experimental research. Bioinformatics predictions are not formal proofs of any concepts.
They do not replace the traditional experimental research methods of actually testing hypotheses.
• Dependent on quality of data and algorithm
In addition, the quality of bioinformatics predictions depends on the quality of data and the sophistication of the
algorithms being used. Sequence data from high throughput analysis often contain errors. If the sequences are wrong or
annotations incorrect, the results from the downstream analysis are misleading as well. That is why it is so important to
maintain a realistic perspective of the role of bioinformatics. Most algorithms lack the capability and sophistication to
truly reflect reality. They often make incorrect predictions that make no sense when placed in a biological context.
Errors in sequence alignment, for example, can affect the outcome of structural or phylogenetic analysis.
Trade off between accuracy and computational feasibility
• The outcome of computation also depends on the computing power available. Many accurate but
exhaustive algorithms cannot be used because of the slow rate of computation. Instead, less accurate but
faster algorithms have to be used. This is a necessary trade-off between accuracy and computational
feasibility. Therefore, it is important to keep in mind the potential for errors produced by bioinformatics
programs.

Careful Interpretation
• Caution should always be exercised when interpreting prediction results. It is a good practice to use multiple
programs, if they are available, and perform multiple evaluations. A more accurate prediction can often be
obtained if one draws a consensus by comparing results from different algorithms.
• https://www.ebi.ac.uk/training/online/courses/bioinformatics-
terrified/#vf-tabs__section--contents

You might also like