Professional Documents
Culture Documents
Lecture 1-2 Intro
Lecture 1-2 Intro
• Sequence Alignment
Why align sequence? evolutionary basis; sequence homology; similarity and identity; database similarity
searching; heuristic approach; FASTA and BLAST; multiple sequence alignment; scoring function; various
algorithm; Markov and Hidden Markov models; protein motif and domain prediction.
• Gene and Promoter Predication
Gene predication in prokaryotes and eukaryotes; promoters and regulatory elements and prediction in
prokaryotes andeukaryotes; predication algorithms.
• Molecular Phylogenetic
Molecular evolution and molecular phylogenetic; gene and species phylogeny; tree representation.
• Structural Bioinformatics
Protein structure basics; protein structure visualization; comparison and classification; secondary and tertiary;
quaternary structures; RNA structure prediction.
replication
GENOTYPE (i.e. Aa) GENE (DNA) ATGCAAGTCCACTGTATTCCA
transcription reverse
MESSENGER (RNA)
tr
UACGUUCAGGUGACAUAAGGG
translation
PROTEIN
Biology
• The molecular life sciences have
become increasingly data driven by and
reliant on data sharing through open-
access databases (1). This is as true of the
applied sciences as it is of fundamental
research. Furthermore, it is not necessary
to be a bioinformatician to make use of
bioinformatics databases, methods and
tools. However, as the generation of large
data-sets becomes more and more central
to biomedical research, it’s becoming
increasingly necessary for every
molecular life scientist to understand
what can (and, importantly, what cannot)
be achieved using bioinformatics, and to
be able to work with bioinformatics
experts to design, analyse and interpret
their experiments (Figure 2).
Goal of BI
The ultimate goal of bioinformatics is to better understand a living cell and how it functions at the molecular
level.
By analyzing raw molecular sequence and structural data, bioinformatics research can generate new insights and
provide a “global” perspective of the cell.
The reason that the functions of a cell can be better understood by analyzing sequence data is ultimately
because the flow of genetic information is dictated by the “central dogma” of biology in which DNA is
transcribed to RNA, which is translated to proteins.
Cellular functions are mainly performed by proteins whose capabilities are ultimately determined by their
sequences. Therefore, solving functional problems using sequence and sometimes structural approaches has
proved to be a fruitful endeavor
Scope of BI
Bioinformatics consists of two subfields: the development of computational tools and
databases and the application of these tools and databases in generating biological
knowledge to better understand living systems.
These two subfields are complementary to each other.
The tool development includes writing software for sequence, structural, and functional
analysis, as well as the construction and curating of biological databases.
These tools are used in three areas of genomic and molecular biological research: molecular
sequence analysis, molecular structural analysis, and molecular functional analysis. The
analyses of biological data often generate new problems and challenges that in turn spur the
development of new and better computational tools
Primary and secondary databases
Primary databases
In bioinformatics, and indeed in other data intensive research fields, databases are often
categorised as primary or secondary.
Primary databases are populated with experimentally derived data such as nucleotide
sequence, protein sequence or macromolecular structure. Experimental results are submitted
directly into the database by researchers, and the data are essentially archival in nature. Once
given a database accession number, the data in primary databases are never changed: they
form part of the scientific record.
Secondary databases
By contrast, secondary databases comprise data derived from the results of analysing primary data.
They are often referred to as curated databases but this is a bit of a misnomer because primary
databases are also curated to ensure that the data in them is consistent and accurate.
Secondary databases often draw upon information from numerous sources, including other
databases (primary and secondary), controlled vocabularies and the scientific literature. They are
highly curated, often using a complex combination of computational algorithms and manual analysis
and interpretation to derive new knowledge from the public record of science.
Secondary databases have become the molecular biologist’s reference library over the past decade
or so, providing a wealth of (often daunting) information on just about any gene or gene product that
has been investigated by the research community. The potential for mining this information to make
new discoveries is vast. It’s our job in this course to reduce your activation energy to make more of
these resources for your research.
Applications of BI
• It has applications, for example, in knowledge-based drug design, forensic DNA analysis, and agricultural
biotechnology
• Computational studies of protein–ligand interactions provide a rational basis for the rapid identification of
novel leads for synthetic drugs. Knowledge of the three-dimensional structures of proteins allows molecules
to be designed that are capable of binding to the receptor site of a target protein with great affinity and
specificity.
• It is worth mentioning that genomics and bioinformatics are now poised to revolutionize our healthcare
system by developing personalized and customized medicine.
• The high-speed genomic sequencing coupled with sophisticated informatics technology will allow a doctor in
a clinic to quickly sequence a patient’s genome and easily detect potential harmful mutations and to engage
in early diagnosis and effective treatment of diseases.
• Bioinformatics tools are being used in agriculture as well. Plant genome databases and gene expression
profile analyses have played an important role in the development of new crop varieties that have higher
productivity and more resistance to disease.
Limitations
• It is no stretch in analogy that fighting diseases or other biological problems using bioinformatics is like fighting
battles with intelligence.
• Dependent on experimental science
Bioinformatics and experimental biology are independent, but complementary, activities. Bioinformatics depends on
experimental science to produce raw data for analysis. It, in turn, provides useful interpretation of experimental data
and important leads for further experimental research. Bioinformatics predictions are not formal proofs of any concepts.
They do not replace the traditional experimental research methods of actually testing hypotheses.
• Dependent on quality of data and algorithm
In addition, the quality of bioinformatics predictions depends on the quality of data and the sophistication of the
algorithms being used. Sequence data from high throughput analysis often contain errors. If the sequences are wrong or
annotations incorrect, the results from the downstream analysis are misleading as well. That is why it is so important to
maintain a realistic perspective of the role of bioinformatics. Most algorithms lack the capability and sophistication to
truly reflect reality. They often make incorrect predictions that make no sense when placed in a biological context.
Errors in sequence alignment, for example, can affect the outcome of structural or phylogenetic analysis.
Trade off between accuracy and computational feasibility
• The outcome of computation also depends on the computing power available. Many accurate but
exhaustive algorithms cannot be used because of the slow rate of computation. Instead, less accurate but
faster algorithms have to be used. This is a necessary trade-off between accuracy and computational
feasibility. Therefore, it is important to keep in mind the potential for errors produced by bioinformatics
programs.
Careful Interpretation
• Caution should always be exercised when interpreting prediction results. It is a good practice to use multiple
programs, if they are available, and perform multiple evaluations. A more accurate prediction can often be
obtained if one draws a consensus by comparing results from different algorithms.
• https://www.ebi.ac.uk/training/online/courses/bioinformatics-
terrified/#vf-tabs__section--contents