Bioinformatics

Bioinformatics (BIOT 305)
Simon Kumar Shrestha

Simon.Shrestha@ku.edu.np
Course
• Total Credit: 3 (equivalent to 48 hours)
• Theory: 50 %
• Practical: 50%
• Evaluation:
• Internal exams
• Lab work presentation
• Paper Presentations
• Attendance & Class performances
• Text Books:
• Bioinformatics, Sequence and Genome Analysis: David W. Mount
• Essential Bioinformatics: Jin Xiong
• References:
• Proteins, Structures and Molecular Properties: Thomas E. Creighton
• Discovering Genomics, Proteomics and Bioinformatics: A. Malcolm Campbell and Laurie J.
Heyer
Syllabus
UNIT I: Introduction & Biological Databases
What is Bioinformatics? Goal, Scope, Applications, Limitations
DNA Sequencing, Genomic sequencing, Sequencing cDNA libraries of expressed genes, Submission of sequences to the databases, Sequence
accuracy, Computer storage of sequences, Sequence formats, Conversion of one sequence format to another, Multiple sequence formats,
Storage of information in a sequence database, Using the database access program ENTREZ
UNIT II: Sequence Alignment
Pairwise Sequence Alignment: Definition of sequence alignment, Evolutionary basis, Sequence homology versus Sequence similarity, Sequence
similarity versus Sequence Identity, Methods, Scoring matrices, Significance of Sequence Alignment
Database Similarity Searching: Requirements of Database Searching, Heuristic Database Searching, Basic Local Alignment Search Tool (BLAST),
FASTA, Comparison of FASTA and BLAST
Multiple Sequence Alignment: Scoring functions, Exhaustive algorithms, Heuristic algorithms, Practical Issues
Profiles and Hidden Markov Models: Position Specific Scoring Matrices, Profiles, Markov models and Hidden Markov Models
UNIT III: Gene and Promoter Prediction
Gene Prediction: Gene Prediction in Prokaryotes and Eukaryotes, Gene prediction programs
Promoter and Regulatory Element Prediction: Promoter and Regulatory Elements Prediction in Prokaryotes and Eukaryotes, Prediction algorithms
UNIT IV: Molecular Phylogenetics
Molecular Evolution and Molecular Phylogenetics, Terminology, Gene phylogeny vs. Species phylogeny, Forms of Tree representation, Procedure,
Phylogenetic Tree Construction Methods and Programs (Distance – based and Character – based Methods), Tree Evaluation
UNIT V: Structural Bioinformatics
Protein Structure Basics: Amino acids, Peptide Formation, Dihedral Angles, Hierarchy, Secondary Structures, Tertiary Structures, Determination of
Protein 3 – D Structures
Protein Structure Visualization, Comparison and Classification
Protein Secondary Structure Prediction: Secondary Structure prediction for Globular, Transmembrane proteins
Protein Tertiary Structure Prediction: Methods, Homology modeling, Threading and Fold recognition, Ab Initio Protein Structure Prediction
RNA Structure Prediction: Introduction, Types of RNA structures, Methods for RNA Secondary Structure prediction, Ab initio Approach,
Comparative Approach, Performance Evaluation
Intended learning outcome
• Knowledge:
• Knowledge of the most widely used bioinformatics applications and technologies.
• An appreciation of the advantages and shortcomings of various bioinformatics
software tools
• An understanding of the appropriate application of a range of bioinformatics
software
• Skills:
• A familiarity with the use of much of the existing software for the analysis of
genomic and post-genomic data.
• An ability to select the most appropriate bioinformatics tools for a given analysis
• An ability to synthesise information
What is Bioinformatics?
• An interdisciplinary research area at the interface between computer and
biological science
• The discipline of quantitative analysis of information relating to biological
macromolecules with the aid of computers
• The mathematical, statistical and computing methods that aim to solve
biological problems using DNA and amino acid sequences and related
information
• Biology & Information Technology
• Involves technology that uses computers for storage, retrieval,
manipulation & distribution of information related to biological
macromolecules such as RNA, DNA and proteins
Need of Bioinformatics
• A large amount of sequence and supplementary information is
generated every year
• What should be done with this information?
• It is stored in databases so that it can be retrieved and
manipulated when needed
Data Explosion
DNA sequences as information
• DNA sequences can code for amino acid sequences (via mRNAs)
• DNA can also code for stable RNA sequences:
• tRNA, rRNA, snRNA, siRNA, lncRNA
• DNA sequences act as protein binding sites
• DNA codes for architectural information:
• Intrinsic DNA curvature
• Nucleosome positioning
• DNA codes for architectural information:
• Transcriptional initiation
• Origin of replication
• Mutational Hot Spots
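The first point above, DNA coding for an amino acid sequence, can be illustrated with a short Python sketch. It assumes the standard genetic code and reads the coding strand directly in frame 1 (the mRNA step is skipped, so T is used instead of U). The names `CODON_TABLE` and `translate` are illustrative, not taken from any course material.

```python
# Standard genetic code, built from the conventional TCAG codon ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna):
    """Translate coding-strand DNA in frame 1, stopping at the first stop codon."""
    dna = dna.upper()
    protein = []
    for pos in range(0, len(dna) - 2, 3):
        # "X" marks a codon that is not in the table (e.g. it contains N).
        amino_acid = CODON_TABLE.get(dna[pos:pos + 3], "X")
        if amino_acid == "*":  # stop codon: TAA, TAG or TGA
            break
        protein.append(amino_acid)
    return "".join(protein)


# Toy example (assumed input): five codons followed by a stop codon.
print(translate("ATGGGACTACCCTGGTAA"))  # MGLPW
```

Real genes may use other reading frames, the opposite strand, or a non-standard genetic code, so this is only a starting point.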
RNA sequences as information
• The mRNAs contain several levels of information:
• Specifies amino acid sequence for proteins
• Localization signals
• Stability signals
• Splice signals
• Editing signals
• The tRNAs decode the genetic code (they match codons to amino acids)
• The rRNAs form the structural and catalytic core of ribosomes
Protein sequences as information
• The protein sequence can code for an "active site" for enzymes
• The protein sequence can code for structural roles:
microtubules, myosin, collagen, etc.
• The protein sequence can code for ion channels/pumps
• The protein sequence can code for localization information
• The protein sequence can code for modification sites
What is this?
>Hello Find me 
ATGGGACTACCCTGGTACCGCGTACATACAGTAGTTCTGAACGATCCAGGACGGCTGATTTCTGTACACCTAATGCACACTGC
TCTTGTCGCAGGTTGGGCGGGCTCTATGGCCCTGTACGAATTGGCAGTTTTTGACCCATCAGACCCAGTTCTCAATCCCATGT
GGCGTCAAGGTATGTTTGTCATGCCTTTTATGGCTCGTTTGGGTGTAACTCAATCCTGGGGTGGCTGGAGTCTAACTGGTGA
AGTAGCCGATAATCCCGGAATTTGGTCTTTTGAAGGGGTAGCCGCTACCCATATCATCTTGTCAGGTCTATTATTCCTGGCAGC
AGTTTGGCACTGGGTTTACTGGGATCTGGAACTGTTTACCGATCCTCGGACTGGTGAACCAGCCCTAGACCTACCCAAAATG
TTCGGAATTCATTTATTCCTATCTGGTTTGCTTTGTTTTGGCTTCGGAGCCTTCCACCTCACGGGACTATTCGGACCGGGAATG
TGGGTTTCTGACCCCTATGGATTGACGGGAAGTATACAACCTGTCGCTCCTTCCTGGGGGCCTGAAGGATTTAACCCCTTCAA
TGCTGGCGGTATTGCGGCTCACCATATTGCGGCCGGAATTGTTGGCATTATTGCCGGACTATTCCACCCGTCCGTCAGACCAC
CTCAGCGCCTATACAAAGCCCTGCGTATGGGAAATATCGAAACTGTACTATCTAGTAGTATCGCGGCGGTATTCTTTGCGGCTT
TTGTGGTAGCTGGAACTATGTGGTATGGTTCGGCTGCAACTCCGATTGAACTGTTTGGACCTACCCGCTATCAGTGGGATCAG
GGATATTTCCAACAGGAAATTCAGCGCCGGGTACAAAGCAGTATTGCTCAGGGTGACAGCCCCTCAGAAGCATGGTCTAAG
ATTCCTGAAAAACTGGCATTTTATGACTATGTTGGTAACAGTCCCGCTAAAGGCGGTTTGTTCCGCGTCGGTCCGATGAACAA
GGGCGATGGTATTGCTCAAGGTTGGCTCGGACACCCAGTATTCACTGATGCAGAAGGTCGCGAATTAACTGTTCGTCGTCTT
CCTAACTTCTTTGAAACCTTCCCCGTCATTCTGACTGATGCTGATGGCGTAATTCGCGCTGACGTTCCTTTCCGTCGCGCGGA
GTCTCGCTACAGCTTTGAGCAAACTGGGGTGACTGTTTCTTTATATGGTGGTGAACTCAATGGTAAAACCTTCACCGATCCCG
CCTCTGTGAAGAAATATGCCCGCTTTGCTCAACAGGGTGAACCATTTGCCTTTGACCGGGAAACTCTCGGCTCTGATGGGGT
ATTTCGTACCAGTACCCGTGGCTGGTTTACTTTCGGTCACGCTTGCTTTGCTCTGCTTTTCTTCTTTGGTCATATTTGGCACGGT
TCCCGCACCATCTTCCGAGATGTATTTGCTGGGGTGGAAGCTGACCTAGAAGAACAAGTTGAGTGGGGTAACTTCCAGAAA
GTTGGAGACCAAACAACTCGTGTTCAAAAGACCGTCTAA
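The block above is a nucleotide sequence in FASTA format: a header line starting with ">" followed by lines of sequence. It begins with ATG and ends with TAA, which hints at a protein-coding region. A minimal Python sketch of how such a record could be read and summarised is given below. The function names and the toy record are assumptions for illustration; in practice a library such as Biopython would normally be used.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def gc_fraction(seq):
    """Fraction of bases that are G or C (0.0 for an empty sequence)."""
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def looks_like_orf(seq):
    """Crude check: whole codons, starts with ATG, ends with a stop codon."""
    return len(seq) % 3 == 0 and seq.startswith("ATG") and seq[-3:] in STOP_CODONS


# Toy record (assumed input), split over two lines as FASTA allows.
toy = ">toy example\nATGGGACTA\nCCCTGGTAA\n"
for name, seq in parse_fasta(toy):
    print(name, len(seq), gc_fraction(seq), looks_like_orf(seq))
```

Passing this check does not prove that a sequence is a gene. Identifying the record properly would normally mean a database similarity search such as BLAST.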
Goals of Bioinformatics
• To better understand
• A living cell
• How it functions at the molecular level
• Cellular functions are ultimately controlled by proteins, which are produced following the central dogma
of biology
• Specificity and capabilities of the proteins are determined by their sequences
• Generate new insights and provide a “global” perspective of the cell
Bioinformatics to Systems Biology
• When we are able to integrate the -omics data sets into comprehensive
virtual biological correlation networks,
• we can make a complete description of the complex biological
processes of interest, which we can subsequently model or emulate in
silico.
• This modelling will allow us to elucidate specific and pleiotropic gene
functions and relationships. In other words, we will be able to
understand the behaviour and (in)stability of complex
phenotypes
Scope of Bioinformatics
• 3 major aspects of bioinformatics
• Structure analysis
• Structure Prediction of nucleic acids, proteins
• Protein structure classification
• Protein structure comparison
• Sequence Analysis
• Genome comparison, Gene and promoter prediction, sequence alignment, Sequence
database searching
• Function analysis
• Metabolic pathway modeling, Gene expression profiling, Protein interaction prediction,
Protein subcellular localization prediction
• These major aspects are accomplished by 2 subfields of
Bioinformatics
• Development of computational tools and databases
• Application of these tools and databases in generating biological knowledge
to better understand the living systems
Techniques frequently used
• Bioinformatics employs a wide range of computational techniques
including:
• sequence and structural alignment
• database design and data mining
• macromolecular geometry
• phylogenetic tree construction
• prediction of protein structure and function
• gene finding
• expression data clustering
Distinction: Bioinformatics & Computational Biology
• Bioinformatics is limited to sequence, structural, and functional
analysis of genes and genomes
• Computational biology encompasses all biological areas that involve
computation.
• For example, mathematical modeling of ecosystems, population dynamics,
application of the game theory in behavioral studies, and phylogenetic
construction using fossil records all employ computational tools, but do not
necessarily involve biological macromolecules
Why do we need Bioinformatics?
• Sequence Analysis
• Processing of DNA and protein
sequences to understand their
function, structure and other
features
• Comparison of sequences to find
similarity with existing sequences
• Gene Expression Profiling
• Measurement of the activity
of thousands of genes at
once
• DNA microarray technology
and SuperSAGE are used for
profiling, as is RNA-seq
• Statistical analysis is then
performed on the data
• Comparative Genomics
• Study of the relationship of
genomic structures of different
species.
• Helps to understand the
evolutionary processes
• Drug Discovery
• Process of discovering and designing drugs
• Includes target identification, validation, optimization and trials
• Specific databases and bioinformatics tools (e.g. for ADMET prediction) are available
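The sequence analysis point above mentions comparing sequences to find similarity. One of the simplest measures is percent identity between two sequences that have already been aligned. The Python sketch below assumes equal-length aligned strings with "-" as the gap character, and it divides by the number of columns that are not gaps in both sequences. Other denominators are also in use, so this is one convention rather than a standard. The function name is illustrative.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of compared alignment columns in which both residues are identical."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    matches = 0
    compared = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap and b == gap:
            continue  # column with gaps in both sequences carries no information
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# Toy alignments (assumed input).
print(percent_identity("ACGT", "ACGA"))                  # 75.0
print(round(percent_identity("ATGC-TA", "ATGCATA"), 1))  # 85.7
```

Producing the alignment in the first place is the harder problem, and it is the subject of the sequence alignment unit.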
Applications of Bioinformatics
• Molecular Medicine
- Genetic diseases e.g. Cystic fibrosis
- Alterations of the genome due to the body's response to
environmental stresses, e.g. heart disease, cancer etc.
- The Human Genome Project helps us to understand these types of diseases
• Clinical medicine
- Pharmacogenomics (the study of how an individual's genetic
inheritance affects the body's response to drugs)
- Detailed knowledge of an individual's genetic profile helps
doctors to prescribe the right therapy from the beginning
• Gene Therapy
- Treatment of diseases on the basis of the expression of the genes
causing them
- This technique is not yet in frequent use, but routine use may not
be far off
• Drug Design
- Current drugs target ~500 proteins
- Understanding disease mechanisms can help to find new
drugs that act on the target proteins
• Microbial genome applications
- Microbial genome projects can help us to use a variety of microbes for
useful purposes, e.g. waste clean-up (Deinococcus radiodurans, a
radiation-resistant bacterium), industrial processing, energy production
(Chlorobium tepidum, which has a large capacity for generating energy from
light) etc.
• Biotechnology
- Corynebacterium glutamicum is used by the chemical industry
for biotechnological production of lysine (an important
supplement in animal nutrition)
- Xanthomonas campestris is used to produce the exopolysaccharide
xanthan gum
- Lactococcus lactis is used in the dairy industry
• Antibiotic Resistance
- Enterococcus faecalis - a leading cause of bacterial infection among
hospital patients
- Discovery of the antibiotic-resistant virulence regions of the bacterium
may help to provide useful markers for detecting pathogenic
strains and to establish controls that prevent the spread of
infection in wards.
• Forensic analysis of microbes
- Genomic tools were used to distinguish strains of Bacillus anthracis
when analyzing the 2001 anthrax attack in Florida
• Evolutionary studies
- Genomes from the 3 domains of life (Archaea, Bacteria and Eukaryota) can be compared to
study the universal common ancestor
• Polymerase Chain Reaction (PCR)
- Primer design
- Assessment of PCR accuracy and efficiency using PCR testing tools and
software
• Veterinary Science and Comparative studies
- Understanding animal genomes can help us to understand the
biology of animals
- Genome sequencing of different organisms makes it possible to compare their
genomes
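As a small illustration of the primer design application listed above, the Python sketch below computes two quantities commonly checked for a candidate primer: its reverse complement and a rough melting temperature. The Tm estimate uses the Wallace rule, Tm = 2(A+T) + 4(G+C) in °C, which is only a rule of thumb for short oligonucleotides (roughly 14 to 20 bases). Dedicated primer design software uses more accurate thermodynamic models. The function names and the example primer are assumptions for illustration.

```python
# Translation table mapping each base to its Watson-Crick partner.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def wallace_tm(primer):
    """Approximate melting temperature in °C using the Wallace rule."""
    primer = primer.upper()
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc


# Hypothetical 18-base forward primer.
primer = "ATGGGACTACCCTGGTAC"
print(reverse_complement(primer))  # GTACCAGGGTAGTCCCAT
print(wallace_tm(primer))          # 56
```

A forward and reverse primer pair is usually chosen so that the two Tm values are close to each other.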
Limitations of Bioinformatics
• Bioinformatics results are not always accurate
• Bioinformatics predictions based on the interpretation of
experimental data are not formal proofs of any concepts
• Bioinformatics does not replace the traditional research experimental
methods
• Over-reliance on poor-quality data can lead to misleading conclusions,
since bioinformatics predictions depend entirely on the quality of the
data and the algorithms used
