
Introduction to Bioinformatics

Outline
What is Bioinformatics?
Basic molecular biology
Public databases
Sequence analysis
The scales of bioinformatics
Biological data mining
What is Bioinformatics?
Several definitions exist. Michael Liebman proposed a
quite elegant one:
• “The study of the information content and information flow in
biological systems and processes” (Michael Liebman)
  • Information content: e.g. genome projects
  • Information flow in biological systems: e.g. molecular transport
  • Biological systems: cells, organisms, …
  • Biological processes: e.g. metabolic networks
Bioinformatics is the science of using information to
understand aspects of biology. That is, a discipline in which
techniques from applied mathematics, computer science,
statistics, artificial intelligence, etc. are integrated to solve
biological problems.
Information, information, information
• As we know, there have been major advances in the field of molecular biology
• These have been coupled with advances in laboratory (post)genomic technology
• This has led to an explosive growth in the collection of biological information
• This deluge of information has led to an absolute requirement for
1. Computerized databases to store, organize and index the data
2. Specialized tools to view and analyze the data
3. Specialized tools to infer new knowledge from the data
Areas of research (taxonomy of the Bioinformatics journal)
Genome Analysis
Sequence Analysis
Phylogenetics
Structural Bioinformatics
Gene Expression
Genetics and Population Analysis
Systems Biology
Data and Text Mining
Databases
Bioimage Informatics
Basic Molecular Biology
Life begins with the cell
• A cell is the smallest structural unit of an organism that is capable of sustained independent functioning
• All cells have some common features
• What is life? Can we create it in the lab? Read:
The imitation game: a computational chemical approach to recognizing life.
Nature Biotechnology, 24:1203-1206, 2006
2 types of cells:
Prokaryotes & Eukaryotes
Example of cell signaling
Terminology
• The genome is an organism’s complete set of DNA.
  • a small bacterial genome contains about 600,000 DNA base pairs
  • the human and mouse genomes have some 3 billion.
  • the human genome is organized into 23 pairs of chromosomes (24 distinct ones: 22 autosomes plus X and Y).
  • each chromosome contains many genes.
• Gene
  • the basic physical and functional unit of heredity.
  • a specific sequence of DNA bases that encodes instructions on how and when to make proteins.
• Proteins
  • make up the cellular structure
  • large, complex molecules made up of smaller subunits called amino acids.
All life depends on 3 critical molecules
DNAs
• Hold information on how the cell works
RNAs
• Act to transfer short pieces of information to different parts of the cell
• Provide templates for protein synthesis
Proteins
• Form enzymes that send signals to other cells and regulate gene activity
• Form the body’s major components (e.g. hair, skin, etc.)
• Are life’s laborers!
Computationally, all three can be represented as
sequences over a 4-letter (DNA/RNA) or 20-letter
(proteins) alphabet
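The alphabet view above can be sketched in a few lines of Python. This is only an illustration: the function name `classify_sequence` is made up for this example, and real sequence files also use ambiguity codes (such as N for "any base") that this sketch ignores.

```python
# Alphabets for the three molecule types (standard one-letter codes).
DNA_ALPHABET = set("ACGT")
RNA_ALPHABET = set("ACGU")
PROTEIN_ALPHABET = set("ACDEFGHIKLMNPQRSTVWY")  # the 20 standard amino acids


def classify_sequence(seq: str) -> str:
    """Return 'DNA', 'RNA', 'protein' or 'unknown' for a sequence string.

    The check is by alphabet only, so a short string such as 'ACG' is
    reported as DNA even though it is also a valid RNA or protein string.
    """
    letters = set(seq.upper())
    if not letters:
        return "unknown"
    if letters <= DNA_ALPHABET:
        return "DNA"
    if letters <= RNA_ALPHABET:
        return "RNA"
    if letters <= PROTEIN_ALPHABET:
        return "protein"
    return "unknown"


if __name__ == "__main__":
    for s in ("ATGGCC", "AUGGCC", "MKYNNHDKIRDF"):
        print(s, "->", classify_sequence(s))
```

Treating the molecules as strings is what lets general text algorithms (search, alignment, counting) be applied to biological data.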
DNA, RNA, and the Flow of Information
[Figure: the Central Dogma of Molecular Biology (Weismann barrier). Replication (DNA to DNA), transcription (DNA to RNA), translation (RNA to protein).]
Overview of DNA to RNA to Protein
• A gene is expressed in two steps
1) Transcription: RNA synthesis
2) Translation: Protein synthesis
DNA: The Basis of Life
Deoxyribonucleic Acid (DNA)
• Double stranded, with complementary base pairing between the strands: A-T, C-G
DNA is a polymer
• Sugar-Phosphate-Base
• Bases are held together by hydrogen bonding to the opposite strand
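Because the two strands are complementary (A-T, C-G), one strand fully determines the other. A minimal Python sketch, assuming the input contains only the letters A, C, G and T; the function name is chosen for this example:

```python
# Translation table mapping each base to its pairing partner.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the opposite strand, read in its own 5'-to-3' direction."""
    # Complement every base, then reverse, because the strands are antiparallel.
    return dna.upper().translate(COMPLEMENT)[::-1]


if __name__ == "__main__":
    print(reverse_complement("ATGC"))
```

Applying the function twice returns the original strand, which is a quick sanity check on the pairing rule.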
RNA
RNA is chemically similar to DNA. It is usually
only a single strand, and T (thymine) is replaced by
U (uracil).
Some forms of RNA can form secondary
structures by “pairing up” with themselves. This can
dramatically affect their properties.
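At the sequence level, the T-to-U replacement is a plain substitution. The sketch below assumes the input is the coding (sense) strand of the DNA, which carries the same sequence as the RNA apart from that substitution; the function name is illustrative.

```python
def transcribe(coding_strand: str) -> str:
    """Return the RNA sequence corresponding to a DNA coding strand."""
    # RNA carries the same message as the coding strand, with U in place of T.
    return coding_strand.upper().replace("T", "U")


if __name__ == "__main__":
    print(transcribe("ATGGCT"))
```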

tRNA linear and 3D view: http://www.cgl.ucsf.edu/home/glasfeld/tutorial/trna/trna.gif


RNA, continued
Several types exist, classified by function:
• hnRNA (heterogeneous nuclear RNA): eukaryotic mRNA primary transcripts with introns that have not yet been excised (pre-mRNA).
• mRNA (messenger RNA): this is what is usually being referred to when a bioinformatician says “RNA”. It carries a gene’s message out of the nucleus.
• tRNA (transfer RNA): the adapter that matches each mRNA codon to its amino acid, so that the message is translated into an amino acid sequence to build a protein.
• rRNA: ribosomal RNA. Part of the ribosome, which is involved in translation.
Transcription
Transcription is highly regulated. Most DNA is in a
dense form where it cannot be transcribed.
To start, transcription requires a promoter, a small
specific sequence of DNA to which RNA polymerase can
bind (~40 base pairs “upstream” of the gene)
Finding these promoter regions is only a partially
solved problem that is related to motif finding.
There can also be repressors and inhibitors acting in
various ways to stop transcription. This makes
regulation of gene transcription complex to
understand.
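To make the link to motif finding concrete, here is a toy scan for one well-known promoter element, the TATA box, using its commonly quoted consensus TATA(A/T)A(A/T). This is a deliberately naive sketch: real promoter prediction relies on statistical models such as position weight matrices, and the toy sequence below is invented.

```python
import re

# Commonly quoted TATA-box consensus: TATA, then A or T, then A, then A or T.
TATA_BOX = re.compile(r"TATA[AT]A[AT]")


def find_motif(dna: str, pattern: re.Pattern = TATA_BOX) -> list[int]:
    """Return the 0-based start positions of non-overlapping motif matches."""
    return [match.start() for match in pattern.finditer(dna.upper())]


if __name__ == "__main__":
    toy_sequence = "GGCCTATAAAAGGC"  # invented example sequence
    print(find_motif(toy_sequence))
```

An exact-match scan like this produces many false positives and misses degenerate sites, which is one reason the problem is only partially solved.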
Definition of a Gene
• Regulatory regions: up to 50 kb upstream of the +1 site
• Exons: protein coding and untranslated regions (UTR)
  1 to 178 exons per gene (mean 8.8)
  8 bp to 17 kb per exon (mean 145 bp)
• Introns: splice acceptor and donor sites, junk DNA
  average 1 kb to 50 kb per intron
• Gene size: largest 2.4 Mb (Dystrophin), mean 27 kb.


Splicing
Splicing and other RNA processing
In Eukaryotic cells, RNA is processed between
transcription and translation.
This complicates the relationship between a DNA
gene and the protein it codes for.
Sometimes alternative RNA processing leads to an
alternative protein (splice variants). This happens,
for example, in the immune system.
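A rough sketch of splicing as a string operation: given exon coordinates, the mature mRNA is the concatenation of the exons, and skipping an exon gives a splice variant. The sequence and coordinates are invented for illustration, and the function name is hypothetical.

```python
def splice(pre_mrna: str, exons: list[tuple[int, int]]) -> str:
    """Join the exon segments, given as 0-based half-open (start, end) pairs."""
    return "".join(pre_mrna[start:end] for start, end in exons)


# Toy pre-mRNA: three 6-base exons separated by two 12-base introns.
# Both introns start with GU and end with AG, as most real introns do.
PRE_MRNA = "AUGGCC" + "GUAAGUUUUCAG" + "AAAGAU" + "GUAAGUCCCCAG" + "UGGUAA"
EXONS = [(0, 6), (18, 24), (36, 42)]

if __name__ == "__main__":
    print(splice(PRE_MRNA, EXONS))                  # all three exons
    print(splice(PRE_MRNA, [EXONS[0], EXONS[2]]))   # variant skipping exon 2
```

The two outputs differ although they come from the same gene, which is why one DNA gene does not map to exactly one protein.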
Proteins: Crucial molecules
for the functioning of life
• Structural proteins: the organism's basic building blocks, e.g. collagen,
and the keratin of nails and hair.
• Enzymes: biological engines which mediate a multitude of biochemical
reactions. Usually enzymes are very specific and catalyze only a single
type of reaction, but they can play a role in more than one pathway.
• Transmembrane proteins: they are the cell’s housekeepers, e.g. by
regulating cell volume, extracting and concentrating small molecules from
the extracellular environment, and generating ionic gradients essential for
muscle and nerve cell function (the sodium/potassium pump is an example)

• Proteins are polypeptide chains, constructed by joining amino acids
in a linear chain through peptide bonds
• The chain of amino acids, however, folds to create very complex 3D
structures
Translation
The process of going from RNA to polypeptide.
Three consecutive bases of mRNA (called a codon) correspond to one amino acid, based on a fixed table (the genetic code).
Translation always starts with methionine (the start codon AUG) and ends at a stop codon.
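The fixed table can be written down compactly. The sketch below builds the standard genetic code and translates from the first AUG to the first stop codon. It assumes an input made only of A, C, G and U and ignores everything a real cell does to choose the start site; the function name is illustrative.

```python
from itertools import product

# Standard genetic code, listed in the conventional U, C, A, G codon order.
# '*' marks the three stop codons (UAA, UAG, UGA).
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna: str) -> str:
    """Translate from the first AUG up to (not including) the first stop codon."""
    mrna = mrna.upper()
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no start codon found
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]  # KeyError on non-ACGU letters
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    print(translate("AUGGCCAAAGAUUGGUAA"))
```

Since 64 codons map onto 20 amino acids plus stop, several codons encode the same amino acid, so the mapping cannot be inverted uniquely.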
Amino Acids
Protein Structure: Introduction
Different amino acids have different properties.
These properties will affect the protein structure and function.
Hydrophobicity, for instance, is the main driving force (but not the only one) of the folding process.
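One common way to put numbers on hydrophobicity is the Kyte-Doolittle hydropathy scale, which is a choice made for this sketch and not something the slides prescribe. The code averages the scale over a whole sequence and over a sliding window; the function names and the window size of 9 are illustrative.

```python
# Kyte-Doolittle hydropathy values; positive means more hydrophobic.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_hydropathy(protein: str) -> float:
    """Average hydropathy of a non-empty sequence of standard amino acids."""
    protein = protein.upper()
    return sum(KYTE_DOOLITTLE[aa] for aa in protein) / len(protein)


def hydropathy_profile(protein: str, window: int = 9) -> list[float]:
    """Mean hydropathy of every window of the given length along the chain."""
    return [
        mean_hydropathy(protein[i:i + window])
        for i in range(len(protein) - window + 1)
    ]


if __name__ == "__main__":
    fragment = "MKYNNHDKIRDFIIIEAYMF"  # start of the example sequence in these slides
    print(round(mean_hydropathy(fragment), 2))
    print([round(x, 2) for x in hydropathy_profile(fragment)])
```

Stretches with a high windowed average tend to be buried in the protein core or in a membrane, which is how a per-residue property feeds into structure.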
Protein Structure: Hierarchical nature of protein
structure
Primary Structure = Sequence of amino acids
MKYNNHDKIRDFIIIEAYMFRFKKKVKPEVDMTIKEFILLTYLFHQQENTL
PFKKIVSDLCYKQSDLVQHIKVLVKHSYISKVRSKIDERNTYISISEEQRE
KIAERVTLFDQIIKQFNLADQSESQMIPKDSKEFLNLMMYTMYFKNIIKK
HLTLSFVEFTILAIITSQNKNIVLLKDLIETIHHKYPQTVRALNNLKKQGYL
IKERSTEDERKILIHMDDAQQDHAEQLLAQVNQLLADKDHLHLVFE

Secondary structure: local interactions
Tertiary structure: global interactions

Protein Structure: Why is structure
important?
• The function of a protein depends greatly on its structure
• The structure that a protein adopts is vital to its chemistry
• Its structure determines which of its amino acids are exposed to carry out the protein’s function
• Its structure also determines what substrates it can react with
Protein Structure: Mostly lacking
information
Therefore, it is clear that knowing the structure of a
protein is crucial for many tasks
However, we only know the structure of a very small
fraction of all the proteins that we are aware of
• The UniProtKB/TrEMBL archive contains 23,165,610 (16,886,838) sequences
• The PDB archive of protein structures contains only 84,223 (76,669) structures
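The size of the gap follows directly from the two counts just quoted. A small sketch of the arithmetic, using the first figure of each pair. It is only a rough estimate, since PDB entries and TrEMBL sequences do not correspond one to one.

```python
# Counts quoted above (first figure of each pair).
trembl_sequences = 23_165_610
pdb_structures = 84_223

coverage = pdb_structures / trembl_sequences
sequences_per_structure = trembl_sequences / pdb_structures

if __name__ == "__main__":
    print(f"structures per sequence: {coverage:.2%}")
    print(f"sequences per structure: {sequences_per_structure:.0f}")
```

By this crude measure there is well under one known structure for every hundred known sequences.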
In the native state, proteins fold on their own as they
are synthesized, amino acid by amino acid (with few
exceptions, e.g. proteins that need chaperones). Can we predict this
process so as to close the gap between protein sequences
and their 3D structures?
Central Dogma of Biology: A Bioinformatics
Perspective

The information for making proteins is stored in DNA. There is
a process (transcription and translation) by which the information in DNA is
converted into protein. By understanding this process and how it
is regulated we can make predictions and models of cells.

[Figure: computational problems along the central dogma: assembly, gene finding, sequence analysis, protein sequence/structure analysis]
Public databases
Information flow in bioinformatics
Data enters the “bioinformatics scope” when a scientist deposits an
experimental result in an appropriate archive
The archive curates and annotates the data
The data is released to the public
Afterwards, the data may be retrieved/analysed:
• Integrating the new entry into a search engine
• Extracting useful subsets of the data
• Deriving new types of information from the data
• Aggregating the data by homology, function, structure
• Reannotating the data with newly discovered/inferred information
The quality of the data depends on many factors: the techniques used to
experimentally create the data, the degree of inference and prediction
involved in the annotation process, etc.
Many publicly available databases:
http://en.wikipedia.org/wiki/List_of_biological_databases
NCBI’s Entrez system
http://www.ncbi.nlm.nih.gov/
Entrez is a search and retrieval system that integrates
information from databases at NCBI (National Center for
Biotechnology Information).
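Entrez can also be queried programmatically through NCBI's E-utilities web interface. The sketch below only builds a search URL for the `esearch.fcgi` endpoint and does not send any request. The helper name and the example query are made up, and NCBI's usage guidelines (rate limits, API keys) should be checked before running real queries.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities service.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(db: str, term: str, retmax: int = 20) -> str:
    """Build an ESearch URL that looks up `term` in the Entrez database `db`."""
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


if __name__ == "__main__":
    # Example query: protein records matching "hemoglobin".
    print(esearch_url("protein", "hemoglobin"))
```

Opening the printed URL in a browser returns a list of matching record identifiers, which can then be fetched with the other E-utilities.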
UniProt - http://www.uniprot.org
• The Universal Protein Resource (UniProt) is a collaboration between
the European Bioinformatics Institute (EBI), the SIB Swiss Institute of
Bioinformatics and the Protein Information Resource (PIR)
KEGG - http://www.genome.jp/kegg/
Not just about genes/proteins but also pathways, that is, their interactions
DAVID - http://david.abcc.ncifcrf.gov/
