1 Introduction
Outline
What is Bioinformatics?
Basic molecular biology
Public databases
Sequence analysis
The scales of bioinformatics
Biological data mining
What is Bioinformatics?
Several definitions exist. Michael Liebman proposed a
quite elegant definition:
“The study of the information content and information flow in
biological systems and processes” (Michael Liebman)
Information content: genome project
Information flow in biological systems: molecular transport
Biological systems: cells, organisms, …
Biological processes: metabolic networks
Bioinformatics is the science of using information to
understand aspects of biology: a discipline in which
techniques from applied mathematics, computer science,
statistics, artificial intelligence, etc. are integrated to solve
biological problems.
Information, information, information
Figure: Overview of DNA to RNA to Protein (transcription, then translation; central dogma of molecular biology / Weismann barrier)
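The transcription and translation steps can be sketched in a few lines of Python. This is a minimal illustration, assuming the standard genetic code (NCBI translation table 1), a coding-strand DNA input, and a toy rule that translation starts at the first AUG and stops at the first stop codon; real gene expression (splicing, alternative start codons, reading-frame selection) is considerably more complex. The names transcribe, translate and CODON_TABLE are hypothetical.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1). Codons are enumerated in
# TCAG order with the first base varying slowest; "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon).replace("T", "U"): aa  # store codons as RNA (U instead of T)
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(dna: str) -> str:
    """Transcription: coding-strand DNA to mRNA (thymine T becomes uracil U)."""
    return dna.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Translation (toy rule): read codons from the first AUG until a stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no start codon, no protein
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":  # stop codon ends translation
            break
        protein.append(aa)
    return "".join(protein)
```

For example, translate(transcribe("GGATGGCCATTGTAATGGGCCGCTGAAAGGG")) returns "MAIVMGR": the two leading bases are skipped, and translation stops at the UGA codon.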
Figure: Computational problems (assembly, sequence analysis, gene finding, protein sequence/structure analysis)
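Assembly, the first of these problems, means reconstructing a longer sequence from short overlapping reads. The sketch below uses a naive greedy strategy (repeatedly merge the two reads with the longest suffix/prefix overlap). This is a swapped-in teaching technique, not a method from these slides: it assumes error-free reads and no repeats, whereas practical assemblers rely on overlap or de Bruijn graphs. The function names and the min_len parameter are hypothetical.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b` (0 if < min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> str:
    """Toy greedy assembler: merge the best-overlapping pair until one contig remains."""
    reads = list(reads)
    if not reads:
        return ""
    while len(reads) > 1:
        best = (0, 0, 1)  # (overlap length, index of left read, index of right read)
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best[0]:
                        best = (k, i, j)
        k, i, j = best
        if k == 0:
            # No overlaps left: simply concatenate the remaining contigs.
            return "".join(reads)
        merged = reads[i] + reads[j][k:]  # glue the pair, dropping the shared overlap
        reads = [r for idx, r in enumerate(reads) if idx not in (i, j)]
        reads.append(merged)
    return reads[0]
```

For example, greedy_assemble(["ATTAGACCTG", "CCTGCCGGAA", "AGACCTGCCG", "GCCGGAATAC"]) returns "ATTAGACCTGCCGGAATAC". With repeated regions, the greedy choice can merge the wrong reads, which is one reason real assemblers avoid it.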
Public databases
Information flow in bioinformatics
Data enters the “bioinformatics scope” when a scientist deposits an
experimental result in an appropriate archive
The archive curates and annotates the data
The data is released to the public
Afterwards, the data may be retrieved/analysed:
Integrating the new entry into a search engine
Extracting useful subsets of the data
Deriving new types of information from the data
Aggregating the data by homology, function or structure
Reannotating the data with newly discovered/inferred information
The quality of the data depends on many factors: the techniques used to
experimentally create the data, the degree of inference and prediction
involved in the annotation process, etc.
There are many publicly available databases:
http://en.wikipedia.org/wiki/List_of_biological_databases
NCBI’s Entrez system
http://www.ncbi.nlm.nih.gov/
Entrez is a search and retrieval system that integrates
information from databases at NCBI (National Center for
Biotechnology Information).
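Entrez can also be queried programmatically through NCBI's E-utilities. A minimal sketch, assuming the ESearch endpoint (esearch.fcgi) and its db, term, retmax and retmode parameters; the helper names esearch_url and esearch_ids are hypothetical. Only the URL builder runs offline; esearch_ids needs network access.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Base URL of NCBI's E-utilities, the programmatic interface to Entrez.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def esearch_url(db: str, term: str, retmax: int = 20) -> str:
    """Build an ESearch URL asking Entrez database `db` for record IDs matching `term`."""
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    return EUTILS_BASE + "esearch.fcgi?" + urlencode(params)


def esearch_ids(db: str, term: str, retmax: int = 20) -> list[str]:
    """Run the query over the network and return the matching record IDs."""
    with urlopen(esearch_url(db, term, retmax)) as response:
        result = json.load(response)
    # ESearch's JSON output keeps the IDs under esearchresult -> idlist.
    return result["esearchresult"]["idlist"]
```

For example, esearch_ids("gene", "BRCA1[sym] AND human[orgn]", retmax=5) should return the Entrez Gene IDs matching that query. NCBI's usage guidelines (request rate limits, optional API key) apply to such scripts.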
UniProt - http://www.uniprot.org
The Universal Protein Resource (UniProt) is a collaboration between
the European Bioinformatics Institute (EBI), the SIB Swiss Institute of
Bioinformatics and the Protein Information Resource (PIR).
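UniProt entries can be downloaded as FASTA text. A minimal sketch, assuming the REST URL pattern https://rest.uniprot.org/uniprotkb/{accession}.fasta and UniProtKB headers of the form "sp|ACCESSION|ENTRY_NAME description"; the helper names are hypothetical, and the parser is a plain FASTA reader rather than a UniProt-specific library.

```python
def uniprot_fasta_url(accession: str) -> str:
    """URL of one UniProtKB entry in FASTA format (assumed REST URL pattern)."""
    return f"https://rest.uniprot.org/uniprotkb/{accession}.fasta"


def parse_fasta(text: str) -> dict[str, str]:
    """Parse FASTA text into {header line without '>': concatenated sequence}."""
    records: dict[str, str] = {}
    header = None
    chunks: list[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:  # close the previous record
                records[header] = "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:  # sequence lines may be wrapped
            chunks.append(line)
    if header is not None:
        records[header] = "".join(chunks)
    return records


def accession_from_header(header: str) -> str:
    """In 'sp|P69905|HBA_HUMAN ...' the accession is the second '|' field."""
    return header.split("|")[1]
```

For example, uniprot_fasta_url("P69905") points at human hemoglobin subunit alpha, and the downloaded text can be passed to parse_fasta.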
KEGG - http://www.genome.jp/kegg/
Covers not just genes/proteins but also pathways, that is, their interactions
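Links between genes and pathways can also be retrieved programmatically. A minimal sketch, assuming the KEGG REST API at https://rest.kegg.jp, whose link operation (for example /link/pathway/hsa) returns one tab-separated gene/pathway pair per line; the helper names and the toy identifiers in the usage example are hypothetical.

```python
from collections import defaultdict

# Base URL of the KEGG REST API (assumed).
KEGG_REST = "https://rest.kegg.jp"


def link_url(target: str, source: str) -> str:
    """URL of the 'link' operation, e.g. genes of organism `source` linked to `target` entries."""
    return f"{KEGG_REST}/link/{target}/{source}"


def group_by_pathway(link_text: str) -> dict[str, list[str]]:
    """Group tab-separated 'gene<TAB>pathway' lines into {pathway: sorted genes}."""
    pathways: defaultdict[str, list[str]] = defaultdict(list)
    for line in link_text.splitlines():
        if not line.strip():
            continue
        gene, pathway = line.split("\t")
        pathways[pathway].append(gene)
    return {p: sorted(genes) for p, genes in pathways.items()}
```

For example, with the made-up input "toy:g2\tpath:toy001\ntoy:g1\tpath:toy001\n", group_by_pathway returns {"path:toy001": ["toy:g1", "toy:g2"]}. Grouping by pathway turns a flat gene list into sets of genes that act together.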
DAVID - http://david.abcc.ncifcrf.gov/