
SHES2201

Lecture 4 – Levels in Bioinformatics

Associate Professor Khairuddin Itam


Room B20, Bioinformatics Division
khair@um.edu.my
03-79676738
Scientists that had spent 40
years looking at proteins
through a microscope. And
today with the merger of IT
and the scientific community,
you can literally--what was
done before in 30 years can
be done in years or months
and, in some cases, in days.
Mick Gallagher
Manager for Life Sciences, Oracle
Genomics: Journey to the Center of Biology
• Without doubt, the greatest achievement in
biology over the past millennium has been the
elucidation of the mechanism of heredity.
• The instructions for assembling every organism on
the planet are all specified in DNA sequences that
can be translated into digital information and
stored in a computer for analysis.
• As a consequence of this revolution, biology in the
21st century is rapidly becoming an information
science.
• Powerful new types of bioinformatics will clearly
be required to assimilate and interpret the data
that will issue from various types of genomics
research.
Eric Lander & Robert Weinberg, Science, 2000
The Path Forward
• How does DNA impact health?
– Identify and understand the difference in DNA
sequence (A,T,C,G) among human populations
• What do all the genes do?
– Discover the functions of human genes by
experimentation and by finding genes with similar
functions in model organisms
• What are the functions of nongene areas?
– Identify important elements in the nongene regions
of DNA
• How does info in the genome enable life?
– Explore life at the ultimate level of the whole
organism instead of single genes/proteins.
Diverse applications
• Medicine – customized treatments, …
• Microbes for energy and the environment –
generate clean energy source, clean up toxic
wastes,…
• Bioanthropology – human lineage
• Agriculture, livestock breeding, Bioprocessing –
crops and animals more resistant to diseases,
efficient industrial processes,…
• DNA identification – implicate people accused
of crimes, identify contaminants in air, water, …
Three levels of bioinformatics
1. Analysis of a single gene sequence.
For example:
– Similarity with other known genes
– Evolutionary relationships -- Phylogenetic trees
– Identification of well-defined domains in the sequence
– Sequence features (physical properties, binding sites,
modification sites...)
Three levels of bioinformatics
2. Analysis of complete genomes.
For example:
– Linking gene families, identifying missing ones
– Gene location on the chromosomes, correlation with
function or evolution
– Large-scale events in the evolution of organisms
Three levels of bioinformatics
3. Analysis of genes and genomes with respect to
functional data.
For example:
– Identification of essential genes, or genes involved in
specific processes
– Deletion or mutant genotypes vs. phenotypes
Types of Research Problems in Bioinformatics

• Class membership
– e.g., predictive toxicology
• Prediction of sequences
– e.g., sequences of protein sub-structures
• Classification hierarchies
– e.g., folds, families, super-families
• Shape descriptions
– e.g., binding site descriptions
• Temporal models
– e.g., activity of cells, metabolic pathways
Bioinformatics Tools
– Database & searching
– Computational algorithms
• Alignment
• Similarity
• Clustering
• Pattern Searching
– Structure predictions
– Statistical methods
– Data visualization
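As an illustration of the "Alignment" and "Similarity" items, here is a minimal sketch of a global alignment score. The slides do not name an algorithm; Needleman-Wunsch is used here as one classic choice, and the scoring values (match, mismatch, gap) are assumptions picked for the example.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch score for globally aligning sequences a and b.

    The scoring parameters are illustrative defaults, not values from the lecture.
    """
    # prev[j] is the best score for aligning the processed prefix of a with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # aligning a[:i] against an empty prefix of b
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap inserted in b
            left = curr[j - 1] + gap  # gap inserted in a
            curr.append(max(diag, up, left))
        prev = curr
    return prev[-1]


print(global_alignment_score("GATTACA", "GATTACA"))
print(global_alignment_score("ACGT", "ACT"))
```

A higher score means the two sequences are more similar under the chosen scoring scheme; only two rows of the dynamic-programming table are kept because the sketch returns the score, not the alignment itself.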
Bioinformatics Research

• algorithms + data structures = programs

• algorithms + databases = discoveries

• Combine sophisticated algorithms with the right content:
– Properly structured
– Carefully curated
– Relevant data fields
– Proper amount of data
"Every database is a model of
some real world system"

Hammer & McLeod (1981) ACM Transactions on Database Systems, 6:351-386
Levels of biological organization

Organism level

Cellular level
Cytomics

Molecular level
Proteomics

Sequence level
Genomics
Levels of biological organization
• Species level: Ecology
• Organism level: Biodiversity
• Cellular level: Cytomics
• Molecular level: Proteomics
• Sequence level: Genomics
Species level
Required are:
• exact definitions of the hierarchical levels
• the inclusion of optional levels
• models for alternative hierarchies

TRANSTax
Taxonomy database of
the biological species

Status: under development


BIOBASE GmbH, April 1999, E. Wingender
Organism level
[Figure: an endocrinological network at the organism level linking cell types A, B, and C at the cellular level; labeled BUDDY]
Definition of Gene Expression Matrices in a Multidimensional Space
• Factors controlling transcription: TRANSFAC
• Spatio-temporal coordinates: CYTOMER
• Conditional determinants: TRANSPATH
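One way to picture such a multidimensional space is a sparse store of expression levels from which a two-dimensional gene expression matrix is sliced. The sketch below is only an interpretation: the axis names (gene, cell type, condition), the class `ExpressionSpace`, and all values are hypothetical placeholders, not data or schemas from TRANSFAC, CYTOMER, or TRANSPATH.

```python
class ExpressionSpace:
    """Sparse expression levels indexed by (gene, cell_type, condition).

    Hypothetical structure: cell_type stands in for the spatio-temporal
    coordinate and condition for the conditional determinant.
    """

    def __init__(self):
        self._values = {}

    def set(self, gene, cell_type, condition, level):
        self._values[(gene, cell_type, condition)] = level

    def matrix(self, condition, genes, cell_types, missing=0.0):
        """Slice a gene x cell-type matrix at one fixed condition."""
        return [
            [self._values.get((g, c, condition), missing) for c in cell_types]
            for g in genes
        ]


# Placeholder values, invented purely to show the slicing.
space = ExpressionSpace()
space.set("geneX", "cell type A", "hormone present", 2.5)
space.set("geneX", "cell type B", "hormone present", 0.1)
space.set("geneY", "cell type A", "hormone present", 1.0)

rows = space.matrix("hormone present", ["geneX", "geneY"],
                    ["cell type A", "cell type B"])
print(rows)
```

Fixing one coordinate (here the condition) yields an ordinary matrix; fixing a different coordinate would give a different view of the same data.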
Cellular level
[Figure: Cell A and Cell B, each showing cytoplasm and nucleus]
Cellular level
Application relevance:
• Modeling of cell specificities (positive or negative) for gene expression patterning / profiling (proteomics projects)
• Identification and modeling of the mechanisms of action of hormones, growth factors, cytokines etc.
Cellular level
Additional databases with similar focus:
The Glandular Organ Development Database
http://www.ana.ed.ac.uk/anatomy/database/orghome.html
It includes:
The Kidney Development Database
Early Development of the Lung
Mammary Gland Development Database
Pancreatic Branching Morphogenesis Database
Prostate Gland Branching Morphogenesis Database
The Salivary Gland Development Database

All of them describe the stages of organ development and the
molecular background in terms of receptors, signal transducers,
transcription factors, extracellular matrix, adhesion molecules etc.
Cellular level
Additional databases with similar focus (continued):

Gene expression in tooth
Pekka Nieminen, University of Helsinki
http://honeybee.helsinki.fi/toothexp/
Molecular background of tooth development
Database CYTOMER®
http://transfac.gbf.de/CYTOMER

Hierarchical representation
of anatomical
(sub)structures in the Organ
table of CYTOMER
Molecular level
[Figure: a cell showing cytoplasm and nucleus]
Molecular level
Application relevance:
• Diagnostic value: Signal transduction components are easily assayed
• Signal transduction components are excellent candidates as targets for drug design
• In connection with transcriptional regulation: Key to model intracellular regulatory networks
Database TRANSPATH
http://transfac.gbf.de:8080/TRANSPATH/
Sequence level
Application relevance:
• Aberrant transcriptional regulation is causally involved in numerous diseases, e.g. cancer, thalassemias, hemophilia
• Precise transcriptional control is a prerequisite for successful gene therapy and genetic engineering
• Promoter analysis is important for the identification of target genes
Sequence level
Additional databases with similar focus:

RegulonDB
J. Collado-Vides, CIFN UNAM, Cuernavaca, Mexico
Regulatory elements and binding proteins of E. coli

EPD (Eukaryotic Promoter Database)
P. Bucher, ISREC, Epalinges, Switzerland
Locations, sequences, and features of experimentally
proven promoters

TRRD (Transcription Regulatory Region Database)
N. A. Kolchanov, ICG, Novosibirsk, Russia
Represents data on the regulatory features of whole genes
Database TRANSFAC
http://transfac.gbf.de/TRANSFAC/

gene, gene product | species / tissue                     | protein interacting region | method        | sequence motif           | factor bound   | ref.
AdMLP              | adenovirus / HeLa                    | -68 to -49                 | 1a, 3, 4a, 4b | TGTAGGCCACGTGACCGG       | UEF, USF, MLTF | 5-10
                   |                                      | -63 to -52                 | 1b            | GGCCACGTGACC             | USF            | 9
                   |                                      | -40 to +35                 | 1a            | TATAAAA                  | TFIID          | 9
....
c-fos              | human/mouse (3T3)                    | around -346                | 3, 4b         | CAGTTCCCGTCAATC          | SCM-ind. f.    | 40
                   | /HeLa                                | -320 to -299 (SRE)         | 1a, 3, 4a, 4b | GATGTCCatattaGGACATC     | SRF            | 2, 42
                   | mouse/fibr.; B cells (3T3; WEHI 231) | -313 to -297               | 3, 4b         | GATGTCCatATtaGGACATC     | "factor 1"     | 44
....
vitellogenin II    | chicken/oviduct                      | -620 to -597               | 1a            | GCGTGACCGGAGCTGAAAGAACAC | ER             | 173, 174
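To show how entries like these can support promoter analysis, the following sketch scans a sequence for exact occurrences of three motifs copied from the table (USF, TFIID, SRF). The function name `find_sites` and the test sequence are made up for illustration, and exact string matching is a simplifying assumption: real binding sites tolerate variation, so actual tools typically use consensus patterns or weight matrices instead.

```python
# Factor -> sequence motif, copied from the TRANSFAC table entries above.
MOTIFS = {
    "USF": "GGCCACGTGACC",
    "TFIID": "TATAAAA",
    "SRF": "GATGTCCATATTAGGACATC",
}


def find_sites(sequence, motifs=MOTIFS):
    """Return (factor, 0-based start) for every exact motif occurrence."""
    seq = sequence.upper()  # the table mixes upper and lower case
    hits = []
    for factor, motif in motifs.items():
        start = seq.find(motif)
        while start != -1:
            hits.append((factor, start))
            start = seq.find(motif, start + 1)  # allow overlapping hits
    return sorted(hits, key=lambda hit: hit[1])


# Invented test sequence with one USF site and one TFIID site embedded.
example = "aaGGCCACGTGACCttTATAAAAgg"
print(find_sites(example))
```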
The Three Ontologies of GO
• Molecular Function: elemental activity or task
nuclease, DNA binding, transcription factor

• Biological Process: broad objective or goal
mitosis, signal transduction, metabolism

• Cellular Component: location or complex
nucleus, ribosome, origin recognition complex
The Three Ontologies of GO
• Molecular Function: elemental activity or task
nuclease, DNA binding, transcription factor
For example, the yeast protein GAL4 is annotated with the
two process terms ‘galactose metabolism’ and ‘transcription
regulation’ as well as the component term ‘nucleus’ and the
function term ‘transcriptional activator’.
The Three Ontologies of GO
• Cellular Component: location or complex
nucleus, ribosome, origin recognition complex
Cellular component is the cellular region in which the gene
product acts. It can be a location, such as the nucleus, or a
complex of gene products, such as a ribosome or the origin
recognition complex.
The Three Ontologies of GO
• Biological Process: broad objective or goal
mitosis, signal transduction, metabolism

A gene product is annotated to one or more terms in each
ontology, because gene products may have more than one
function, be involved in more than one process and have
more than one site of action.
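The one-or-more-terms-per-ontology rule can be sketched as a small data structure, using the GAL4 annotations quoted earlier in these slides. The class `GeneProduct` and the ontology keys are hypothetical names chosen for the example; they are not the Gene Ontology's own file format or API.

```python
from dataclasses import dataclass, field

# The three GO ontologies named in the slides (key spellings are assumptions).
ONTOLOGIES = ("molecular_function", "biological_process", "cellular_component")


@dataclass
class GeneProduct:
    name: str
    # One set of terms per ontology, so several terms can coexist in each.
    annotations: dict = field(
        default_factory=lambda: {o: set() for o in ONTOLOGIES}
    )

    def annotate(self, ontology, term):
        if ontology not in ONTOLOGIES:
            raise ValueError(f"unknown ontology: {ontology}")
        self.annotations[ontology].add(term)


# GAL4 annotations as stated in the slides.
gal4 = GeneProduct("GAL4")
gal4.annotate("biological_process", "galactose metabolism")
gal4.annotate("biological_process", "transcription regulation")
gal4.annotate("cellular_component", "nucleus")
gal4.annotate("molecular_function", "transcriptional activator")

print(sorted(gal4.annotations["biological_process"]))
```

Using a set per ontology reflects the point that one gene product may carry several process terms while still having a single entry for each distinct term.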
The goal is:
• to acquire as much and as reliable experimental data as possible
• to describe the distinct organisational levels of living systems
• to infer the rules of their organisation
• to make predictions about as yet uncharacterized parts of the same level
• to describe and deduce the properties of the next complex level from the underlying ones
But:
since emergent features of a complex system are determined by,
but only partially predictable from, the characteristics of the
underlying levels, we have to do careful empirical work on each
level again.
