Joint BecA-ILRI Hub, SLU and UNESCO Advanced Genomics and Bioinformatics

Joint BecA-ILRI Hub, SLU and UNESCO Advanced
Genomics and Bioinformatics
Mark Wamalwa 7th - 17th October 2013

BecA-ILRI Hub, Nairobi, Kenya
http://hub.africabiosciences.org/
http://www.ilri.org/
m.wamalwa@cgiar.org
Plan for the Week
Day 1: Introduction to Linux; Introduction to Perl programming
Day 2: Shell programming
Day 3: Perl programming (contd); Nucleotide and protein sequence manipulation
Day 4: Regulatory sequence analysis; CLC Genomics; Cocktail
Day 5: CLC Genomics (contd)
What is Bioinformatics / Computational Biology?
Bioinformatics: seeks to analyze large sets of biological data in order
to solve biological questions, to formulate hypotheses and to build
models of the underlying biological processes.
Bioinformatics: collection and storage of biological information

Bulk Data analysis
Bulk Data storage
Bulk Data mining
Computational biology: development of algorithms and statistical
models to analyze biological data
Scope of bioinformatics

Storage and retrieval of biological data
Molecular structures: visualization and analysis, classification, prediction
Sequence analysis: sequence alignments, database searches, motif detection
Genomics: annotation, comparative genomics
Phylogeny
Functional genomics: transcriptome, proteome, interactome
Analysis of biochemical networks: metabolic networks, regulatory networks
Systems biology: modelling and simulation of dynamical systems

Multidisciplinarity
[Figure: bioinformatics at the intersection of molecular biology, genomics, genetics, mathematics, biochemistry, statistics, biophysics, numerical analysis, algorithmics, evolution, image analysis and data management]
Multidisciplinary
- Scientists cannot be experts in all of these domains
- Problems:
  - Biologists (generally) hate statistics and computers
  - Computer scientists (generally) ignore statistics and biology
  - Statisticians and mathematicians (generally) spend their time writing formulas everywhere
  - Complexity of the biological domain: each time you try to formulate a rule, there is a possible counter-example
- Solution: multidisciplinary teams/multi-lab projects
Applications
- Research in biology
  - Molecular organization of the cell/organism
  - Development
  - Mechanisms of evolution
- Medicine
  - Diagnosis of cancers
  - Detecting genes involved in cancer
- Pharmaceutical research
  - Mechanisms of drug action
  - Drug target identification
- Biotechnology
  - Gene therapy
  - Bioengineering
From wet science to bioinformatics
- Progress in biology stimulated the incorporation of new methods in bioinformatics
  - Structure analysis (since the 50s): structure comparison, structure prediction
  - Sequencing (since the 70s): sequence alignment, sequence search in databases
  - Genomes (since the 90s): genome annotation, comparative genomics, functional classifications (ontologies)
  - Transcriptome (since 1997): multivariate analysis
  - Proteome (~2000): graph analysis
High throughput technologies
- Genome projects stimulated drastic improvement of sequencing technology
- Post-genomic era
  - Genome sequence is not sufficient to predict gene function
  - This stimulated the development of new experimental methods
    - transcriptomics (microarrays)
    - proteomics (two-hybrid, mass spectrometry, ...)
- The "omics" trend:
  - High-throughput methods started a fashion for "omics".
  - Some of the "omics" are not associated with any new or high-throughput approach; they are just a new name for an existing method or for an abstract concept.
Large-scale analyses
- The availability of massive amounts of data makes it possible to address questions that could not even be imagined a few years ago
  - genome-scale measurement of transcriptional regulation
  - comparative genomics
- Downstream analyses require a good understanding of statistics
- Warning: the global trends
  - the capability to analyze large amounts of data carries a risk of remaining at a superficial level, or of being fooled by forgetting to check the pertinence of the results (with some in-depth examples)
  - good news: this does not prevent the authors from publishing in highly cited journals
Bioinformatics is a science of inference
- The risks of inference
  - Any analysis of massive data will unavoidably generate a certain rate of errors (false positives and false negatives).
  - Good research and development will include an evaluation of the error rates.
  - Good methods will minimize the error rate.
  - Trade-off between specificity and sensitivity.
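The trade-off between sensitivity and specificity can be illustrated with a small sketch. It assumes a hypothetical set of scored predictions with known true labels; the scores, labels and thresholds below are invented for illustration and do not come from any real analysis. A prediction is called positive when its score reaches the threshold.

```python
def confusion_counts(scores, labels, threshold):
    """Count TP, FP, TN, FN when calling 'positive' for score >= threshold."""
    tp = fp = tn = fn = 0
    for score, is_true in zip(scores, labels):
        predicted = score >= threshold
        if predicted and is_true:
            tp += 1          # true positive
        elif predicted and not is_true:
            fp += 1          # false positive
        elif not predicted and is_true:
            fn += 1          # false negative
        else:
            tn += 1          # true negative
    return tp, fp, tn, fn


def sensitivity_specificity(scores, labels, threshold):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp, fp, tn, fn = confusion_counts(scores, labels, threshold)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sensitivity, specificity


# Hypothetical prediction scores and their (assumed known) true status.
scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.40, 0.30, 0.20]
labels = [True, True, False, True, False, True, False, False]

for threshold in (0.25, 0.50, 0.75):
    sens, spec = sensitivity_specificity(scores, labels, threshold)
    print(f"threshold={threshold:.2f} sensitivity={sens:.2f} specificity={spec:.2f}")
```

On this toy data, raising the threshold lowers sensitivity (more false negatives) while raising specificity (fewer false positives), which is the trade-off named above.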
Why bioinformatics then?
- In most cases, wet biology will be required afterwards to validate the predictions
- Bioinformatics can
  - reduce data to a small set of testable predictions
  - assign a degree of confidence to each prediction
- The biologist will often have to choose the appropriate degree of confidence, depending on the trade-off between
  - cost of validating predictions
  - benefit expected from the right predictions
- Bioinformatics as in silico biology
  - Allows exploring domains that cannot be addressed experimentally, e.g., the study of past evolutionary events
    - Phylogenetic inference and comparative genomics give us insights into the mechanisms of evolution and into past evolutionary events
    - The time scale of these events is however so large (billions of years) that one cannot conceive of reproducing the inferred events with experimental methods.
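The choice of a degree of confidence by trading validation cost against expected benefit can be sketched as follows. This is only one simple way to frame it, under stated assumptions: each prediction carries a confidence interpreted as the probability that it is right, validation has a fixed cost per prediction, and each confirmed prediction yields a fixed benefit. All numbers and function names are hypothetical.

```python
def predictions_worth_validating(confidences, cost, benefit):
    """Keep predictions whose expected benefit exceeds the validation cost.

    Expected benefit of validating one prediction = confidence * benefit,
    so the break-even confidence is cost / benefit.
    """
    return [p for p in confidences if p * benefit > cost]


def expected_net_gain(confidences, cost, benefit):
    """Sum of (expected benefit - cost) over the given predictions."""
    return sum(p * benefit - cost for p in confidences)


# Hypothetical confidences assigned to five predictions.
confidences = [0.9, 0.7, 0.5, 0.2, 0.1]
cost = 100.0      # assumed cost of one wet-lab validation
benefit = 400.0   # assumed value of one confirmed prediction

selected = predictions_worth_validating(confidences, cost, benefit)
print("break-even confidence:", cost / benefit)
print("selected:", selected)
print("expected net gain:", expected_net_gain(selected, cost, benefit))
```

With cheaper validation or a larger expected benefit, the break-even confidence drops and more predictions become worth testing.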
Goals of Bioinformatics
Molecular Biology as an Information Science.
What is the Information?
Central Dogma of Molecular Biology
DNA -> RNA -> Protein -> Phenotype -> DNA
Molecules: Sequence, Structure, Function
Processes: Mechanism, Specificity, Regulation

Central Paradigm for Bioinformatics
Genomic Sequence Information -> mRNA (level) -> Protein Sequence -> Protein Structure -> Protein Function -> Phenotype
Large Amounts of Information: Standardized, Statistical
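The DNA -> RNA -> Protein flow of information can be sketched in a few lines of code. This is a minimal illustration, assuming the input is the coding (sense) strand of a gene, read in frame from its first base, and using the standard genetic code; the example sequence is invented.

```python
# Standard genetic code, built from the usual compact layout:
# codons enumerated with bases in the order U, C, A, G; '*' marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(
    zip((a + b + c for a in BASES for b in BASES for c in BASES), AMINO_ACIDS)
)


def transcribe(dna):
    """DNA coding strand -> mRNA: same sequence with T replaced by U."""
    return dna.upper().replace("T", "U")


def translate(mrna):
    """mRNA -> protein (one-letter codes), stopping at the first stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Hypothetical 12-base coding sequence: ATG GCC TTT TAA
dna = "ATGGCCTTTTAA"
mrna = transcribe(dna)
print(mrna)             # AUGGCCUUUUAA
print(translate(mrna))  # MAF
```

Real sequences need more care (reading frame detection, ambiguous bases, alternative genetic codes), which this sketch deliberately leaves out.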
Most cellular functions are performed or facilitated by proteins.
Protein: primary biocatalyst; cofactor transport/storage; mechanical motion/support; immune protection; control of growth/differentiation
RNA: information transfer (mRNA); protein synthesis (tRNA/mRNA); some catalytic activity
DNA: genetic material
(idea from D Brutlag, Stanford, graphics from S Strobel)
Scope of Bioinformatics
- Development of computational tools
  - Writing software
  - Creating databases
- Application of these tools to generate biological knowledge
  - Molecular sequence analysis
  - Molecular structural analysis
  - Molecular functional analysis

The Bioinformatics Platform
High-performance computing server:
32 total processing cores
128GB of memory (RAM)
8TB of disk space
25TB LTO4 tape backup library
Linux cluster
32 CPUs (AMD 64-bit)
128 Gigabyte RAM
>10 terabytes disk storage
Grid computing
Parallel applications:
> Genome assembly (Newbler, MIRA, Celera, Velvet, CAP3, ...)
> Genome annotation (Glimmer, ...)
> Phylogenetic analysis (BEAST, MrBayes)
> Other sequence analysis tools (BLAST, ClustalW, HMMER, R)

BecA-ILRI Genomics Platform
Opportunities for genomics and metagenomics research
Capillary sequencing: ABI 3130-xl, ABI 3730-xl, ABI 3500-xl
Next generation sequencing: 454 GS pyrosequencer
1 sample = 1 library = 1 plate
500 Mb/run (1/2 cassava genome, 1/8 human genome)
Genomics, viral genomics, functional genomics, metagenomics
Bioinformatics Core Activities
Statistical support
- Experimental design
Primary data analysis
- NGS QC, spatial defect removal
- 454 GA pipeline
Secondary/downstream analysis
- Differential expression
- ChIP-seq peak calling
- Structural variation, genomic rearrangements
- SNP and CN analysis
- microRNA profiling
- GO enrichment
Training/Capacity Building
- motif finding
- functional/network analysis
- microarray analysis
Data management
- NGS data storage and manipulation
- Data warehouse facilities: databases
Software development
- Bioconductor packages: NGS annotation packages
- Automated NGS analysis packages
Bioinformatics tools
- Ensembl, Galaxy, Cytoscape
From Sequence (genomics/metagenomics) to impact
[Figure: from sequence data to impact]
Inputs: (meta)genome sequencing; databases; compilation of complete genomes and metagenomes, annotation and curation of metadata; extraction of important biological information
Analyses: phylogenetic analysis; geographical mapping; protein modeling; sequence variation analysis; primer and microarray selection; discovery of new micro-organisms and pathways
Impact: diagnostics; global disease surveillance; vaccine development; drug development; improved drug; environmental sustainability; improved public health intervention
Books
- Zvelebil, M. & Baum, J.O. Understanding Bioinformatics. (2007) pp. 772
- Pevsner, J. (2003). Bioinformatics and Functional Genomics. Wiley.
  - All the slides available at: http://www.bioinfbook.org/
- Mount, D.W. Bioinformatics: Sequence and Genome Analysis. (2004) pp. 692.
  - http://www.bioinformaticsonline.org/
- Westhead, D.R., J.H. Parish, and R.M. Twyman. 2002. Bioinformatics. BIOS Scientific Publishers, Oxford.
- Branden et al. Introduction to Protein Structure. (1998) pp. 410

The BecA Hub team
8 countries, 17 females, 19 males
Australia, Benin, Cameroon, England, Ethiopia, Italy, Kenya, USA
Dankie!!!
