
Paradigm shift in Life sciences

Background information: experimental sciences
There is a tendency to look ever deeper into:
Matter, e.g. Physics
Universe, e.g. Astronomy
Life, e.g. Life sciences

Instrumental consequences are increases in detector:
Resolution & sensitivity
Automation & robotization
Therefore experiments change in nature & become increasingly complex

One part of the information explosion
[Figure: growth per year, 1982 to 2000; vertical axis from 0.00E+00 to 1.20E+10. Labels on the curve: Morowitz; various microbial genomes; yeast genome (14 Mbp); C. elegans genome (97 Mbp); Drosophila genome (137 Mbp); human chr. 22 (34.5 Mbp); Arabidopsis (125.4 Mbp); human complete draft (3.1 Gbp).]

Impact in the life sciences


Impact of high throughput methods e.g. Omics
experimentation
genome ===> genomics

New technologies in Life Sciences research
Cell: Methodology/Technology
DNA: Genomics
RNA: Transcriptomics
Protein: Proteomics
Metabolites: Metabolomics

University of Amsterdam

Terminology
DNA: Genome, Genomics
RNA: Transcriptome, Transcriptomics
Protein: Proteome, Proteomics
Metabolite: Metabolome, Metabolomics

More than 50 -omes, including the Unknownome

Omics impact

Instrumentation used in omics experimentation:
Transcriptomics via, among others, microarrays and RNA sequencing
Proteomics via, among others, mass spectrometry (MS)
Metabolomics via, among others, MS & nuclear magnetic resonance (NMR)

Results in Paradigm shift in Life sciences
Past experiments were hypothesis driven
Evaluate hypothesis
Complement existing knowledge

Present experiments are data driven
Discover knowledge from large amounts of data

The knowledge cycle (traditional)


Idea!

Literature

Hypothesis

Publication

Experiment

Data

The knowledge cycle (extended)


Idea!

Databases
(e-)Literature

Hypothesis

Publication

Experiment

Data

Life sciences research: from gene to function


nucleus

cell

Gene

DNA

Whole-genome sequence projects


Gene expression by
RNA synthesis

AAAAAAAAA

Genome-wide micro-array analysis

mRNA

mRNA translation by
protein synthesis

High-throughput protein-analysis

NH2

Protein
COOH

Protein function:
-prediction by bioinformatics
-proof by laboratory research

function-1
function-n
function-2

Developments towards Bioinformatics & e-Science


Experiments become increasingly complex
Driven by advances in detector development
Results in an increase in the amount and complexity of data
Something has to be done to harness this development
Bioinformatics to translate data into useful biological, medical, pharmaceutical & agricultural knowledge

The what of Bioinformatics


Bioinformatics is redefining rules and
scientific approaches, resulting in the
new biology. Within this new paradigm
the traditional scientific boundaries are
blurred, leaving no clear line between
dry or computational and wet-based
approaches

Role of bioinformatics

Genomics

RNA

Transcriptomics

protein
metabolites

Proteomics
Metabolomics
Integrative/System Biology

Data usage/user interfacing

DNA

Bioinformatics

Data integration/fusion

methodology

Data generation/validation

cell

Conclusions
Omics experiments change the face of life sciences
Bioinformatics can be considered an essential enabler and is a form of e-Science
It will help to realize the necessary paradigm shift in Life Science experimentation
Better support of experimentation & optimal use of ICT infrastructure require rationalization of the experimentation process
Information management is an essential technology
Bioinformatics cannot be decoupled from e-Bioscience applications
e-Bioscience also has to comprise biomedical applications

Central dogma of molecular biology

DNA

RNA

protein

Paradigm Shift in Biosciences

So far, biologists have focused on certain phenotypes and hunted the genes responsible, one at a time
New trend is:
Genomics & Proteomics: catalog all the parts (genes and proteins)
Functional Genomics: understand how each part works
Systems Biology: model & simulate the collective behavior of the parts

Central dogma of molecular biology

DNA

genome

RNA

transcriptome

protein

proteome

Central dogma of bioinformatics and genomics

Omics data
In the Omics era, we see a proliferation of genome/proteome-wide high-throughput data that are available in public archives
Comparative genome sequences
Sequence variation & phenotypes
Epigenetics & chromatin structure
Regulatory elements & gene expression
Protein expression, modification & localization
Protein domain, structure, interaction
Metabolic, signal, regulatory pathways
Drug, toxicogenomics, toxicoproteomics

Sanger sequencing was the only DNA sequencing method for 30 years, but there is a hunger for even greater sequencing throughput and more economical sequencing technology
NGS can process millions of sequence reads in parallel rather than 96 at a time (at 1/6 of the cost)
Objections: fidelity, read length, infrastructure cost, handling large volumes of data

Many years of hard work
More than 20,000 BAC clones
Each containing a fragment of about 100 kb
Together provided a tiling path through each human chromosome
Amplification in bacterial culture
Isolation, selection of pieces of about 2-3 kb
Subcloned into plasmid vectors, amplification, isolation
Recreate contigs
Refinement, gap closure, sequence quality improvement (less than 1 error per 40,000 bases)
BAC-based approaches toward WGS
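
The figures on this slide can be turned into two quick numbers. The sketch below multiplies the clone count by the insert size to get a rough lower bound on the total BAC insert sequence, and converts the stated error rate into a Phred-style quality value. The Phred scale (Q = -10 log10 p) is not mentioned on the slide; it is brought in here only as a common way to express error rates, and the variable names are illustrative.

```python
import math


def phred_quality(error_probability: float) -> float:
    """Convert a per-base error probability to a Phred-scaled quality value."""
    return -10.0 * math.log10(error_probability)


# Slide figures: "more than 20,000" clones of "about 100 kb" each,
# so this product is only an approximate lower bound.
bac_clones = 20_000
bac_insert_bp = 100_000
total_insert_bp = bac_clones * bac_insert_bp

# Slide figure: less than 1 error per 40,000 bases in the finished sequence.
finished_error_rate = 1 / 40_000
finished_quality = phred_quality(finished_error_rate)

print(f"Total BAC insert sequence: about {total_insert_bp / 1e9:.1f} Gb")
print(f"1 error in 40,000 bases corresponds to roughly Q{finished_quality:.0f}")
```

Because the slide says "less than" one error per 40,000 bases, the quality value computed here is a floor rather than an exact figure.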

Roche/454 FLX: 2004
Illumina Solexa Genome Analyzer: 2006
Applied Biosystems SOLiD™ System: 2007
Helicos HeliScope™: recently available
Pacific Biosciences SMRT: launching 2010

Roche 454 technology

Illumina Solexa

454 vs Solexa

454:
Homopolymers (AAAAA..)
Read length: 400 bp
Number of reads: 400,000
Per-base cost greater
De novo assembly, metagenomics

Solexa:
Read length: 40 bp
Number of reads: millions
Per-base cost cheaper
Ideal for applications requiring short reads: ncRNA
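
As a rough worked example of this comparison, the yield of a run is read length times number of reads, and dividing by genome size gives average fold coverage. The sketch below uses the slide's figures for 454 (400 bp, 400,000 reads) and the Solexa read length (40 bp). The slide only says "millions" of Solexa reads, so the 10 million used here is an assumed placeholder, as is the 7.9 Mb genome size used for the coverage estimate.

```python
def run_yield_bp(read_length_bp: int, num_reads: int) -> int:
    """Total bases produced by one run: read length times number of reads."""
    return read_length_bp * num_reads


def fold_coverage(yield_bp: int, genome_size_bp: int) -> float:
    """Average number of times each genome position is covered."""
    return yield_bp / genome_size_bp


# 454 figures taken from the slide.
yield_454 = run_yield_bp(read_length_bp=400, num_reads=400_000)

# Solexa: 40 bp is from the slide; 10 million reads is an ASSUMED value
# standing in for the slide's unspecified "millions".
assumed_solexa_reads = 10_000_000
yield_solexa = run_yield_bp(read_length_bp=40, num_reads=assumed_solexa_reads)

# ASSUMED genome size, chosen only to illustrate the coverage calculation.
assumed_genome_bp = 7_900_000

for platform, total in (("454", yield_454), ("Solexa", yield_solexa)):
    coverage = fold_coverage(total, assumed_genome_bp)
    print(f"{platform}: {total / 1e6:.0f} Mbp per run, about {coverage:.1f}x coverage")
```

Under these assumptions the short-read platform produces more bases per run even though each read is ten times shorter, which is the trade-off the slide summarizes.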

Applications of Next-Generation Sequencing

Ancient DNA
DNA mixtures from diverse ecosystems, metagenomics
Resequencing previously published reference strains
Identification of all mutations in an organism
Errors in published literature
Expand the number of available genomes
Comparative studies
Deciphering a cell's transcripts at sequence level without knowledge of the genome sequence
Sequencing extremely large genomes, crop plants
Detection of cancer-specific alleles, avoiding traditional cloning
ChIP-seq: protein-DNA interactions
Epigenomics
Detecting ncRNA
Human genetic variation: SNP, CNV (diseases)

Degraded state of the sample: mtDNA sequencing
Nuclear genomes of ancient remains: cave bear, mammoth, Neanderthal (10^6 bp)

Problems: contamination with modern human DNA and co-isolation of bacterial DNA

Key part in regulating gene expression
ChIP: technique to study DNA-protein interactions
Recently, genome-wide ChIP-based studies of DNA-protein interactions
Readout of ChIP-derived DNA sequences on NGS platforms
Insights into transcription factor/histone binding sites in the human genome
Enhance our understanding of gene expression in the context of specific environmental stimuli

ncRNA presence in a genome is difficult to predict by computational methods with high certainty because of its evolutionary diversity
Detecting expression level changes that correlate with changes in environmental factors, with disease onset and progression, complex disease set or severity
Enhance the annotation of sequenced genomes (impact of mutations more interpretable)

Extreme example:
multiplexing the amplification
of 10 000 human exons using
primers from a programmable
microarray and sequencing
them using NGS.

Characterizing the biodiversity found on Earth


The growing number of sequenced genomes enables us to interpret partial sequences obtained by direct sampling of specific environmental niches.
Examples: ocean, acid mine site, soil, coral reefs, human microbiome (which may vary according to the health status of the individual)

Common variants have not yet completely explained complex disease genetics; rare alleles also contribute
Also structural variants, large and small insertions and deletions
Accelerating biomedical research

Metagenome of Wanagama Soil


Enables analysis of genome-wide patterns of methylation and how these patterns change through the course of an organism's development.

Enhanced potential to combine the results of different experiments: correlative analyses of genome-wide methylation, histone binding patterns and gene expression, for example.

Epigenetics: beyond the sequence. "The major problem, I think, is chromatin. What
determines whether a given piece of DNA along the chromosome is functioning,
since it's covered with the histones? What is happening at the level of methylation
and epigenetics? You can inherit something beyond the DNA sequence. That's
where the real excitement of genetics is now." (James D. Watson). Chromatin is
defined as the dynamic complex of DNA and histone proteins that makes up
chromosomes.

Epigenetics is defined as the chemical modification of DNA that affects gene
expression but does not involve changes to the underlying DNA sequence. As the
emphasis in biology is switching away from genetic sequence and towards the
mechanisms by which gene activity is controlled, epigenetics is becoming
increasingly popular.

Epigenetic processes are essential for packaging and interpreting the genome, are
fundamental to normal development and are increasingly recognized as being
involved in human disease. Epigenetic mechanisms include, among other things,
histone modification, positioning of histone variants, nucleosome remodelling, DNA
methylation, small and non-coding RNAs. (Nature, 7 Aug 2008).

Reducing sequencing error
Increasing read length
Developing new bioinformatics tools
Alignment: MAQ, SOAP
Assembly: SSAKE
Base caller: PyroBayes
Variant detection: MAQ, GEM

Cost reduction: $1,000 for personal genomics

The growth pattern of Streptomyces sp. GMY01 & GMR22 on nutrient broth medium at 30 °C

Comparison of the genome of the marine sediment-derived strain (GMY01) with that of the strain of terrestrial origin (GMR22) will provide insight into the environmental adaptation and evolution of Streptomyces species.

Applications of microbiology genome sequencing

http://bgiamericas.com

Genome Assembly & Annotation (Streptomyces sp. GMY01 & GMR22)
The reads were produced from a single lane of an Illumina GAIIx sequencing machine.
The analysis breaks down into:
Short-read quality filtering and trimming.
De novo assembly using Illumina short reads.
Feature annotation of the assembled contigs, including CDS, tRNA, rRNA, ribosome binding site, signal peptide cleavage site, transmembrane helix, repeat regions (inverted and tandem).
Functional annotation of the assembled contigs using SEED subsystem annotation (http://www.theseed.org/).
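
The first step of this pipeline, quality filtering and trimming, can be illustrated with a minimal sketch. The slides do not say which tool or thresholds were used, so everything here is an assumption: Phred+33 quality encoding, a minimum quality of 20, a minimum read length of 5 for the toy data, and the function names themselves. It trims low-quality bases from the 3' end of each read and discards reads that end up too short.

```python
from typing import Iterable, Iterator, List, Tuple

Read = Tuple[str, str, str]  # (name, sequence, quality string)


def parse_fastq(lines: List[str]) -> Iterator[Read]:
    """Yield (name, sequence, quality) from 4-line FASTQ records."""
    for i in range(0, len(lines) - 3, 4):
        name = lines[i].strip()[1:]  # drop the leading '@'
        seq = lines[i + 1].strip()
        qual = lines[i + 3].strip()
        yield name, seq, qual


def trim_3prime(seq: str, qual: str, min_q: int = 20, offset: int = 33) -> Tuple[str, str]:
    """Cut bases below min_q off the 3' end (assumes Phred+33 encoding)."""
    end = len(seq)
    while end > 0 and ord(qual[end - 1]) - offset < min_q:
        end -= 1
    return seq[:end], qual[:end]


def filter_reads(reads: Iterable[Read], min_q: int = 20, min_len: int = 5) -> List[Read]:
    """Trim every read, then keep only those still at least min_len long."""
    kept = []
    for name, seq, qual in reads:
        seq, qual = trim_3prime(seq, qual, min_q)
        if len(seq) >= min_len:
            kept.append((name, seq, qual))
    return kept


# Toy input: 'I' encodes Q40 and '#' encodes Q2 under Phred+33.
example = [
    "@read1", "ACGTACGTAC", "+", "IIIIIIII##",
    "@read2", "ACGTAC", "+", "I#####",
]
kept = filter_reads(parse_fastq(example))
for name, seq, qual in kept:
    print(name, seq, qual)
```

In this toy input the first read loses its last two bases and is kept, while the second read is trimmed to a single base and discarded. Real pipelines typically use dedicated trimming tools and also handle adapters and paired reads, which this sketch leaves out.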

Feature Annotation

CDS, using RAST gene prediction
tRNA, using tRNAscan-SE
rRNA, using RNAmmer
RBS, using RBSfinder
Signal peptide, using SignalP
Transmembrane helix, using TMHMM
Inverted repeat region, using IRF
Tandem repeat region, using TRF

The numbers of features found in the assembled genomes using the respective tools are:

Features: GMY01 (Count) / GMR22 (Count)
Genome Size: 7,907,487 bp / 11,637,374 bp ?
CDS: 6,420 / 12,222 ?
tRNA: 65 / 186 ?
rRNA: 12
RBS: 5,107
Transmembrane Helix: 3,312
Signal Peptide: 42 / 801
Inverted Repeats: 204 / 318
Tandem Repeats: 3,777 / 3,320

Jaka Widada


Closest neighbors

Closest neighbors of Streptomyces sp. (6666666.37
Closest neighbors of Streptomyces sp. e14 (645465.6)

Streptomyces sp. GMR22
Streptomyces sp. GMY01
Genome ID

Score

Genome Name

Genome ID

Score

Genome Name

100226.1
100226.15

539
528

Streptomyces coelicolor A3(2)


Streptomyces coelicolor A3(2)

457427.3
653045.3

518
449

Streptomyces hygroscopicus ATCC 53653


Streptomyces violaceusniger Tu 4113

457428.4
227882.1
227882.9

496
494
493

Streptomyces lividans TK24


Streptomyces avermitilis MA-4680
Streptomyces avermitilis MA-4680

591159.3
645465.3
253839.3

483
448
413

Streptomyces viridochromogenes DSM 40736


Streptomyces sp. e14
Streptomyces sp. C

749414.3
463191.3
645465.3
227882.9
455632.3
455632.4

441
410
380
365
361
352

Streptomyces
Streptomyces
Streptomyces
Streptomyces
Streptomyces
Streptomyces

bingchenggensis BCW-1
sviceus ATCC 29083
sp. e14
avermitilis MA-4680
griseus subsp. griseus NBRC 13350
griseus subsp. griseus NBRC 13350

680198.5
443255.8

381
368

Streptomyces scabiei 87.22


Streptomyces clavuligerus ATCC 27064

463191.3
566461.4
465541.3

367
360
359

Streptomyces sviceus ATCC 29083


Streptomyces ghanaensis ATCC 14672
Streptomyces sp. Mg1

591159.3
649189.3
379731.5
227882.1
253839.3

347
339
327
326
307

Streptomyces
Streptomyces
Pseudomonas
Streptomyces
Streptomyces

viridochromogenes DSM 40736


sp. ACT-1
stutzeri A1501
avermitilis MA-4680
sp. C

355249.3
443255.19
467200.3

358
355
337

Streptomyces sp. Tu6071


Streptomyces clavuligerus ATCC 27064
Streptomyces griseoflavus Tu4000

455632.3
457430.3

328
321

Streptomyces griseus subsp. griseus NBRC 13350


Streptomyces roseosporus NRRL 11379

100226.1
647653.3
457430.3
100226.15
457431.3

299
297
297
291
289

Streptomyces
Streptomyces
Streptomyces
Streptomyces
Streptomyces

coelicolor A3(2)
sp. ACTE
roseosporus NRRL 11379
coelicolor A3(2)
roseosporus NRRL 15998

465543.3
457429.3
455632.4

319
319
318

Streptomyces sp. SPB74


Streptomyces pristinaespiralis ATCC 25486
Streptomyces griseus subsp. griseus NBRC 13350

457425.4
591167.3
647653.3

313
299
292

Streptomyces albus J1074


Streptomyces flavogriseus ATCC 33331
Streptomyces sp. ACTE

457428.4
379731.4
680198.5
591157.3

286
284
260
248

Streptomyces
Pseudomonas
Streptomyces
Streptomyces

lividans TK24
stutzeri A1501
scabiei 87.22
sp. SPB78

649189.3
457431.3

292
279

Streptomyces sp. ACT-1


Streptomyces roseosporus NRRL 15998

889487.3
683219.3
591157.3

277
264
262

Streptomyces sp. S4
Streptomyces sp. SA3_actG
Streptomyces sp. SPB78

566461.4
208964.1
208963.3
355249.3
457429.3

238
235
231
228
221

Streptomyces
Pseudomonas
Pseudomonas
Streptomyces
Streptomyces

ghanaensis ATCC 14672


aeruginosa PAO1
aeruginosa UCBPP-PA14
sp. Tu6071
pristinaespiralis ATCC 25486

653045.3

259

Streptomyces violaceusniger Tu 4113

457425.4
683219.3
208963.12

215
214
204

Streptomyces albus J1074


Streptomyces sp. SA3_actG
Pseudomonas aeruginosa UCBPP-PA14

http://rast.nmpdr.org/seedviewer.cgi?page=ClosestNeighbors&organism=645465.6

http://rast.nmpdr.org/seedviewer.cgi?page=ClosestNeighbors&organism=6666666.37971

Functional Abundance Analysis
[Figure: functional abundance chart for GMY01 and GMR22.]

Genome Plasticity and Evolution


???

Genome re-sequencing at NITE

Lowest Common Ancestor
[Figure: lowest common ancestor distribution for GMR22 and GMY01 (dataset labels 452319 and 452311). Legend: Chlorobi, Basidiomycota, Lentisphaerae, unclassified (derived from other sequences), Chrysiogenetes, Nitrospirae, Actinobacteria, Proteobacteria, Bacteroidetes, Streptophyta, Arthropoda, Cyanobacteria, Chordata, Firmicutes, unclassified (derived from Bacteria), Verrucomicrobia, Acidobacteria, Ascomycota, Spirochaetes, Planctomycetes, Chloroflexi.]

Outline of the pipeline for genomic analysis of secondary metabolites.

Medema M H et al. Nucl. Acids Res. 2011;nar.gkr466


The Author(s) 2011. Published by Oxford University Press.

antiSMASH: antibiotics & Secondary Metabolite Analysis SHell

Genomic analysis of secondary metabolites of Streptomyces in this study

GMY01

http://antismash.secondarymetabolites.org/upload/cdfb620be865-4aabb5154bf7e55abde6/index.html

GMY01

Number

GMR22

Number

Terpene

Terpene

16

NRPS

NRPS

46

Ectoine

Ectoine

Butyrolactone

Butyrolactone

T1pks

T1pks

67

T1pks-t4pks

T1pks-t4pks

Bacteriocin

Bacteriocin

11

Nrps-butyrolactones

Nrps-butyrolactones

T2pks

T2pks

Nrps-t1pks

Nrps-t1pks

Lantipeptide

Lantipeptide

Siderophore

Siderophore

Other

Other

14

T3pks

T3pks

T4pks

T4pks

15

T4pks-t1pks

T4pks-t1pks

Indole

Hserlactone

T2pks-bacteriocin

Bacteriocin-lantipeptide
T3pks-nrps

Oligosaccharide-t2pks

Total

28

201

GMR22

Genomic comparison of secondary metabolite gene clusters between Streptomyces GMY01 & E.14

GMY01: marine sediment-derived; genome size 7.91 Mb; total genes 6,420
E.14: wasp symbiont; genome size 7.93 Mb; total genes 6,198

16

21
AA homology 97-100%
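
The genome sizes and gene counts on this slide can be put side by side as gene density (genes per Mb). The sketch below is only arithmetic on the slide's figures; the dictionary layout and function name are illustrative, and nothing here is part of the original analysis.

```python
# Figures as given on the slide.
strains = {
    "GMY01": {"genome_mb": 7.91, "genes": 6420},  # marine sediment-derived
    "E.14": {"genome_mb": 7.93, "genes": 6198},   # wasp symbiont
}


def genes_per_mb(genome_mb: float, genes: int) -> float:
    """Gene density: annotated genes per megabase of genome."""
    return genes / genome_mb


density = {name: genes_per_mb(**info) for name, info in strains.items()}
gene_difference = strains["GMY01"]["genes"] - strains["E.14"]["genes"]

for name, value in density.items():
    print(f"{name}: {value:.1f} genes per Mb")
print(f"GMY01 has {gene_difference} more annotated genes than E.14")
```

The two genomes are almost the same size, so the difference in gene count shows up almost entirely as a difference in density. Gene counts depend on the annotation pipeline used for each genome, so this comparison is indicative at best.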
