Bioscience Inaugural Lecture (Kuliah Perdana)

Paradigm shift in Life sciences
Background information
In the experimental sciences there is a tendency to look ever deeper into:
Matter, e.g. physics
Universe, e.g. astronomy
Life, e.g. life sciences
The instrumental consequence is an increase in detector:

Resolution & sensitivity
Automation & robotization
Therefore experiments change in nature and become
increasingly complex
One part of the information explosion.
[Figure: growth of sequence data in base pairs (y-axis 0 to 1.2E+10) by year, 1982 to 2000. Milestones marked: various microbial genomes; yeast genome (14 Mbp); C. elegans genome (97 Mbp); Drosophila genome (137 Mbp); Arabidopsis (125.4 Mbp); human chr. 22 (34.5 Mbp); human complete draft (3.1 Gbp); Morowitz.]
Impact in the life sciences

Impact of high-throughput methods, e.g. omics experimentation (genome ===> genomics)
[Diagram: New technologies in life sciences research. Methodology/technology applied at each level of the cell: DNA (genomics), RNA (transcriptomics), protein (proteomics), metabolites (metabolomics). Source: University of Amsterdam]
Terminology
Molecule | -ome | -omics
DNA | Genome | Genomics
RNA | Transcriptome | Transcriptomics
Protein | Proteome | Proteomics
Metabolite | Metabolome | Metabolomics
More than 50 -omes have been named, including the "unknownome"
Omics impact
Instrumentation used in omics experimentation (among others):
Transcriptomics: micro-arrays, RNA sequencing
Proteomics: mass spectrometry (MS)
Metabolomics: MS and nuclear magnetic resonance (NMR)
Result: a paradigm shift in the life sciences

Past experiments were hypothesis-driven:
Evaluate a hypothesis
Complement existing knowledge
Present experiments are data-driven:
Discover knowledge from large amounts of data
The knowledge cycle (traditional)

[Diagram: cycle connecting Idea!, Literature, Hypothesis, Experiment, Data and Publication]
The knowledge cycle (extended)

[Diagram: cycle connecting Idea!, Databases, (e-)Literature, Hypothesis, Experiment, Data and Publication]
Life sciences research: from gene to function

Gene (DNA, in the cell nucleus): whole-genome sequence projects
Gene expression by RNA synthesis (mRNA with poly-A tail, AAAAAAAAA): genome-wide micro-array analysis
mRNA translation by protein synthesis (protein, NH2 to COOH terminus): high-throughput protein analysis
Protein function (function-1, function-2, ..., function-n):
-prediction by bioinformatics
-proof by laboratory research
Developments towards Bioinformatics & e-Science

Experiments become increasingly complex,
driven by advances in detector technology
This results in an increase in the amount and complexity of data
Something has to be done to harness this development:
bioinformatics, to translate data into useful biological,
medical, pharmaceutical & agricultural knowledge
The what of Bioinformatics

Bioinformatics is redefining rules and scientific approaches, resulting in the new biology. Within this new paradigm the traditional scientific boundaries are blurred, leaving no clear line between dry (computational) and wet (laboratory-based) approaches.
Role of bioinformatics
[Diagram linking the cell's DNA, RNA, protein and metabolites (genomics, transcriptomics, proteomics, metabolomics) with bioinformatics methodology (data generation/validation, data integration/fusion, data usage/user interfacing) and integrative/systems biology]
Conclusions
Omics experiments are changing the face of the life sciences
Bioinformatics can be considered an essential enabler and is a form of e-Science
It will help to realize the necessary paradigm shift in life science experimentation
Better support of experimentation and optimal use of ICT infrastructure require rationalization of the experimentation process
Information management is an essential technology
Bioinformatics cannot be decoupled from e-Bioscience applications
e-Bioscience also has to comprise biomedical applications
Central dogma of molecular biology
DNA ===> RNA ===> protein
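The DNA ===> RNA ===> protein flow can be illustrated with a small Python sketch. It assumes the input is the coding (sense) strand of a gene containing only A, C, G and T, uses the standard genetic code, and ignores introns, splicing and alternative start codons; `transcribe` and `translate` are illustrative names, not part of any library.

```python
# Standard genetic code: codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def transcribe(coding_strand: str) -> str:
    """DNA coding strand -> mRNA: same sequence with T replaced by U."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """mRNA -> protein, from the first AUG up to (not including) the first stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return ""
    protein = []
    for pos in range(start, len(mrna) - 2, 3):
        codon = mrna[pos:pos + 3].replace("U", "T")  # table is keyed by DNA codons
        amino_acid = CODON_TABLE[codon]
        if amino_acid == "*":  # stop codon ends translation
            break
        protein.append(amino_acid)
    return "".join(protein)


gene = "ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG"
mrna = transcribe(gene)
print(mrna)
print(translate(mrna))  # MAIVMGR
```

The codon table is generated from the 64-letter amino-acid string rather than typed out codon by codon, which keeps the sketch short and makes ordering mistakes easier to spot.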
Paradigm Shift in Biosciences

So far, biologists have focused on certain phenotypes and hunted down the responsible genes, one at a time
The new trend is:
Genomics & Proteomics: catalog all the parts (genes and proteins)
Functional Genomics: understand how each part works
Systems Biology: model and simulate the collective behavior of the parts
Central dogma of molecular biology
DNA (genome) ===> RNA (transcriptome) ===> protein (proteome)
Central dogma of bioinformatics and genomics
Omics data
In the omics era, we see a proliferation of genome- and proteome-wide high-throughput data available in public archives:
Comparative genome sequences
Sequence variation & phenotypes
Epigenetics & chromatin structure
Regulatory elements & gene expression
Protein expression, modification & localization
Protein domain, structure, interaction
Metabolic, signal, regulatory pathways
Drug, toxicogenomics, toxicoproteomics
Sanger sequencing was the main DNA sequencing method for 30 years, but there is a hunger for even greater sequencing throughput and more economical sequencing technology
Next-generation sequencing (NGS) can process millions of sequence reads in parallel rather than 96 at a time (at 1/6 of the cost)
Objections: fidelity, read length, infrastructure cost, handling large volumes of data
Many years of hard work

More than 20,000 BAC clones, each containing a fragment of about 100 kb, together provided a tiling path through each human chromosome
Amplification in bacterial culture
Isolation; selection of pieces of about 2-3 kb
Subcloning into plasmid vectors, amplification, isolation
Recreation of contigs
Refinement, gap closure, sequence quality improvement (fewer than 1 error per 40,000 bases)
BAC-based approaches toward WGS
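The finishing target above can be restated on the Phred quality scale, Q = -10 log10(P), where P is the probability that a base call is wrong. The Phred scale is not mentioned on the slide; it is used here only as a common way of expressing base-call accuracy, and the function names are illustrative.

```python
import math


def phred_quality(error_probability: float) -> float:
    """Phred quality score Q = -10 * log10(P)."""
    if not 0 < error_probability <= 1:
        raise ValueError("error_probability must be in (0, 1]")
    return -10 * math.log10(error_probability)


def error_probability(quality: float) -> float:
    """Inverse transform: P = 10 ** (-Q / 10)."""
    return 10 ** (-quality / 10)


# Finishing standard quoted above: fewer than 1 error per 40,000 bases.
q = phred_quality(1 / 40_000)
print(f"Q = {q:.1f}")  # Q = 46.0
print(error_probability(30))  # Q30 corresponds to 1 error in 1,000 bases
```

Because the scale is logarithmic, every 10 points of Q is a tenfold drop in error probability, so a finished sequence at about Q46 is roughly 40 times more accurate than a Q30 base call.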
Roche/454 FLX: 2004

Illumina Solexa Genome Analyzer: 2006
Applied Biosystems SOLiD™ System: 2007
Helicos HeliScope™: recently available
Pacific Biosciences SMRT: launching 2010
Roche 454 technology
Illumina Solexa
454 vs Solexa
Feature | Roche 454 | Illumina Solexa
Read length | 400 bp | 40 bp
Number of reads | 400,000 | millions
Per-base cost | higher | lower
Known weakness | homopolymers (AAAAA...) |
Suited to | de novo assembly, metagenomics | applications requiring short reads, e.g. ncRNA
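The read numbers in the comparison translate into a rough per-run yield (read length × number of reads). The Solexa read count is given only as "millions", so the 10 million used below is a hypothetical value. The sketch also scans reads for homopolymer runs, the stretches that 454 pyrosequencing handles poorly because run length is inferred from signal intensity; the minimum run length of 4 is an arbitrary choice.

```python
from itertools import groupby


def run_yield_bp(read_length_bp: int, n_reads: int) -> int:
    """Approximate bases sequenced per run: read length x number of reads."""
    return read_length_bp * n_reads


def homopolymer_runs(read: str, min_length: int = 4) -> list[tuple[int, str, int]]:
    """Return (start, base, length) for every run of one base at least min_length long."""
    runs = []
    position = 0
    for base, group in groupby(read):
        length = len(list(group))
        if length >= min_length:
            runs.append((position, base, length))
        position += length
    return runs


yield_454 = run_yield_bp(400, 400_000)        # figures from the 454 column
yield_solexa = run_yield_bp(40, 10_000_000)   # 10 million reads is HYPOTHETICAL
print(f"454:    {yield_454 / 1e6:.0f} Mb per run")     # 160 Mb
print(f"Solexa: {yield_solexa / 1e6:.0f} Mb per run")  # 400 Mb under the assumption
print(homopolymer_runs("GCAAAAATGGGGC"))  # [(2, 'A', 5), (8, 'G', 4)]
```

Even with reads ten times shorter, the much larger number of reads is what makes the short-read platform cheaper per base.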
Applications of Next-Generation Sequencing
Ancient DNA
DNA mixtures from diverse ecosystems, metagenomics
Resequencing previously published reference strains
Identification of all mutations in an organism
Errors in published literature
Expand the number of available genomes
Comparative studies
Deciphering a cell's transcripts at the sequence level
without knowledge of the genome sequence
Sequencing extremely large genomes, e.g. crop plants
Detection of cancer-specific alleles avoiding traditional
cloning
ChIP-seq: protein-DNA interactions
Epigenomics
Detecting ncRNA
Human genetic variation: SNPs, CNVs (diseases)
Degraded state of the sample: mitochondrial DNA (mtDNA) sequencing

Nuclear genomes of ancient remains: cave bear, mammoth, Neanderthal (10^6 bp)
Problems: contamination with modern human DNA and co-isolation of bacterial DNA
Key part in regulating gene expression

ChIP (chromatin immunoprecipitation): a technique to study DNA-protein interactions
Recently: genome-wide ChIP-based studies of DNA-protein interactions
Readout of ChIP-derived DNA sequences on NGS platforms
Insights into transcription factor and histone binding sites in the human genome
Enhanced understanding of gene expression in the context of specific environmental stimuli
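Reading ChIP-derived DNA out on NGS platforms yields many mapped read positions; binding sites show up as regions where ChIP reads pile up well above a control sample. The sketch below is a deliberately simplified windowed enrichment test, assuming reads are already aligned and given as start coordinates on one chromosome. The window size, fold and read thresholds and the pseudocount are hypothetical; dedicated peak callers use proper statistical models instead.

```python
from collections import Counter


def window_counts(read_starts, window):
    """Count reads per fixed-size window (window index = position // window)."""
    return Counter(pos // window for pos in read_starts)


def enriched_windows(chip_starts, control_starts, window=200,
                     min_fold=4.0, min_reads=5, pseudocount=1.0):
    """Report windows where ChIP reads exceed the scaled control by min_fold.

    All parameter defaults are hypothetical illustration values.
    """
    chip = window_counts(chip_starts, window)
    control = window_counts(control_starts, window)
    # Scale control counts to the ChIP library size so the two are comparable.
    scale = len(chip_starts) / max(len(control_starts), 1)
    hits = []
    for w, n in sorted(chip.items()):
        expected = control.get(w, 0) * scale + pseudocount  # pseudocount avoids /0
        fold = n / expected
        if n >= min_reads and fold >= min_fold:
            hits.append((w * window, (w + 1) * window, n, round(fold, 2)))
    return hits


# Toy data: 20 ChIP reads piled up near position 1,000 plus scattered background,
# against a control spread evenly along 20 kb.
chip = [1000 + i for i in range(20)] + [5000, 9000]
control = list(range(0, 20_000, 100))
print(enriched_windows(chip, control))  # [(1000, 1200, 20, 16.39)]
```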
The presence of ncRNAs in a genome is difficult to predict with high certainty by computational methods because of their evolutionary diversity
Detecting expression level changes that correlate with changes in environmental factors, with disease onset and progression, and with complex disease set or severity
Enhancing the annotation of sequenced genomes (making the impact of mutations more interpretable)
Extreme example: multiplexing the amplification of 10,000 human exons using primers from a programmable microarray and sequencing them using NGS.
Characterizing the biodiversity found on Earth

The growing number of sequenced genomes enables us to interpret partial sequences obtained by direct sampling of specific environmental niches.
Examples: ocean, acid mine site, soil, coral reefs, and the human microbiome, which may vary according to the health status of the individual
Common variants have not yet completely explained the genetics of complex diseases; rare alleles also contribute
So do structural variants: large and small insertions and deletions
Accelerating biomedical research
Metagenome of Wanagama Soil
Enables the study of genome-wide patterns of methylation and of how these patterns change over the course of an organism's development.
Enhanced potential to combine the results of different experiments, for example correlative analyses of genome-wide methylation, histone binding patterns and gene expression.
Epigenetics: beyond the sequence. "The major problem, I think, is chromatin. What
determines whether a given piece of DNA along the chromosome is functioning,
since it's covered with the histones? What is happening at the level of methylation
and epigenetics? You can inherit something beyond the DNA sequence. That's
where the real excitement of genetics is now." (James D. Watson). Chromatin is
defined as the dynamic complex of DNA and histone proteins that makes up
chromosomes.
Epigenetics is defined as the chemical modification of DNA that affects gene expression but does not involve changes to the underlying DNA sequence. As the emphasis in biology is switching away from genetic sequence and towards the mechanisms by which gene activity is controlled, epigenetics is becoming increasingly popular.
Epigenetic processes are essential for packaging and interpreting the genome, are
fundamental to normal development and are increasingly recognized as being
involved in human disease. Epigenetic mechanisms include, among other things,
histone modification, positioning of histone variants, nucleosome remodelling, DNA
methylation, small and non-coding RNAs. (Nature, 7 Aug 2008).
Reduced sequencing error
Increased read length
Development of new bioinformatic tools:
Alignment: MAQ, SOAP
Assembly: SSAKE
Base calling: PyroBayes
Variant detection: MAQ, GEM
Cost reduction: $1,000 for personal genomics
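Short-read assemblers such as SSAKE build contigs by repeatedly extending sequences through overlapping reads. The toy greedy overlap assembler below only illustrates that idea: it is not SSAKE's algorithm, it assumes error-free reads from one strand, and it uses a hypothetical minimum overlap of 3 bases. Real assemblers must also handle sequencing errors, reverse complements and repeats.

```python
def overlap(a: str, b: str, min_length: int) -> int:
    """Length of the longest suffix of a equal to a prefix of b, or 0 if < min_length."""
    for length in range(min(len(a), len(b)), min_length - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0


def greedy_assemble(reads: list[str], min_overlap: int = 3) -> list[str]:
    """Toy greedy assembler: repeatedly merge the pair with the longest overlap."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    # Drop reads fully contained in another read; they add no new sequence.
    reads = [r for r in reads if not any(r != other and r in other for other in reads)]
    while len(reads) > 1:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    length = overlap(a, b, min_overlap)
                    if length > best_len:
                        best_len, best_i, best_j = length, i, j
        if best_len == 0:
            break  # no overlaps left: remaining sequences are separate contigs
        merged = reads[best_i] + reads[best_j][best_len:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)
    return reads


print(greedy_assemble(["ATGGCGT", "GCGTACG", "TACGGAT"]))  # ['ATGGCGTACGGAT']
```

The all-pairs search is quadratic in the number of reads, which is fine for a toy example but is exactly why real short-read assemblers index reads (for example by prefix or k-mer) instead.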
The growth pattern of Streptomyces sp. GMY01 & GMR22 on nutrient broth medium at 30 °C
Comparison of the genome of the marine sediment-derived strain (GMY01) with that of the strain of terrestrial origin (GMR22) will provide insight into the environmental adaptation and evolution of Streptomyces species.
Applications of microbial genome sequencing
http://bgiamericas.com
Genome Assembly & Annotation (Streptomyces sp. GMY01 & GMR22)
The reads were produced from a single lane of an Illumina GAIIx sequencing machine.
The analysis consisted of:
Short-read quality filtering and trimming.
De novo assembly using Illumina short reads.
Feature annotation of the assembled contigs, including CDS, tRNA, rRNA, ribosome binding sites, signal peptide cleavage sites, transmembrane helices and repeat regions (inverted and tandem).
Functional annotation of the assembled contigs using SEED subsystem annotation (http://www.theseed.org/).
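The slides do not say which tool or thresholds were used for quality filtering and trimming. A minimal sketch of one common approach, trimming low-quality bases from the 3' end and discarding reads that become too short, looks like this. The Q20 threshold and the minimum length of 25 bases are hypothetical, and the quality offset must match the FASTQ encoding (33 for Illumina pipeline 1.8 and later, 64 for versions 1.3 to 1.7).

```python
def trim_3prime(seq: str, qual: str, threshold: int = 20,
                offset: int = 33, min_length: int = 25):
    """Trim low-quality bases from the 3' end of a read.

    Bases are removed from the end while their Phred score is below `threshold`.
    Returns the trimmed (seq, qual), or None if fewer than `min_length` bases remain.
    """
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings differ in length")
    scores = [ord(c) - offset for c in qual]  # decode ASCII-encoded Phred scores
    end = len(scores)
    while end > 0 and scores[end - 1] < threshold:
        end -= 1
    if end < min_length:
        return None  # read filtered out
    return seq[:end], qual[:end]


# 'I' encodes Q40 and '#' encodes Q2 with offset 33.
print(trim_3prime("ACGTACGTAC", "IIIIIII###", min_length=5))  # ('ACGTACG', 'IIIIIII')
print(trim_3prime("ACGTACGTAC", "IIIIIII###"))  # None: shorter than 25 bases
```

Trimming only from the 3' end reflects the usual pattern on Illumina reads, where quality tends to fall towards the end of the read.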
Feature Annotation
CDS, using RAST gene prediction
tRNA, using tRNAscan-SE
rRNA, using RNAmmer
RBS, using RBSfinder
Signal peptide, using SignalP
Transmembrane helix, using TMHMM
Inverted repeat region, using IRF
Tandem repeat region, using TRF
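RAST uses its own gene-calling pipeline, which is not reproduced here. To illustrate what a CDS call starts from, the sketch below performs a naive open reading frame (ORF) scan: in each of the six reading frames it reports stretches from an ATG start codon to the next in-frame stop codon. The minimum length of 100 codons is a hypothetical cut-off, and minus-strand coordinates refer to positions in the reverse complement.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq: str) -> str:
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_orfs(seq: str, min_codons: int = 100) -> list[tuple[str, int, int]]:
    """Naive six-frame ORF scan: ATG ... in-frame stop, at least min_codons long.

    Returns (strand, start, end) with end exclusive (stop codon included);
    '-' strand coordinates are positions in the reverse complement.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for pos in range(frame, len(s) - 2, 3):
                codon = s[pos:pos + 3]
                if start is None and codon == "ATG":
                    start = pos  # first ATG opens a candidate ORF
                elif start is not None and codon in STOP_CODONS:
                    if (pos - start) // 3 >= min_codons:
                        orfs.append((strand, start, pos + 3))
                    start = None  # look for the next ATG in this frame
    return orfs


print(find_orfs("CCATGAAATTTGGGTAACC", min_codons=3))  # [('+', 2, 17)]
```

Real gene finders add statistical models of coding sequence, which is why a naive scan like this over-predicts short ORFs in GC-rich genomes such as Streptomyces.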
The numbers of features found in the assembled genomes using the respective tools are:
Feature | GMY01 (count) | GMR22 (count)
Genome size | 7,907,487 bp | 11,637,374 bp ?
CDS | 6,420 | 12,222 ?
tRNA | 65 | 186 ?
rRNA | 12 |
RBS | 5,107 |
Transmembrane helix | 3,312 |
Signal peptide | 42 | 801
Inverted repeats | 204 | 318
Tandem repeats | 3,777 | 3,320
[Screenshot: RAST SEED Viewer "Closest neighbors" pages (user: Jaka Widada), each listing 30 genomes with Genome ID, Score and Genome Name.
GMR22, Closest neighbors of Streptomyces sp. (6666666.37971): http://rast.nmpdr.org/seedviewer.cgi?page=ClosestNeighbors&organism=6666666.37971
GMY01, Closest neighbors of Streptomyces sp. e14 (645465.6): http://rast.nmpdr.org/seedviewer.cgi?page=ClosestNeighbors&organism=645465.6
Both lists are headed by Streptomyces coelicolor A3(2) (genome IDs 100226.1 and 100226.15; scores 539 and 528). Other neighbors include S. hygroscopicus ATCC 53653, S. violaceusniger Tu 4113, S. lividans TK24, S. avermitilis MA-4680, S. viridochromogenes DSM 40736, Streptomyces sp. e14, S. griseus subsp. griseus NBRC 13350, S. scabiei 87.22, S. clavuligerus ATCC 27064 and S. roseosporus NRRL 11379, as well as the non-Streptomyces genomes Pseudomonas stutzeri A1501 and Pseudomonas aeruginosa PAO1 and UCBPP-PA14.]
[Figure: Functional abundance analysis, GMY01 vs GMR22]
Genome Plasticity and Evolution

???
Genome re-sequencing at NITE
[Figure: Lowest Common Ancestor (LCA) taxonomic classification of GMR22 and GMY01 (data sets 452319 and 452311). Taxa shown: Chlorobi, Basidiomycota, Lentisphaerae, unclassified (derived from other sequences), Chrysiogenetes, Nitrospirae, Actinobacteria, Proteobacteria, Bacteroidetes, Streptophyta, Arthropoda, Cyanobacteria, Chordata, Firmicutes, unclassified (derived from Bacteria), Verrucomicrobia, Acidobacteria, Ascomycota, Spirochaetes, Planctomycetes, Chloroflexi]
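In lowest common ancestor (LCA) classification, a sequence whose best hits fall in several taxa is assigned to the deepest taxon they all share, so ambiguous hits are reported at a higher rank rather than forced to a species. A minimal sketch, assuming each hit is already given as a lineage ordered from root to leaf; the example lineages are illustrative, not data from GMY01 or GMR22.

```python
from typing import Optional


def lowest_common_ancestor(lineages: list[list[str]]) -> Optional[str]:
    """Deepest taxon shared by every lineage (each ordered root -> leaf)."""
    if not lineages:
        return None
    lca = None
    # zip stops at the shortest lineage, so taxa are compared level by level.
    for taxa_at_level in zip(*lineages):
        if all(t == taxa_at_level[0] for t in taxa_at_level):
            lca = taxa_at_level[0]
        else:
            break
    return lca


# Illustrative lineages for the hits of one sequence (not data from this study).
hits = [
    ["Bacteria", "Actinobacteria", "Streptomycetaceae", "Streptomyces"],
    ["Bacteria", "Actinobacteria", "Streptomycetaceae", "Kitasatospora"],
]
print(lowest_common_ancestor(hits))  # Streptomycetaceae
hits.append(["Bacteria", "Proteobacteria", "Pseudomonadaceae", "Pseudomonas"])
print(lowest_common_ancestor(hits))  # Bacteria
```

Adding one distant hit pushes the assignment up to "Bacteria", which is why LCA-based profiles are conservative but less likely to over-assign reads.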
Outline of the pipeline for genomic analysis of secondary metabolites.
Medema M.H. et al. Nucl. Acids Res. 2011; nar.gkr466. © The Author(s) 2011. Published by Oxford University Press.
antiSMASH: antibiotics & Secondary Metabolite Analysis SHell
Genomic analysis of secondary metabolites of Streptomyces in this study
GMY01: http://antismash.secondarymetabolites.org/upload/cdfb620be865-4aabb5154bf7e55abde6/index.html
Secondary metabolite gene cluster types detected by antiSMASH ("-" = count not given)
Cluster type | GMY01 | GMR22
Terpene | - | 16
NRPS | - | 46
Ectoine | - | -
Butyrolactone | - | -
T1pks | - | 67
T1pks-t4pks | - | -
Bacteriocin | - | 11
Nrps-butyrolactone | - | -
T2pks | - | -
Nrps-t1pks | - | -
Lantipeptide | - | -
Siderophore | - | -
Other | - | 14
T3pks | - | -
T4pks | - | 15
T4pks-t1pks | - | -
Indole | - | -
Hserlactone | - | -
T2pks-bacteriocin | - | -
Bacteriocin-lantipeptide | - | -
T3pks-nrps | - | -
Oligosaccharide-t2pks | - | -
Total | 28 | 201
Genomic comparison of secondary metabolite gene clusters between Streptomyces GMY01 & E.14
[Figure: cluster comparison; AA homology 97-100%; values shown: 16, 21]
GMY01: marine sediment-derived; genome size 7.91 Mb; total genes 6,420
E.14: wasp symbiont; genome size 7.93 Mb; total genes 6,198
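Homology figures such as 97-100% are commonly computed as percent identity between aligned amino-acid sequences. Definitions differ in how gaps are treated; the sketch below assumes the two sequences are already aligned (gaps written as "-") and counts only columns where neither sequence has a gap. The example sequences are made up.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identical residues over aligned columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != "-" and y != "-"]
    if not columns:
        return 0.0
    identical = sum(x == y for x, y in columns)
    return 100.0 * identical / len(columns)


print(f"{percent_identity('MKTAYIAKQR', 'MKTAHIAKQ-'):.1f}%")  # 88.9%
```

Counting gap columns in the denominator instead would give 80.0% for the same pair, so the gap convention should be stated alongside any reported identity.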
