Biological Information


Bioinformatics tools for biologists @ the EBI

An overview
Bioinformatics

• The science of storing, retrieving and analyzing large
amounts of biological information
• An interdisciplinary science, involving biologists,
computer scientists and mathematicians
• At the heart of modern biology

2 EBI Overview
“Large-scale” focus

• Data explosion and new types of data
• High-throughput biology
• Emphasis on systems, not reductionism
• Large community of users with no training in
bioinformatics
• Growth of applied biology – molecular medicine,
agriculture, food, environmental sciences…

What is EMBL-EBI?

• Based on the Wellcome
Trust Genome Campus
near Cambridge, UK
• Part of the European
Molecular Biology
Laboratory

• Non-profit organization

The EBI’s mission

• To provide freely available data and bioinformatics
services to all facets of the scientific community in ways
that promote scientific progress
• To contribute to the advancement of biology through
basic investigator-driven research in bioinformatics
• To provide advanced bioinformatics training to scientists
at all levels, from PhD students to independent
investigators
• To help disseminate cutting-edge technologies to
industry

Databases and tools
www.ebi.ac.uk
New types of data
• Literature and ontologies
• Genomes
• Protein sequence
• DNA & RNA sequence
• Protein structure
• Gene expression
• Chemical entities
• Protein families, motifs and domains
• Protein interactions
• Pathways
• Systems

Databases: molecules to systems
• Literature and ontologies: CiteXplore, GO
• Genomes: Ensembl, Ensembl Genomes, EGA
• Nucleotide sequence: EMBL-Bank
• Microarray & gene expression data: ArrayExpress
• Proteomes: UniProt, PRIDE
• Protein families, motifs and domains: InterPro
• Protein structure: PDBe
• Protein interactions: IntAct
• Chemical entities: ChEBI
• Pathways: Reactome
• Systems: BioModels
Database collaborations

Standards development – international collaborations
• Genomics Standards Consortium (GSC): http://gensc.org
• Genome annotation: www.geneontology.org
• Protein sequence: www.uniprot.org
• Nucleotide sequence: www.insdc.org
• Protein structure: www.wwpdb.org
• Microarray and Gene Expression Data (MGED): www.mged.org
• HUPO Proteomics Standards Initiative (PSI): www.psidev.info
• Cheminformatics: www.ebi.ac.uk/chebi
• Pathways: www.reactome.org, www.biopax.org
• Systems modeling standards: www.sbml.org
• Metabolomics Standards Initiative (MSI): www.metabolomicssociety.org

EBI website: www.ebi.ac.uk

Databases Tools

EBI search engine: EB-eye
Search all main databases in one go

Nucleotides: European Nucleotide Archive (ENA)

• ENA provides a comprehensive, accessible
and publicly available repository for
nucleotide sequence data
• Collaboration with GenBank and DDBJ for
data sharing
• It consolidates information from EMBL-Bank,
the European Trace Archive
(containing raw data from electrophoresis-based
sequencing machines) and the
Sequence Read Archive (containing raw
data from next-generation sequencing
platforms)
• Provides access to the whole scale of
sequencing information: from raw data,
through assembly and mapping
information, through to high-level functional
annotation (see figure).

Nucleotides: ENA
Download data
Navigate to view related data, e.g. taxon-specific data
Other types of data include SRA experiments
Genomes: Ensembl & Ensembl Genomes
• Genome browser providing free access to the complete sequences of higher
and model organisms
• With Ensembl you can:
 Retrieve all or part of a genome sequence
 Perform sequence alignment using BLAST or BLAT
 Link to genome annotation from microarray results
 View expressed mRNA, protein, etc. in a chromosomal region
 View variations such as SNPs across strains or populations
 View all alternative splicing for a gene
 Explore homologues and phylogenetic trees across more than 30 species
 View conserved regions across species
• Ensembl Genomes extends to non-vertebrate genomes

Genomes: Ensembl
Pick a genome
Chromosomes
Genes
Genomic alignments
Synteny
Gene families
SNPs
Across species
Orthology
Within species

Genomes: Ensembl Genomes
Ensembl-like genome browser for non-vertebrate species
Ensembl Metazoa
Ensembl Bacteria
Across species
View options
Using view options, you can select to view only the current gene or the entire expanded gene tree
Select Orthologue view to see putative orthologues

Retrieving data with BioMart
• BioMart is a search engine that can be used to download data
in a table format
• Many EBI databases are powered by BioMart
• For example, you can use Ensembl BioMart to retrieve:
 All the genes for one species
 Or… only genes in one specific region of a chromosome
 Or… genes in one region of a chromosome associated with an
InterPro domain
 Or…etc.

BioMart – how it works

First step:
Choose a dataset

Second step:
Add filters to define a
gene set

Third step:
Add attributes to
determine column
output
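
The three steps above (dataset, filters, attributes) map closely onto the XML query document that BioMart services accept. The following is a minimal sketch, assuming an Ensembl-style mart: the dataset name hsapiens_gene_ensembl, the filter names (chromosome_name, start, end) and the attribute names (ensembl_gene_id, external_gene_name) are illustrative assumptions and should be checked against the mart you actually query. The helper build_biomart_query is hypothetical, and the sketch only builds the query text; it does not contact any server.

```python
import xml.etree.ElementTree as ET


def build_biomart_query(dataset, filters, attributes):
    """Build a BioMart-style XML query string (hypothetical helper).

    dataset    -- step 1: the dataset to query
    filters    -- step 2: dict of filter name -> value, defining the gene set
    attributes -- step 3: list of attribute names, one per output column
    """
    query = ET.Element("Query", {
        "virtualSchemaName": "default",
        "formatter": "TSV",        # tab-separated table output
        "header": "1",             # include a header row
        "uniqueRows": "1",         # drop duplicate rows
        "datasetConfigVersion": "0.6",
    })
    ds = ET.SubElement(query, "Dataset",
                       {"name": dataset, "interface": "default"})
    for name, value in filters.items():
        ET.SubElement(ds, "Filter", {"name": name, "value": str(value)})
    for name in attributes:
        ET.SubElement(ds, "Attribute", {"name": name})
    return ET.tostring(query, encoding="unicode")


# Assumed example: genes in a 1 Mb region of human chromosome 1.
example_query = build_biomart_query(
    dataset="hsapiens_gene_ensembl",
    filters={"chromosome_name": "1", "start": 1000000, "end": 2000000},
    attributes=["ensembl_gene_id", "external_gene_name"],
)
print(example_query)
```

Keeping filters and attributes as separate arguments mirrors the web interface: filters narrow the rows returned, while attributes choose the columns.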

BioMart results

www.biomart.org

ArrayExpress & Atlas of Gene Expression

• ArrayExpress Archive is a public repository of functional
genomics experiments, including gene expression,
supporting scientific publications
• You can query it to retrieve experimental information and
download functional genomics data
• Atlas of Gene Expression contains a subset of curated
and re-annotated Archive data
• Can be queried for individual gene expression under
different biological conditions across experiments

Transcriptomes: ArrayExpress
ArrayExpress Archive: browse experiments
Search by keyword
Expand results
Spreadsheets describing the experiment, sample properties or array design

Transcriptomes: Atlas of Gene Expression
Atlas interface
Search by gene name or biological condition
Gene summary page
Experiment page
Protein sequence: UniProt
• Provides the scientific community with a
comprehensive, richly curated, high-quality
and freely accessible resource of
protein sequence and functional
information
• Users can perform simple and complex
text-based queries, run sequence-based
searches, perform multiple sequence
alignments, etc.
• Consists of:
 UniProtKB/Swiss-Prot, manually annotated
 UniProtKB/TrEMBL, computationally analyzed
records
 UniRef, clustered by sequence identity
 UniParc, most comprehensive publicly available
non-redundant protein sequence database, unannotated
 UniMES, protein sequences from metagenomic and
environmental data

UniProt text search for Brca1

Protein families, motifs & domains: InterPro
• Integrated documentation resource
for protein families, domains and
functional sites
• Protein signatures from different
member databases describing the
same biological protein family or
domain are united into a single
InterPro entry containing
information about the signature(s)
and links to the protein in UniProt
• Links to Gene Ontology indicate the
biological function and process that
the proteins are involved in

Protein families, motifs and domains: InterPro
Compare methods of protein signature prediction
Visualize the taxonomic range for a protein signature
View architectures of proteins containing a signature

Molecular interaction database: IntAct
• IntAct provides a freely available,
open source database system and
analysis tools for protein interaction
data.
• All interactions are derived from
literature curation or direct user
submissions
• With IntAct you can:
 Find molecules that interact with your
protein of interest
 Display interaction networks
 Analyze interaction networks using GO
terms, molecule type, role, etc.
 Download data
 Install IntAct system locally
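
As a rough illustration of the first two tasks in the list above (finding interactors and assembling an interaction network), the sketch below builds an undirected network from a list of binary interactions. It does not use IntAct's own software or file formats. The functions build_network and interactors are hypothetical helpers, and the protein names are placeholders rather than real database entries.

```python
from collections import defaultdict


def build_network(pairs):
    """Build an undirected interaction network from (A, B) pairs."""
    network = defaultdict(set)
    for a, b in pairs:
        network[a].add(b)
        network[b].add(a)
    return network


def interactors(network, protein):
    """Return the sorted direct interaction partners of one protein."""
    return sorted(network.get(protein, set()))


# Placeholder interactions, for illustration only (not real data).
pairs = [("protA", "protB"), ("protA", "protC"), ("protB", "protD")]
network = build_network(pairs)
print(interactors(network, "protA"))  # ['protB', 'protC']
```

Storing partners in sets means a pair reported by several publications is counted once, which is one simple way to handle redundant evidence in a sketch like this.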

The Protein Data Bank in Europe (PDBe)
• PDBe is a resource for the collection, organization and dissemination of data
about biological macromolecular structures
• A suite of web-based services is available:
 PDBeView and PDBeLite provide a flexible and user-friendly query interface to the PDBe
database
 PDBeAnalysis provides searches and statistical analyses of macromolecular structure and
residue information
 PDBeFold performs pairwise or multiple comparisons as well as 3D alignments of
structures
 PDBeChem lets you search for and visualize any molecule in the PDB’s ligand dictionary
 PDBePisa is an interactive tool for exploring macromolecular interfaces and surfaces, predicting
probable quaternary structures (assemblies) and searching the PDB for structurally similar
interfaces and assemblies
 PDBeMotif allows complex searches of the PDB based on small 3D motifs, sequence motifs in
conjunction with ligand environment, or secondary structure patterns
 Many more tools are available

Structures: PDBe
Sequence mapping
Linking to domain data
Ligands
Assemblies
Electron density visualization
Active sites
Surface matching
Fold matching

PRoteomics IDEntifications database (PRIDE)

• PRIDE is a centralized, standards-compliant,
public data repository for
proteomics data
• Provides the proteomics community
with a public repository for protein
and peptide identifications together
with the evidence supporting these
identifications.
• PRIDE is also able to capture
details of post-translational
modifications coordinated relative to
the peptides in which they have
been found.

Enzymes: IntEnz
• IntEnz (Integrated
relational Enzyme
database) is a freely
available resource
focused on enzyme
nomenclature.
• IntEnz contains the
recommendations of the
Nomenclature Committee
of the IUBMB on the
nomenclature and
classification of
enzyme-catalysed reactions.

Chemical entities: ChEBI
• ChEBI is a freely available, manually annotated database of small molecular
entities
• A molecular entity is any constitutionally or isotopically distinct atom, molecule,
ion, ion pair, radical, radical ion, complex, conformer, etc., identifiable as a
separately distinguishable entity, not directly encoded by the genome
• With ChEBI you can:
 Find the correct chemical terminology using name, formula or registry number
 Visualize chemical structures
 Perform similarity searches
 View the relationship between molecules using the ChEBI ontology
 Bridge the gap between small molecules and the macromolecules they interact with (crosslink to
UniProt and Reactome)
 Download chemical structures
 Submit new structures

Chemical entities: ChEBI
View mappings to other databases such as Reactome and UniProt
Download flat files, database dumps and the ChEBI Ontology for local installation
View relationships in the ChEBI Ontology
View structure, nomenclature, formula and more
Link to other databases

Chemogenomics: ChEMBL
• ChEMBL is a publicly available database of drugs, drug-like small molecules
and their targets
• The data includes information about how small molecules bind to their
targets, how these compounds affect cells and whole organisms, and
information on the molecules’ absorption, distribution, metabolism, excretion
and toxicity.
• ChEMBL holds two-dimensional structures, calculated molecular properties
(e.g. logP, molecular weight, Lipinski ‘Rule of Five’ parameters) and
bioactivity data (such as binding constants and pharmacology).
• The bioactivity data is tagged to show links between molecular targets and
published assays, with a set of varying confidence levels.
• Additional data on the clinical progress of compounds is being integrated into
ChEMBL.
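
The Lipinski ‘Rule of Five’ parameters mentioned above can be checked with a few comparisons. The sketch below uses the commonly quoted thresholds (molecular weight at most 500 Da, logP at most 5, at most 5 hydrogen-bond donors, at most 10 hydrogen-bond acceptors) and the usual convention that one violation is tolerated. It is not ChEMBL's implementation; the Compound class, the function names and the two example compounds are hypothetical, with made-up property values.

```python
from dataclasses import dataclass


@dataclass
class Compound:
    name: str
    molecular_weight: float  # in daltons
    logp: float              # octanol-water partition coefficient
    h_bond_donors: int
    h_bond_acceptors: int


def rule_of_five_violations(c):
    """Count how many of the four Lipinski thresholds are exceeded."""
    checks = [
        c.molecular_weight > 500,
        c.logp > 5,
        c.h_bond_donors > 5,
        c.h_bond_acceptors > 10,
    ]
    return sum(checks)


def passes_rule_of_five(c, max_violations=1):
    """True if the compound exceeds at most max_violations thresholds."""
    return rule_of_five_violations(c) <= max_violations


# Hypothetical compounds with invented property values.
small = Compound("hypothetical-small", 350.4, 2.1, 2, 5)
large = Compound("hypothetical-large", 812.0, 6.3, 7, 12)
for compound in (small, large):
    print(compound.name, rule_of_five_violations(compound),
          passes_rule_of_five(compound))
```

Returning the violation count, not just a yes/no answer, leaves the caller free to apply a stricter or looser cut-off.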

Chemogenomics: ChEMBL

ChEMBL

Pathways: Reactome
• A free, online, open-source curated database of pathways and
reactions in human biology
• Information in the database is authored by expert biologist
researchers and maintained by Reactome editorial staff
• Used to infer orthologous events in 22 non-human species including
mouse, rat, chicken, puffer fish, worm, fly, yeast
• Extensively cross-referenced to other resources e.g. NCBI, Ensembl,
UCSC Genome Browser, UniProt, PubMed, KEGG, ChEBI and GO.

Pathways: Reactome

Select a pathway
View reactions and events in detail
Compare events in different species
Export pathway

Pathways: Reactome

Display expression data
Link to source databases
Biological ontologies: Gene Ontology (GO)

• The GO project is a collaborative
effort to address the need for
consistent descriptions of gene
products in different databases
• GO develops ontologies that
describe biological processes,
cellular components and molecular
functions in a species-independent
manner
• Several of the EBI’s databases
are also annotated with GO terms

User support

• 2Can bioinformatics user support – www.ebi.ac.uk/2Can

• Online help pages – www.ebi.ac.uk/help

• E-mail support – www.ebi.ac.uk/support

http://www.ebi.ac.uk/Information/Brochures/

Research
www.ebi.ac.uk/groups
Key facts about research
• The EBI provides a unique environment for bioinformatics
research
• Seven dedicated research groups aim to understand
biology through new approaches to interpreting biological
data
• Services teams also carry out R&D to enhance existing
services and develop new ones
• Research program complements services and the two are
mutually supportive

Research
• Functional genomics and small RNA analysis: Enright
• Vertebrate genome annotation: Flicek
• Literature analysis and semantic data integration in life science research: Rebholz-Schuhmann
• Algorithmic methods for genome analysis: Birney
• Transcriptome analysis on a genomic scale: Brazma
• Genome analysis using evolutionary tools: Goldman
• Analysis of protein structure, function and evolution: Thornton
• Evolutionary biology: Marioni
• Protein sequence analysis and functional annotation: Apweiler
• Analysis and validation of protein structures; protein–ligand interactions: Kleywegt
• Neurobiology networks and systems: Le Novère
• Systems Biomedicine: Saez-Rodriguez
• Cheminformatics and metabolism: Steinbeck
• Genome-scale analysis of regulatory systems: Luscombe
• Chemogenomics and drug discovery: Overington
• Mammalian stem cell differentiation and development: Bertone
Training
www.ebi.ac.uk/training
A tripartite user-training programme
• Bioinformatics Roadshow: training comes to you
www.ebi.ac.uk/training/roadshow
• eLearning programme: training any time, anywhere, at any pace
www.ebi.ac.uk/training/elearning
• Hands-on training at EMBL-EBI: hands-on user training on all our
core data resources for researchers
www.ebi.ac.uk/training/handson
Hands-on training for all levels of experience

• Interactive training in our purpose-built IT training suite at
EMBL-EBI, Hinxton, Cambridge
• Learn from the EBI’s experts through a combination of talks and
practical exercises
• Take a tour of all our core data resources, or focus in on specific
data types
• Full programme at www.ebi.ac.uk/training/handson

eLearning project – pilot phase

• Do you want to learn at your
own pace at a time that suits
you?
• We are developing a new
eLearning platform and need our
users to help us test it
• If you would like to get involved,
contact: elearning@ebi.ac.uk
