
Introduction to Bioinformatics Online Course: IBT

Introduction to Databases and Resources


Protein Classification and Resources

Introduction to Databases and Resources | Shaun Aron
Learning Objectives
• Understand how protein sequences are annotated
• Understand the different levels of protein classification
• Identify the key resources used for classifying protein sequences

Learning Outcomes
• Differentiate between protein classification methods
• Use the appropriate tools to annotate a protein sequence of interest
• Access and retrieve information of interest from protein resources

Central Dogma

(Figure: the central dogma; source: www.khanacademy.org)

Protein Resources
• A variety of protein resources are available online
• Several websites/resources are dedicated to providing a single interface to multiple resources
• It is important to differentiate between databases and resources

Protein Databases
• Sequence and information databases
  ✓ NCBI Protein Database: contains protein sequences from GenBank and RefSeq, as well as records from Swiss-Prot, PIR, PRF, and PDB
  ✓ EBI UniProtKB: the "Protein Knowledgebase", a comprehensive set of protein sequences. It provides functional information on proteins with accurate, consistent, and rich annotation: the amino acid sequence, protein name or description, taxonomic data, and citation information. It is divided into two parts: Swiss-Prot (manually reviewed) and TrEMBL (automatically annotated, unreviewed)

Protein Classification Concepts
• Classification methods group proteins based on:
  ✓ Sequence similarity
  ✓ Structural similarity
• Most groups already contain a set of well-characterised proteins whose function is known

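To make "sequence similarity" concrete, here is a minimal sketch that computes percent identity between sequences assumed to be rows of the same alignment (equal length, "-" for gaps) and assigns a query to the family of its most similar well-characterised member. The function names and toy sequences are hypothetical, and raw identity is a simplification: real classification tools rely on proper alignment and statistical scoring.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent identity of two pre-aligned sequences ('-' marks a gap).

    Columns where either sequence has a gap are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = identical = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # ignore gapped columns
        compared += 1
        if a == b:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


def closest_family(query: str, characterised: dict) -> tuple:
    """Return (family, identity) of the most similar characterised protein.

    `characterised` maps a protein name to (family, aligned_sequence);
    every sequence is assumed to be a row of the same alignment as `query`.
    """
    best_family, best_identity = None, -1.0
    for family, aligned in characterised.values():
        identity = percent_identity(query, aligned)
        if identity > best_identity:
            best_family, best_identity = family, identity
    return best_family, best_identity


# Hypothetical toy alignment rows, for illustration only.
known = {
    "protA": ("family1", "MKVTLA"),
    "protB": ("family2", "GRWTPA"),
}
print(closest_family("MKVTLG", known))
```

Picking the single best hit is the simplest possible rule; a threshold on identity (below which the query is left unassigned) would be a natural refinement.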

Protein Classification Concepts
• Proteins can be classified into different groups based on:
  ✓ The families to which they belong
  ✓ The domains they contain
  ✓ The sequence features they possess
• Members of a protein family share a common evolutionary origin, reflected in their related functions and similarities in sequence or structure

Protein Classification
• Superfamily
  ✓ A large group of distantly related proteins

Protein Classification
• Family
  ✓ Group of evolutionarily related proteins that share one or more domains/repeats

Protein Classification
• Subfamily
  ✓ A small group of closely related proteins

Protein Domains
• Domain
  ✓ Discrete structural unit that is assumed to fold independently of the rest of the protein and to have its own function
  ✓ It can be composed of anywhere from about 20 to several hundred amino acid residues
  ✓ Similar domains can be found in proteins with different functions

Protein Sequence Features
• Motifs
  ✓ Short conserved regions, frequently the most conserved regions of a domain. Motifs are critical for the domain to function; in enzymes, for example, they contain the active sites

Protein Sequence Features
• Repeat
  ✓ Stretch of amino acid sequence that is repeated a number of times along the length of the sequence. Many domains are constituted from repeats
  ✓ Repeats may contain binding sites and contribute to the structural properties of the protein

Protein Sequence Features
• Consensus site/post-translational modification (PTM) site
  ✓ A conserved position (or positions) among homologous sequences. The position can theoretically be modified, for example by phosphorylation or glycosylation. An asparagine followed by any amino acid followed by serine or threonine, for example, is a consensus site for N-linked glycosylation

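The N-linked glycosylation consensus described above can be scanned for with a regular expression. This is a minimal sketch: the loose pattern encodes the consensus exactly as stated on the slide, while the strict variant follows what we believe is the PROSITE entry for this site (PS00001, N-{P}-[ST]-{P}, which excludes proline); verify that against PROSITE before relying on it. The function name and the toy sequence are hypothetical, and a match only marks a potential site, not a confirmed modification.

```python
import re

# Consensus as stated on the slide: N, then any residue, then S or T.
# The zero-width lookahead lets overlapping sites (e.g. "NNST") both be found.
LOOSE = re.compile(r"(?=(N.[ST]))")
# Stricter variant in the style of PROSITE PS00001 (N-{P}-[ST]-{P}):
# proline is not allowed in the second or fourth position.
STRICT = re.compile(r"(?=(N[^P][ST][^P]))")


def find_nglyc_sites(sequence: str, strict: bool = False) -> list:
    """Return (1-based position of the N, matched residues) for each site."""
    pattern = STRICT if strict else LOOSE
    return [(m.start() + 1, m.group(1)) for m in pattern.finditer(sequence.upper())]


seq = "MNGSANLTQNPT"  # hypothetical toy sequence
print(find_nglyc_sites(seq))               # [(2, 'NGS'), (6, 'NLT'), (10, 'NPT')]
print(find_nglyc_sites(seq, strict=True))  # [(2, 'NGSA'), (6, 'NLTQ')]
```

The strict form drops the "NPT" hit because proline follows the asparagine.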

Protein Signatures
• Protein signatures are computational models used to classify protein properties:
  ✓ Protein families
  ✓ Domains
  ✓ Conserved sites
  ✓ Protein sequence features
• Built from multiple sequence alignments (MSAs) of proteins:
  ✓ Proteins belonging to the same family or sharing a domain are aligned
  ✓ A predictive model is built from the alignment
  ✓ The model is trained on new data
  ✓ The model is used for protein sequence analysis

Protein Signature Models

Types of Protein Signatures
• Pattern
  ✓ Functional sites such as binding/active sites usually consist of a few conserved amino acids
  ✓ These conserved patterns are identified from MSAs
  ✓ Modeled as a short, contiguous stretch of protein using regular expressions. E.g., D[DE]X is a pattern composed of amino acid D, followed by either D or E, followed by any amino acid

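Because a pattern is essentially a regular expression, the notation used on this slide can be translated mechanically. The sketch below handles only the three constructs shown here (literal residues, X as "any residue", and a bracketed set such as [DE]); PROSITE's own syntax differs (for example, hyphen-separated elements and a lower-case x), so this converter and its function names are hypothetical illustrations, not a PROSITE parser.

```python
import re


def pattern_to_regex(pattern: str) -> "re.Pattern":
    """Translate slide-style pattern notation into a compiled regex.

    Upper-case letters are literal residues, X matches any residue and
    [DE] matches exactly one residue from the bracketed set.
    """
    parts = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "[":
            end = pattern.index("]", i)  # ValueError if the bracket is unclosed
            parts.append("[" + pattern[i + 1:end] + "]")
            i = end + 1
            continue
        if ch == "X":
            parts.append(".")
        elif ch.isalpha():
            parts.append(ch)
        else:
            raise ValueError(f"unsupported symbol {ch!r} in {pattern!r}")
        i += 1
    # Wrap in a lookahead so overlapping matches are all reported.
    return re.compile("(?=(" + "".join(parts) + "))")


def scan(pattern: str, sequence: str) -> list:
    """Return (1-based start, matched residues) for every match."""
    regex = pattern_to_regex(pattern)
    return [(m.start() + 1, m.group(1)) for m in regex.finditer(sequence.upper())]


print(pattern_to_regex("D[DE]X").pattern)  # (?=(D[DE].))
print(scan("D[DE]X", "ADDKDEGDA"))          # [(2, 'DDK'), (5, 'DEG')]
```

The lookahead wrapper is a design choice: without it, `finditer` would skip matches that overlap an earlier one.
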
Types of Protein Signatures
• Pattern: describes a short, contiguous stretch of protein using regular expressions. E.g., D[DE]X is a pattern composed of amino acid D, followed by either D or E, followed by any amino acid

Types of Protein Signatures
• Profile
  ✓ Used to model protein families and domains
  ✓ A profile is built from MSAs and is a matrix or table that describes the probability of finding a particular amino acid at a certain position
  ✓ The matrix is generated based on the frequency at which an amino acid occurs at each position
  ✓ Hidden Markov Models (HMMs) can be used to create a more powerful statistical profile from MSAs

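The profile idea above can be sketched as a position-specific probability matrix built from column frequencies of an ungapped MSA, then slid along a query to find the best-scoring window. This is a simple frequency profile, not an HMM; the pseudocount of 1, the uniform 1/20 background and all function names are assumptions chosen for illustration.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(msa: list, pseudocount: float = 1.0) -> list:
    """Position-specific probability matrix from an ungapped MSA.

    Returns one {residue: probability} dict per alignment column.
    A pseudocount keeps unseen residues from getting probability zero.
    """
    width = len(msa[0])
    for seq in msa:
        if len(seq) != width or any(aa not in AMINO_ACIDS for aa in seq):
            raise ValueError(f"expected ungapped, equal-length rows: {seq!r}")
    total = len(msa) + pseudocount * len(AMINO_ACIDS)
    profile = []
    for col in range(width):
        counts = Counter(seq[col] for seq in msa)  # residue frequencies in this column
        profile.append({aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS})
    return profile


def log_odds(profile: list, segment: str, background: float = 1 / 20) -> float:
    """Sum of log2(P(residue | position) / background) over the segment."""
    return sum(math.log2(col[aa] / background) for col, aa in zip(profile, segment))


def best_hit(profile: list, sequence: str) -> tuple:
    """Slide the profile along `sequence`; return (best score, 1-based start)."""
    width = len(profile)
    seq = sequence.upper()
    return max(
        (log_odds(profile, seq[i:i + width]), i + 1)
        for i in range(len(seq) - width + 1)
    )


msa = ["DDK", "DEK", "DDR"]  # hypothetical toy alignment
profile = build_profile(msa)
print(round(profile[0]["D"], 3))         # 0.174, i.e. (3 + 1) / 23
print(best_hit(profile, "AAADDKAAA")[1])  # 4
```

A positive log-odds score means the window looks more like the family than like random sequence under the assumed background; `best_hit` raises `ValueError` if the query is shorter than the profile.
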
Types of Protein Signatures
• Fingerprints
  ✓ Used to identify several conserved motifs
  ✓ Multiple short conserved motifs are drawn from sequence alignments
  ✓ Each motif is converted into an individual profile to create a fingerprint signature
  ✓ Useful for identifying small differences between closely related proteins

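A fingerprint combines several motif profiles. The sketch below converts each motif (given as short aligned, ungapped segments) into its own probability profile, finds the best window for each motif in turn, and counts how many motifs match in order. The score threshold of 3.0 bits, the ordering rule and all names are hypothetical simplifications; real PRINTS scanning uses its own scoring, and the query is assumed to contain only the 20 standard residues.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def motif_profile(segments: list, pseudocount: float = 1.0) -> list:
    """Turn one aligned, ungapped motif into a position-specific probability matrix."""
    total = len(segments) + pseudocount * len(AMINO_ACIDS)
    profile = []
    for i in range(len(segments[0])):
        counts = Counter(s[i] for s in segments)
        profile.append({aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS})
    return profile


def best_window(profile: list, sequence: str, background: float = 1 / 20) -> tuple:
    """Best log-odds score of the motif in `sequence` and its 0-based start (-1 if none)."""
    width = len(profile)
    best = (-math.inf, -1)
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(math.log2(col[aa] / background) for col, aa in zip(profile, window))
        best = max(best, (score, start))
    return best


def fingerprint_hits(motifs: list, sequence: str, threshold: float = 3.0) -> int:
    """Count motifs matched in order: each must score >= threshold after the previous hit."""
    seq = sequence.upper()
    matched, offset = 0, 0
    for segments in motifs:
        profile = motif_profile(segments)
        score, start = best_window(profile, seq[offset:])
        if start >= 0 and score >= threshold:
            matched += 1
            offset += start + len(profile)  # next motif must lie downstream
    return matched


# Hypothetical two-motif fingerprint, for illustration only.
fingerprint = [["GKS", "GKT", "GRS"], ["DEAD", "DEAH", "DEAD"]]
print(fingerprint_hits(fingerprint, "MAGKSTLLDEADQ"))  # 2: both motifs, in order
print(fingerprint_hits(fingerprint, "MADEADQQGKSL"))   # 1: motif order is wrong
```

Requiring several motifs in order is what lets a fingerprint separate close relatives: a protein may match one motif but fail the others.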

PROTEIN RESOURCES

Pfam
• Collection of protein families and domains
• Represented by
  ✓ Multiple sequence alignments
  ✓ Hidden Markov Models (HMMs)

Pfam
• Two components to Pfam:
  – Pfam-A entries: high-quality, manually curated families
  – Pfam-B entries: automatically generated
• Generation of higher-level groupings of related families, known as clans (collections of Pfam-A entries that are related by similarity of sequence, structure or profile-HMM)
• http://pfam.xfam.org

SMART
• Simple Modular Architecture Research Tool
  ✓ Identification and annotation of protein domains
  ✓ Analysis of protein domain architectures
  ✓ Manually curated models for the prediction of protein domains
  ✓ http://smart.embl-heidelberg.de

PRINTS
• Collection of protein family "fingerprints" (groups of conserved motifs used to characterise a protein family)
• Prediction of functional families in uncharacterised protein sequences
• http://www.bioinf.manchester.ac.uk/dbbrowser/PRINTS/index.php

ExPASy (https://www.expasy.org/)
• ExPASy (Swiss Institute of Bioinformatics)
  ✓ UniProt, PROSITE, homology modelling, docking, and many other tools covering protein sequences and identification, mass spectrometry and 2-DE data, protein characterisation and function, families, patterns and profiles, post-translational modification, protein structure, protein-protein interaction, similarity search/alignment, drug design, and molecular modelling

Protein Information Resource
• PIR
  ✓ Protein ontology
  ✓ ProClass: Reports for UniProtKB
  ✓ ProLink: Literature, Text Mining
  ✓ http://pir.georgetown.edu/

InterPro
• Designed to integrate signature databases
  ✓ Protein families, domains and functional sites
  ✓ http://www.ebi.ac.uk/interpro/

InterPro
• Signatures describing the same protein family, domain or functional site are grouped into a single InterPro identifier
• InterProScan tool
  ✓ Integrates signature recognition methods into a single application
  ✓ Finds signatures that match a protein sequence of interest
  ✓ A web-based version of InterProScan is available
  ✓ http://www.ebi.ac.uk/interpro/

UniProt – Example: Pax-6 protein

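The Pax-6 example is explored through the UniProt web interface. Entries can also be retrieved programmatically; the sketch below assumes UniProt's REST endpoint of the form https://rest.uniprot.org/uniprotkb/{accession}.fasta and uses only the Python standard library. The helper names are hypothetical, and the download function needs network access, so the demonstration at the bottom only parses a local toy FASTA string.

```python
import urllib.request

# Assumed UniProt REST endpoint for a single entry in FASTA format.
UNIPROT_FASTA_URL = "https://rest.uniprot.org/uniprotkb/{accession}.fasta"


def fetch_uniprot_fasta(accession: str, timeout: float = 30.0) -> str:
    """Download one UniProtKB entry as FASTA text (requires network access)."""
    url = UNIPROT_FASTA_URL.format(accession=accession)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def parse_fasta(text: str) -> list:
    """Split FASTA text into (header without '>', sequence) pairs."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence lines may be wrapped
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Offline demonstration with a hypothetical toy record.
demo = ">toy|hypothetical entry\nMKVLAT\nGGSPE\n"
print(parse_fasta(demo))  # [('toy|hypothetical entry', 'MKVLATGGSPE')]
```

With network access, `parse_fasta(fetch_uniprot_fasta("P26367"))` should return the header and sequence of the entry; P26367 is assumed here to be the human PAX6 accession, so confirm it on the UniProt site first.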