
Topic Name – Protein Sequence Databases

PROTEIN/PROTEOMICS DATABASES

Protein Information Resource (PIR)

UniProt - Protein Knowledge Database

Pfam - Protein Family and Domain

PROSITE - Protein Family and Domain


UNIPROT DATABASE

• The Swiss-Prot, TrEMBL, and PIR protein database activities have united to
form the Universal Protein Resource (UniProt):
– UniProt Knowledgebase (UniProtKB): curated sequence information with
annotations, linked to other databases.
– UniProt Reference Clusters (UniRef): removes sequence redundancy by merging
sequences that are 100%, 90%, or 50% identical; no annotations; linked to
Knowledgebase and UniParc records.
– UniProt Archive (UniParc): history of sequences; no annotation; linked to
source records.
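
The UniParc idea of keeping each sequence once while linking back to its source records can be illustrated with a minimal sketch. This is not UniParc's actual implementation: the SHA-256 key, the function name `build_archive`, and the accessions `ACC_A`, `ACC_B`, `ACC_C` are all assumptions made up for illustration.

```python
import hashlib
from collections import defaultdict


def build_archive(records):
    """Store each distinct sequence once and keep links to every source record.

    records: iterable of (source_db, accession, sequence) tuples.
    Returns a dict mapping a sequence digest to a list of (source_db, accession).
    """
    archive = defaultdict(list)
    for source_db, accession, sequence in records:
        # Assumption: a SHA-256 digest of the upper-cased sequence is the key.
        key = hashlib.sha256(sequence.upper().encode("ascii")).hexdigest()
        archive[key].append((source_db, accession))
    return dict(archive)


# Hypothetical records: two databases report the same sequence.
records = [
    ("Swiss-Prot", "ACC_A", "MKTAYIAKQR"),
    ("GenBank", "ACC_B", "MKTAYIAKQR"),
    ("PDB", "ACC_C", "GSHMLEDPVA"),
]
archive = build_archive(records)
for key, sources in archive.items():
    print(key[:12], sources)
```

Three source records collapse to two archive entries, and the shared sequence keeps links to both of the databases that reported it.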
UNIPROT SEQUENCE DATABASES

UniProt Archive (UniParc)

Stable, comprehensive, non-redundant collection of all protein sequences ever
published.
Merged from PIR, Swiss-Prot, TrEMBL, DDBJ/EMBL/GenBank proteins and proteomes,
PDB, International Protein Index, RefSeq translations, and other organism
proteomes not yet in DDBJ/EMBL/GenBank.

UniProt Reference (UniRef)

Three non-redundant collections based on sequence similarity clusters:
• UniRef100 merges all identical sequences and identical overlapping
subsequences into one entry.
• UniRef90 merges all protein sequence clusters with 90% sequence identity
into a single entry.
• UniRef50 merges all protein sequence clusters with 50% sequence identity
into a single entry.
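
The effect of the 100%, 90%, and 50% thresholds can be seen in a toy example. The sketch below is a simplified greedy clustering, not the procedure UniRef actually uses: identity is computed position by position without alignment gaps, and the function names and the sequences `P1` to `P5` are hypothetical.

```python
def identity(a: str, b: str) -> float:
    """Fraction of matching positions (ungapped), relative to the longer sequence."""
    if not a or not b:
        return 0.0
    matches = sum(x == y for x, y in zip(a, b))
    return matches / max(len(a), len(b))


def greedy_cluster(sequences: dict, threshold: float) -> dict:
    """Assign each sequence to the first representative it matches at >= threshold.

    Longer sequences are visited first, so they become the representatives.
    Returns a dict mapping representative id -> list of member ids.
    """
    clusters = {}
    for seq_id in sorted(sequences, key=lambda s: (-len(sequences[s]), s)):
        for rep in clusters:
            if identity(sequences[seq_id], sequences[rep]) >= threshold:
                clusters[rep].append(seq_id)
                break
        else:
            # No existing representative is similar enough: start a new cluster.
            clusters[seq_id] = [seq_id]
    return clusters


# Hypothetical toy sequences, all of length 10.
toy = {
    "P1": "MKTAYIAKQR",
    "P2": "MKTAYIAKQR",  # identical to P1
    "P3": "MKTAYIAKQA",  # 9 of 10 positions match P1
    "P4": "MKTAYLLLLL",  # 5 of 10 positions match P1
    "P5": "GGGGGGGGGG",  # unrelated
}
for threshold in (1.0, 0.9, 0.5):
    print(threshold, greedy_cluster(toy, threshold))
```

At the 100% threshold only P1 and P2 merge; at 90% P3 joins them; at 50% P4 joins as well, while P5 remains its own cluster at every level.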
UniProt Sequence Databases (cont.)
• UniProt Knowledgebase (UniProtKB)
  • UniProt/Swiss-Prot
    • Manually curated, highly annotated sequences from Swiss-Prot & PIRSF,
      including descriptions, taxonomy, citations, GO terms, motifs,
      functional and structural classifications, and residue-specific
      annotations including variations.
    • Some automatic rule-based annotations, including InterPro domains and
      motifs, PROSITE, PRINTS, ProDom, SMART, Pfam, PIRSF, SUPERFAMILY and
      TIGRFAMs classifications.
  • UniProt/TrEMBL
    • Automatically translated from genomes, including predicted as well as
      RefSeq genes.
    • Automated rule-based annotations.
PROTEIN INFORMATION RESOURCE

• PIR was established in 1984 by the National Biomedical Research
Foundation (NBRF) as a resource to assist researchers in the identification
and interpretation of protein sequence information.
• The Protein Information Resource (PIR) is an integrated public
bioinformatics resource to support genomic, proteomic and systems biology
research and scientific studies.
PFAM

Pfam is a database of curated protein families, each of which is defined by
two alignments and a profile hidden Markov model (HMM).

In Pfam, the profile HMM is searched against a large sequence collection,
based on the UniProt Knowledgebase (UniProtKB), to find all instances of the
family.
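
The idea of building a model from an alignment and then searching it against a sequence collection can be sketched in a few lines. The example below deliberately swaps the profile HMM for a much simpler ungapped position-specific scoring profile (no insert or delete states), so it is an illustration of the search step only, not Pfam's method. The seed alignment, the sequences `hit` and `miss`, the uniform background, and the pseudocount of 1 are all assumptions.

```python
import math
from collections import Counter

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(alignment, pseudocount=1.0):
    """Build per-column log-odds scores from an ungapped alignment.

    Assumes equal-length rows and a uniform background distribution.
    """
    background = 1.0 / len(ALPHABET)
    total = len(alignment) + pseudocount * len(ALPHABET)
    profile = []
    for col in range(len(alignment[0])):
        counts = Counter(row[col] for row in alignment)
        profile.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in ALPHABET
        })
    return profile


def best_hit(profile, sequence):
    """Slide the profile along the sequence; return (best_score, start_index)."""
    width = len(profile)
    best_score, best_start = float("-inf"), -1
    for start in range(len(sequence) - width + 1):
        # Residues outside ALPHABET contribute a neutral score of 0.
        score = sum(profile[i].get(sequence[start + i], 0.0) for i in range(width))
        if score > best_score:
            best_score, best_start = score, start
    return best_score, best_start


# Hypothetical seed alignment and a tiny "sequence collection".
seed = ["HCAG", "HCSG", "HCAG"]
collection = {"hit": "MKHCAGLV", "miss": "MKLVPPEE"}

profile = build_profile(seed)
results = {name: best_hit(profile, seq) for name, seq in collection.items()}
for name, (score, start) in results.items():
    print(name, round(score, 2), start)
```

The sequence containing the motif scores well above zero at the position where the motif starts, while the unrelated sequence scores below zero everywhere. A real search would use a proper profile HMM and a calibrated score cut-off.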
PROSITE DATABASE

PROSITE is a database of protein families and domains. It is based on the
observation that, while there is a huge number of different proteins, most of
them can be grouped, on the basis of similarities in their sequences, into a
limited number of families.

Proteins or protein domains belonging to a particular family generally share
functional attributes and are derived from a common ancestor.
