Structural Classification of Proteins Database
Classes
The broadest groups in SCOP version 1.75 are the protein fold classes. These classes group structures with
similar secondary structure composition but different overall tertiary structures and evolutionary origins.
This is the top level "root" of the SCOP hierarchical classification.
The number in brackets, called a "sunid", is a SCOP unique integer identifier for each node in the SCOP
hierarchy. The number in parentheses indicates how many elements are in each category. For example,
there are 284 folds in the "All alpha proteins" class. Each member of the hierarchy is a link to the next level
of the hierarchy.
Folds
Each class contains a number of distinct folds. This classification level indicates similar tertiary structure,
but not necessarily evolutionary relatedness. For example, the "All-α proteins" class contains >280 distinct
folds, including: Globin-like (core: 6 helices; folded leaf, partly opened), long alpha-hairpin (2 helices;
antiparallel hairpin, left-handed twist) and Type I dockerin domains (tandem repeat of two calcium-binding
loop-helix motifs, distinct from the EF-hand).
Superfamilies
Domains within a fold are further classified into superfamilies. A superfamily is the largest grouping of proteins for
which structural similarity is sufficient to indicate evolutionary relatedness and therefore descent from a common
ancestor. However, this ancestor is presumed to be distant, because the different members of a superfamily
have low sequence identities. For example, the two superfamilies of the "Globin-like" fold are the Globin-like
superfamily and the alpha-helical ferredoxin superfamily (contains two Fe4-S4 clusters).
Families
Protein families are more closely related than superfamilies. Domains are placed in the same family if they
have either:
The similarity in sequence and structure is evidence that these proteins have a closer evolutionary
relationship than do proteins in the same superfamily. Sequence tools, such as BLAST, are used to assist in
placing domains into superfamilies and families. For example, the four families in the "globin-like"
superfamily of the "globin-like" fold are truncated hemoglobin (lacks the first helix), nerve tissue
mini-hemoglobin (lacks the first helix but is otherwise more similar to conventional globins than to the
truncated ones), globins (heme-binding proteins), and phycocyanin-like phycobilisome proteins (oligomers of
two different types of globin-like subunits that contain two extra helices at the N-terminus and bind a bilin
chromophore). Families in SCOP are each assigned a concise classification string, sccs, in which the letter
identifies the class to which the domain belongs and the following integers identify the fold, superfamily, and
family, respectively (e.g., a.1.1.2 for the "Globin" family).[10]
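The structure of an sccs string can be illustrated with a short Python sketch. It is not part of SCOP or any SCOP tooling: the names Sccs and parse_sccs are hypothetical, and the only format assumed is the one described above (one class letter followed by three dot-separated integers).

```python
from typing import NamedTuple


class Sccs(NamedTuple):
    """Components of a family-level sccs string (hypothetical helper type)."""
    scop_class: str   # class letter, e.g. "a"
    fold: int         # fold number within the class
    superfamily: int  # superfamily number within the fold
    family: int       # family number within the superfamily


def parse_sccs(sccs: str) -> Sccs:
    """Split a family-level sccs string such as 'a.1.1.2' into its parts."""
    parts = sccs.strip().split(".")
    if len(parts) != 4:
        raise ValueError(f"expected 4 dot-separated fields, got {len(parts)}: {sccs!r}")
    cls, fold, superfamily, family = parts
    if len(cls) != 1 or not cls.isalpha():
        raise ValueError(f"class must be a single letter: {cls!r}")
    # int() raises ValueError if a numeric field is malformed.
    return Sccs(cls, int(fold), int(superfamily), int(family))


if __name__ == "__main__":
    # The "Globin" family example from the text.
    print(parse_sccs("a.1.1.2"))
```

Under these assumptions, parse_sccs("a.1.1.2") yields class "a", fold 1, superfamily 1 and family 2.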
Example
Most pages in SCOP contain a search box. Entering "trypsin +human" retrieves several proteins, including
the protein trypsinogen from humans. Selecting that entry displays a page that includes the "lineage", which
is at the top of most SCOP pages.
1. Root: scop
2. Class: All beta proteins [48724]
3. Fold: Trypsin-like serine proteases [50493]
Searching for "Subtilisin" returns the protein, "Subtilisin from Bacillus subtilis, carlsberg", with the
following lineage.
1. Root: scop
2. Class: Alpha and beta proteins (a/b) [51349]
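A lineage of this kind is simply an ordered path from the root to a node, with each node carrying its level, name and sunid. The following Python sketch shows one possible in-memory representation; the Node type and format_lineage function are hypothetical names chosen for illustration, and the data are limited to the trypsinogen levels listed above.

```python
from typing import List, NamedTuple, Optional


class Node(NamedTuple):
    """One step in a SCOP lineage (hypothetical helper type)."""
    level: str            # hierarchy level, e.g. "Class" or "Fold"
    name: str             # human-readable node name
    sunid: Optional[int]  # SCOP unique integer identifier; None where the text lists none


# Levels of the trypsinogen lineage exactly as listed in the text.
trypsinogen_lineage: List[Node] = [
    Node("Root", "scop", None),
    Node("Class", "All beta proteins", 48724),
    Node("Fold", "Trypsin-like serine proteases", 50493),
]


def format_lineage(lineage: List[Node]) -> str:
    """Render a lineage as a numbered list, with the sunid in brackets when known."""
    lines = []
    for position, node in enumerate(lineage, start=1):
        suffix = f" [{node.sunid}]" if node.sunid is not None else ""
        lines.append(f"{position}. {node.level}: {node.name}{suffix}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(format_lineage(trypsinogen_lineage))
```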
SCOP successors
By 2009, the original SCOP database had manually classified 38,000 PDB entries into a strictly hierarchical
structure. With the accelerating pace of protein structure publications, the limited automation of
classification could not keep up, leading to a non-comprehensive dataset. The Structural Classification of
Proteins extended (SCOPe) database was released in 2012 with far greater automation of the same
hierarchical system and is fully backwards compatible with SCOP version 1.75. In 2014, manual curation
was reintroduced into SCOPe to maintain accurate structure assignment. As of February 2015, SCOPe 2.05
classified 71,000 of the 110,000 total PDB entries.[11]
The SCOP2 prototype was a beta version of a Structural Classification of Proteins database and classification
system that aimed to better describe the evolutionary complexity inherent in protein structure evolution.[12] It
is therefore not a simple hierarchy, but a directed acyclic graph network connecting protein superfamilies and
representing structural and evolutionary relationships such as circular permutations, domain fusion and domain
decay. Consequently, domains are not separated by strict fixed boundaries, but rather are defined by their
relationships to the most similar other structures. The prototype was used for the development of the SCOP
version 2 database.[7] SCOP version 2, released in January 2020, contains 5134 families and 2485
superfamilies, compared to 3902 families and 1962 superfamilies in SCOP 1.75. The classification levels
organise more than 41,000 non-redundant domains that represent more than 504,000 protein structures.
The Evolutionary Classification of Protein Domains (ECOD) database, released in 2014, is an expansion of
SCOP version 1.75 similar to SCOPe. Unlike the compatible SCOPe, it renames the
class-fold-superfamily-family hierarchy into an architecture-X-homology-topology-family (A-XHTF) grouping, with
the last level mostly defined by Pfam and supplemented by HHsearch clustering for uncategorized
sequences.[13] ECOD has the best PDB coverage of all three successors: it covers every PDB structure,
and is updated biweekly.[14] The direct mapping to Pfam has proven useful to Pfam curators who use the
homology-level category to supplement their "clan" grouping.[15]
See also
Structural alignment
CATH
FSSP
SUPERFAMILY
Pfam
References
1. Andreeva A, Howorth D, Chandonia JM, Brenner SE, Hubbard TJ, Chothia C, Murzin AG
(January 2008). "Data growth and its impact on the SCOP database: new developments"
(https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2238974). Nucleic Acids Research. 36
(Database issue): D419-25. doi:10.1093/nar/gkm993 (https://doi.org/10.1093%2Fnar%2Fgkm993).
PMC 2238974 (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2238974).
PMID 18000004 (https://pubmed.ncbi.nlm.nih.gov/18000004).
2. Chandonia JM, Fox NK, Brenner SE (January 2019). "SCOPe: classification of large
macromolecular structures in the structural classification of proteins-extended database"
(https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6323910). Nucleic Acids Research. 47 (D1):
D475–D481. doi:10.1093/nar/gky1134 (https://doi.org/10.1093%2Fnar%2Fgky1134).
PMC 6323910 (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6323910). PMID 30500919
(https://pubmed.ncbi.nlm.nih.gov/30500919).
3. Murzin AG, Brenner SE, Hubbard T, Chothia C (April 1995). "SCOP: a structural
classification of proteins database for the investigation of sequences and structures".
Journal of Molecular Biology. 247 (4): 536–40. doi:10.1016/S0022-2836(05)80134-2
(https://doi.org/10.1016%2FS0022-2836%2805%2980134-2). PMID 7723011
(https://pubmed.ncbi.nlm.nih.gov/7723011).
4. Hubbard TJ, Ailey B, Brenner SE, Murzin AG, Chothia C (January 1999). "SCOP: a
Structural Classification of Proteins database"
(https://www.ncbi.nlm.nih.gov/pmc/articles/PMC148149). Nucleic Acids Research. 27 (1): 254–6.
doi:10.1093/nar/27.1.254 (https://doi.org/10.1093%2Fnar%2F27.1.254). PMC 148149
(https://www.ncbi.nlm.nih.gov/pmc/articles/PMC148149). PMID 9847194
(https://pubmed.ncbi.nlm.nih.gov/9847194).
5. Lo Conte L, Ailey B, Hubbard TJ, Brenner SE, Murzin AG, Chothia C (January 2000).
"SCOP: a structural classification of proteins database"
(https://www.ncbi.nlm.nih.gov/pmc/articles/PMC102479). Nucleic Acids Research. 28 (1): 257–9.
doi:10.1093/nar/28.1.257 (https://doi.org/10.1093%2Fnar%2F28.1.257). PMC 102479
(https://www.ncbi.nlm.nih.gov/pmc/articles/PMC102479). PMID 10592240
(https://pubmed.ncbi.nlm.nih.gov/10592240).
6. Andreeva A, Howorth D, Brenner SE, Hubbard TJ, Chothia C, Murzin AG (January 2004).
"SCOP database in 2004: refinements integrate structure and sequence family data"
(https://www.ncbi.nlm.nih.gov/pmc/articles/PMC308773). Nucleic Acids Research. 32 (Database
issue): D226-9. doi:10.1093/nar/gkh039 (https://doi.org/10.1093%2Fnar%2Fgkh039).
PMC 308773 (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC308773). PMID 14681400
(https://pubmed.ncbi.nlm.nih.gov/14681400).
7. Andreeva A, Kulesha E, Gough J, Murzin AG (January 2020). "SCOP database in 2020:
expanded classification of representative family and superfamily domains of known protein
structures" (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7139981). Nucleic Acids
Research. 48 (Database issue): D376–D382. doi:10.1093/nar/gkz1064
(https://doi.org/10.1093%2Fnar%2Fgkz1064). PMC 7139981
(https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7139981). PMID 31724711
(https://pubmed.ncbi.nlm.nih.gov/31724711).
8. Murzin AG, Brenner SE, Hubbard T, Chothia C (April 1995). "SCOP: a structural
classification of proteins database for the investigation of sequences and structures"
(https://web.archive.org/web/20120426170732/http://scop.mrc-lmb.cam.ac.uk/scop/ref/1995-jmb-scop.pdf)
(PDF). Journal of Molecular Biology. 247 (4): 536–40. doi:10.1016/S0022-2836(05)80134-2
(https://doi.org/10.1016%2FS0022-2836%2805%2980134-2).
PMID 7723011 (https://pubmed.ncbi.nlm.nih.gov/7723011). Archived from the original
(http://scop.mrc-lmb.cam.ac.uk/scop/ref/1995-jmb-scop.pdf) (PDF) on 2012-04-26.
9. PDB: 2DN1 (https://www.rcsb.org/structure/2DN1); Park SY, Yokoyama T, Shibayama N,
Shiro Y, Tame JR (July 2006). "1.25 A resolution crystal structures of human haemoglobin in
the oxy, deoxy and carbonmonoxy forms". Journal of Molecular Biology. 360 (3): 690–701.
doi:10.1016/j.jmb.2006.05.036 (https://doi.org/10.1016%2Fj.jmb.2006.05.036).
PMID 16765986 (https://pubmed.ncbi.nlm.nih.gov/16765986).
10. Lo Conte L, Brenner SE, Hubbard TJ, Chothia C, Murzin AG (January 2002). "SCOP
database in 2002: refinements accommodate structural genomics"
(https://www.ncbi.nlm.nih.gov/pmc/articles/PMC99154). Nucleic Acids Research. 30 (1): 264–7.
doi:10.1093/nar/30.1.264 (https://doi.org/10.1093%2Fnar%2F30.1.264). PMC 99154
(https://www.ncbi.nlm.nih.gov/pmc/articles/PMC99154). PMID 11752311
(https://pubmed.ncbi.nlm.nih.gov/11752311).
11. "What is the relationship between SCOP, SCOPe, and SCOP2"
(http://scop.berkeley.edu/help/ver=2.05#scopchanges). scop.berkeley.edu. Retrieved 2015-08-22.
12. Andreeva A, Howorth D, Chothia C, Kulesha E, Murzin AG (January 2014). "SCOP2
prototype: a new approach to protein structure mining"
(https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3964979). Nucleic Acids Research. 42 (Database
issue): D310-4. doi:10.1093/nar/gkt1242 (https://doi.org/10.1093%2Fnar%2Fgkt1242). PMC 3964979
(https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3964979). PMID 24293656
(https://pubmed.ncbi.nlm.nih.gov/24293656).
13. Cheng H, Schaeffer RD, Liao Y, Kinch LN, Pei J, Shi S, Kim BH, Grishin NV (December
2014). "ECOD: an evolutionary classification of protein domains"
(https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4256011). PLOS Computational Biology. 10 (12):
e1003926. Bibcode:2014PLSCB..10E3926C (https://ui.adsabs.harvard.edu/abs/2014PLSCB..10E3926C).
doi:10.1371/journal.pcbi.1003926 (https://doi.org/10.1371%2Fjournal.pcbi.1003926).
PMC 4256011 (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4256011). PMID 25474468
(https://pubmed.ncbi.nlm.nih.gov/25474468).
14. "Evolutionary Classification of Protein Domains" (http://prodata.swmed.edu/ecod/).
prodata.swmed.edu. Retrieved 18 May 2019.
15. El-Gebali S, Mistry J, Bateman A, Eddy SR, Luciani A, Potter SC, Qureshi M, Richardson LJ,
Salazar GA, Smart A, Sonnhammer EL, Hirsh L, Paladin L, Piovesan D, Tosatto SC, Finn
RD (January 2019). "The Pfam protein families database in 2019"
(https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6324024). Nucleic Acids Research. 47 (D1):
D427–D432. doi:10.1093/nar/gky995 (https://doi.org/10.1093%2Fnar%2Fgky995). PMC 6324024
(https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6324024). PMID 30357350
(https://pubmed.ncbi.nlm.nih.gov/30357350).
External links
Structural Classification of Proteins (http://scop.mrc-lmb.cam.ac.uk/) (SCOP 2) - Manual
classification of representative domains, regularly updated by the SCOP authors
Structural Classification of Proteins (http://scop.mrc-lmb.cam.ac.uk/legacy/) (SCOP 1.75) -
Legacy SCOP 1.75 site, no longer updated
Structural Classification of Proteins extended (http://scop.berkeley.edu/) (SCOPe) - The
more automated successor of SCOP version 1.75
Evolutionary Classification of Protein Domains (http://prodata.swmed.edu/ecod/) (ECOD) -
Evolutionary classification based on SCOP version 1.75 and Pfam
Structural Classification of Proteins 2 (http://scop.mrc-lmb.cam.ac.uk/scop2/) (SCOP2
prototype) - Legacy site of the SCOP 2 prototype, no longer updated
SUPERFAMILY (http://supfam.org/SUPERFAMILY/) - Library of HMMs representing SCOP
superfamilies and database of (superfamily and family) annotations for all completely
sequenced organisms
Protein Structure Classification
(http://media.wiley.com/product_data/excerpt/85/04717793/0471779385.pdf) - a book chapter that
discusses different protein classifications in detail.