
Structural Classification of Proteins database

The Structural Classification of Proteins (SCOP) database is a largely manual classification of protein
structural domains based on similarities of their structures and amino acid sequences. A motivation for
this classification is to determine the evolutionary relationship between proteins. Proteins with the same
shapes but having little sequence or functional similarity are grouped into "superfamilies", and are
assumed to have only a very distant common ancestor. Proteins having the same shape and some similarity of
sequence and/or function are placed in "families", and are assumed to have a closer common ancestor.

Similar to the CATH and Pfam databases, SCOP provides a classification of individual structural domains of
proteins, rather than a classification of entire proteins, which may include a significant number of
different domains.

The SCOP database is freely accessible on the internet. SCOP was created in 1994 in the Centre for Protein
Engineering and the Laboratory of Molecular Biology.[3] It was maintained by Alexey G. Murzin and his
colleagues in the Centre for Protein Engineering until its closure in 2010 and subsequently at the
Laboratory of Molecular Biology in Cambridge, England.[4][5][6][1]

Work on SCOP 1.75 was discontinued in 2014. Since then, the SCOPe team from UC Berkeley has been
responsible for updating the database in a compatible manner, with a combination of automated and manual
methods. As of April 2019, the latest release was SCOPe 2.07 (March 2018).[2]

The new Structural Classification of Proteins version 2 (SCOP2) database was released at the beginning of
2020. The new update featured an improved database schema, a new API and a modernised web interface. This
was the most significant update by the Cambridge group since SCOP 1.75 and builds on the advances in schema
from the SCOP 2 prototype.[7]

SCOP
Content
  Description: Protein Structure Classification
Contact
  Research center: Laboratory of Molecular Biology
  Authors: Alexey G. Murzin, Steven E. Brenner, Tim J. P. Hubbard, and Cyrus Chothia
  Primary citation: PMID 7723011 (https://pubmed.ncbi.nlm.nih.gov/7723011)
  Release date: 1994
Access
  Website: http://scop.mrc-lmb.cam.ac.uk/scop/
Miscellaneous
  Version: 1.75 (June 2009; 110,800 domains in 38,221 structures classed as 3,902 families)[1]
  Curation policy: manual

SCOPe
Content
  Description: SCOP - extended
Contact
  Authors: Naomi K. Fox, Steven E. Brenner, and John-Marc Chandonia
  Primary citation: PMID 24304899 (https://pubmed.ncbi.nlm.nih.gov/24304899)
Access
  Website: https://scop.berkeley.edu
Miscellaneous
  Version: 2.07 (March 2018; 276,231 domains in 87,224 structures classed as 4,919 families)[2]
  Curation policy: manual (new classifications) and automated (new structures, BLAST)

Hierarchical organisation

The source of protein structures is the Protein Data Bank. The unit of classification of structure in SCOP
is the protein domain. What the SCOP authors mean by "domain" is suggested by their statement that small
proteins and most medium-sized ones have just one domain,[8] and by the observation that human
hemoglobin,[9] which has an α2β2 structure, is assigned two SCOP domains, one for the α and one for the β
subunit.

The shapes of domains are called "folds" in SCOP. Domains belonging to the same fold have the same major
secondary structures in the same arrangement with the same topological connections. SCOP version 1.75
lists 1195 folds, each with a short description. For example, the "globin-like" fold is described as
core: 6 helices; folded leaf, partly opened. The fold to which a domain belongs is determined by
inspection, rather than by software.

The levels of SCOP version 1.75 are as follows.

1. Class: Types of folds, e.g., beta sheets.
2. Fold: The different shapes of domains within a class.
3. Superfamily: The domains in a fold are grouped into superfamilies, which have at least a
distant common ancestor.
4. Family: The domains in a superfamily are grouped into families, which have a more recent
common ancestor.
5. Protein domain: The domains in families are grouped into protein domains, which are
essentially the same protein.
6. Species: The domains in "protein domains" are grouped according to species.
7. Domain: part of a protein. For simple proteins, it can be the entire protein.

Classes

The broadest groups in SCOP version 1.75 are the protein fold classes. These classes group structures with
similar secondary structure composition, but different overall tertiary structures and evolutionary origins.
They form the top level of the SCOP hierarchical classification, directly below its "root".

1. All alpha proteins [46456] (284): Domains consisting of α-helices
2. All beta proteins [48724] (174): Domains consisting of β-sheets
3. Alpha and beta proteins (a/b) [51349] (147): Mainly parallel beta sheets (beta-alpha-beta units)
4. Alpha and beta proteins (a+b) [53931] (376): Mainly antiparallel beta sheets (segregated alpha and
beta regions)
5. Multi-domain proteins (alpha and beta) [56572] (66): Folds consisting of two or more domains
belonging to different classes
6. Membrane and cell surface proteins and peptides [56835] (58): Does not include proteins in the
immune system
7. Small proteins [56992] (90): Usually dominated by metal ligand, cofactor, and/or disulfide bridges
8. Coiled-coil proteins [57942] (7): Not a true class
9. Low resolution protein structures [58117] (26): Peptides and fragments. Not a true class
10. Peptides [58231] (121): Peptides and fragments. Not a true class
11. Designed proteins [58788] (44): Experimental structures of proteins with essentially non-natural
sequences. Not a true class

The number in brackets, called a "sunid", is a SCOP unique integer identifier for each node in the SCOP
hierarchy. The number in parentheses indicates how many elements are in each category. For example,
there are 284 folds in the "All alpha proteins" class. Each member of the hierarchy is a link to the next level
of the hierarchy.
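
As a rough illustration of how sunids and member counts fit together, the class list can be held in a small lookup table. This is only a sketch: the tuple layout and the function names are illustrative assumptions, not part of SCOP or its distribution files. The values are copied from the list above.

```python
# (class name, sunid, number of folds, is a true class) for SCOP 1.75,
# copied from the class list above. The layout is an illustrative assumption.
SCOP_CLASSES = [
    ("All alpha proteins", 46456, 284, True),
    ("All beta proteins", 48724, 174, True),
    ("Alpha and beta proteins (a/b)", 51349, 147, True),
    ("Alpha and beta proteins (a+b)", 53931, 376, True),
    ("Multi-domain proteins (alpha and beta)", 56572, 66, True),
    ("Membrane and cell surface proteins and peptides", 56835, 58, True),
    ("Small proteins", 56992, 90, True),
    ("Coiled-coil proteins", 57942, 7, False),
    ("Low resolution protein structures", 58117, 26, False),
    ("Peptides", 58231, 121, False),
    ("Designed proteins", 58788, 44, False),
]


def class_by_sunid(sunid, classes=SCOP_CLASSES):
    """Return (name, fold count) for a class sunid, or None if it is not a class node."""
    for name, node_id, fold_count, _ in classes:
        if node_id == sunid:
            return name, fold_count
    return None


def folds_in_true_classes(classes=SCOP_CLASSES):
    """Sum the fold counts of the entries not marked 'Not a true class'."""
    return sum(fold_count for _, _, fold_count, is_true in classes if is_true)


print(class_by_sunid(46456))    # ('All alpha proteins', 284)
print(folds_in_true_classes())  # 1195
```

The seven true classes sum to 1195 folds, which agrees with the fold total quoted for SCOP version 1.75.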

Folds

Each class contains a number of distinct folds. This classification level indicates similar tertiary structure,
but not necessarily evolutionary relatedness. For example, the "All alpha proteins" class contains more than
280 distinct folds, including: Globin-like (core: 6 helices; folded leaf, partly opened), long alpha-hairpin
(2 helices; antiparallel hairpin, left-handed twist) and Type I dockerin domains (tandem repeat of two
calcium-binding loop-helix motifs, distinct from the EF-hand).

Superfamilies

Domains within a fold are further classified into superfamilies. A superfamily is the largest grouping of
proteins for which structural similarity is sufficient to indicate evolutionary relatedness, and therefore a
shared common ancestor. However, this ancestor is presumed to be distant, because the different members of a
superfamily have low sequence identities. For example, the two superfamilies of the "Globin-like" fold are
the Globin superfamily and the alpha-helical ferredoxin superfamily (contains two Fe4-S4 clusters).

Families

Protein families are more closely related than superfamilies. Domains are placed in the same family if they
either:

1. have >30% sequence identity, or
2. have some sequence identity (e.g., 15%) and perform the same function
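
The two criteria can be written as a simple predicate. The sketch below is only an illustration of the rule as stated: SCOP family assignment is a curated, largely manual decision, and treating the 15% example as a hard lower bound is an assumption made here for concreteness. The function and parameter names are hypothetical.

```python
def meets_family_criteria(identity_pct, same_function,
                          high_cutoff=30.0, low_cutoff=15.0):
    """Check the two family criteria for a pair of domains.

    identity_pct  -- pairwise sequence identity in percent
    same_function -- whether the two domains perform the same function
    low_cutoff    -- ASSUMPTION: the text gives 15% only as an example of
                     "some sequence identity", not as a fixed threshold
    """
    # Criterion 1: more than 30% sequence identity is sufficient on its own.
    if identity_pct > high_cutoff:
        return True
    # Criterion 2: lower identity counts only together with shared function.
    return bool(same_function) and identity_pct >= low_cutoff


print(meets_family_criteria(35.0, same_function=False))  # True
print(meets_family_criteria(20.0, same_function=True))   # True
print(meets_family_criteria(20.0, same_function=False))  # False
```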

The similarity in sequence and structure is evidence that these proteins have a closer evolutionary
relationship than do proteins in the same superfamily. Sequence tools, such as BLAST, are used to assist in
placing domains into superfamilies and families. For example, the four families in the "globin-like"
superfamily of the "globin-like" fold are truncated hemoglobin (lacks the first helix), nerve tissue
mini-hemoglobin (lacks the first helix but is otherwise more similar to conventional globins than the
truncated ones), globins (heme-binding proteins), and phycocyanin-like phycobilisome proteins (oligomers of
two different types of globin-like subunits that contain two extra helices at the N-terminus and bind a
bilin chromophore). Families in SCOP are each assigned a concise classification string, sccs, where the
letter identifies the class to which the domain belongs; the following integers identify the fold,
superfamily, and family, respectively (e.g., a.1.1.2 for the "Globin" family).[10]
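
A family-level sccs string can be split into its four components with a few lines of code. The following is a minimal sketch, not an official SCOP parser; the type and function names are hypothetical, and the letter-to-class table assumes that the letters a to k follow the order of the class list given earlier, which the text above does not state explicitly.

```python
from typing import NamedTuple

# ASSUMPTION: class letters follow the order of the SCOP 1.75 class list.
CLASS_LETTERS = {
    "a": "All alpha proteins",
    "b": "All beta proteins",
    "c": "Alpha and beta proteins (a/b)",
    "d": "Alpha and beta proteins (a+b)",
    "e": "Multi-domain proteins (alpha and beta)",
    "f": "Membrane and cell surface proteins and peptides",
    "g": "Small proteins",
    "h": "Coiled-coil proteins",
    "i": "Low resolution protein structures",
    "j": "Peptides",
    "k": "Designed proteins",
}


class Sccs(NamedTuple):
    class_letter: str
    fold: int
    superfamily: int
    family: int


def parse_sccs(sccs: str) -> Sccs:
    """Split a family-level sccs such as 'a.1.1.2' into its four components."""
    parts = sccs.strip().split(".")
    if len(parts) != 4 or parts[0] not in CLASS_LETTERS:
        raise ValueError(f"not a family-level sccs: {sccs!r}")
    letter, fold, superfamily, family = parts
    # int() raises ValueError if a numeric component is malformed.
    return Sccs(letter, int(fold), int(superfamily), int(family))


def same_superfamily(a: str, b: str) -> bool:
    """Two sccs strings share a superfamily if class, fold and superfamily all match."""
    return parse_sccs(a)[:3] == parse_sccs(b)[:3]


globin = parse_sccs("a.1.1.2")
print(globin.class_letter, CLASS_LETTERS[globin.class_letter])  # a All alpha proteins
print(same_superfamily("a.1.1.2", "a.1.1.1"))                   # True
```

Because the identifier is positional, comparing prefixes of two sccs strings is enough to tell at which level two families diverge.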

PDB entry domains


A "TaxId" is the taxonomy ID number and links to the NCBI taxonomy browser, which provides more
information about the species to which the protein belongs. Clicking on a species or isoform brings up a list
of domains. For example, the "Hemoglobin, alpha-chain from Human (Homo sapiens)" protein has >190
solved protein structures, such as 2dn3 (complexed with cmo), and 2dn1 (complexed with hem, mbn, oxy).
Clicking on the PDB numbers is supposed to display the structure of the molecule, but the links are
currently broken (links work in pre-SCOP).

Example
Most pages in SCOP contain a search box. Entering "trypsin +human" retrieves several proteins, including
the protein trypsinogen from humans. Selecting that entry displays a page that includes the "lineage", which
is at the top of most SCOP pages.

Human trypsinogen lineage

1. Root: scop
2. Class: All beta proteins [48724]
3. Fold: Trypsin-like serine proteases [50493]
   barrel, closed; n=6, S=8; greek-key
   duplication: consists of two domains of the same fold
4. Superfamily: Trypsin-like serine proteases [50494]
5. Family: Eukaryotic proteases [50514]
6. Protein: Trypsin(ogen) [50515]
7. Species: Human (Homo sapiens) [TaxId: 9606] [50519]

Searching for "Subtilisin" returns the protein, "Subtilisin from Bacillus subtilis, carlsberg", with the
following lineage.

Subtilisin from Bacillus subtilis, carlsberg lineage

1. Root: scop
2. Class: Alpha and beta proteins (a/b) [51349]
   Mainly parallel beta sheets (beta-alpha-beta units)
3. Fold: Subtilisin-like [52742]
   3 layers: a/b/a, parallel beta-sheet of 7 strands, order 2314567; left-handed crossover
   connection between strands 2 & 3
4. Superfamily: Subtilisin-like [52743]
5. Family: Subtilases [52744]
6. Protein: Subtilisin [52745]
7. Species: Bacillus subtilis, carlsberg [TaxId: 1423] [52746]

Although both of these proteins are proteases, they do not even belong to the same fold, which is consistent
with them being an example of convergent evolution.
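
The comparison of the two lineages can be made mechanical by walking both from the class level downwards and stopping at the first level where the sunids differ. This is a sketch under the assumption that a lineage is stored as a plain list of sunids in level order; the names are illustrative, and the sunids are the ones shown in the two lineages above.

```python
# Levels below the root, in the order used by the lineages above.
LEVELS = ["Class", "Fold", "Superfamily", "Family", "Protein", "Species"]

# sunids copied from the two lineages above.
HUMAN_TRYPSINOGEN = [48724, 50493, 50494, 50514, 50515, 50519]
SUBTILISIN_CARLSBERG = [51349, 52742, 52743, 52744, 52745, 52746]


def deepest_shared_level(lineage_a, lineage_b):
    """Return the name of the deepest level at which two lineages still coincide.

    Every SCOP entry hangs off the same root, so "Root" is returned when
    even the classes differ.
    """
    shared = "Root"
    for level, sunid_a, sunid_b in zip(LEVELS, lineage_a, lineage_b):
        if sunid_a != sunid_b:
            break
        shared = level
    return shared


print(deepest_shared_level(HUMAN_TRYPSINOGEN, SUBTILISIN_CARLSBERG))  # Root
print(deepest_shared_level(HUMAN_TRYPSINOGEN, HUMAN_TRYPSINOGEN))     # Species
```

For these two proteases the lineages already diverge at the class level, so the only node they share is the root.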

Comparison to other classification systems


SCOP classification is more dependent on manual decisions than the semi-automatic classification by
CATH, its chief rival. Human expertise is used to decide whether certain proteins are evolutionarily related
and therefore should be assigned to the same superfamily, or whether their similarity is a result of
structural constraints, in which case they are only assigned to the same fold. Another database, FSSP, is
purely automatically generated (including regular automatic updates) but offers no classification, allowing
users to draw their own conclusions as to the significance of structural relationships based on the pairwise
comparisons of individual protein structures.

SCOP successors

By 2009, the original SCOP database had manually classified 38,000 PDB entries into a strictly hierarchical
structure. With the accelerating pace of protein structure publications, the limited automation of
classification could not keep up, leading to a non-comprehensive dataset. The Structural Classification of
Proteins extended (SCOPe) database was released in 2012 with far greater automation of the same
hierarchical system and is fully backwards compatible with SCOP version 1.75. In 2014, manual curation
was reintroduced into SCOPe to maintain accurate structure assignment. As of February 2015, SCOPe 2.05
classified 71,000 of the 110,000 total PDB entries.[11]

The SCOP2 prototype was a beta version of a new structural classification of proteins, with a classification
system that aimed to better capture the evolutionary complexity inherent in protein structure
evolution.[12] It is therefore not a simple hierarchy, but a directed acyclic graph network connecting
protein superfamilies and representing structural and evolutionary relationships such as circular
permutations, domain fusion and domain decay. Consequently, domains are not separated by strict fixed
boundaries, but rather are defined by their relationships to the most similar other structures. The
prototype was used for the development of the SCOP version 2 database.[7] SCOP version 2, released in
January 2020, contains 5134 families and 2485 superfamilies, compared to 3902 families and 1962
superfamilies in SCOP 1.75. The classification levels organise more than 41,000 non-redundant domains that
represent more than 504,000 protein structures.

The Evolutionary Classification of Protein Domains (ECOD) database, released in 2014, is, like SCOPe, an
expansion of SCOP version 1.75. Unlike the compatible SCOPe, it replaces the class-fold-superfamily-family
hierarchy with an architecture-X-homology-topology-family (A-XHTF) grouping, with the last level mostly
defined by Pfam and supplemented by HHsearch clustering for uncategorized sequences.[13] ECOD has the best
PDB coverage of all three successors: it covers every PDB structure, and is updated biweekly.[14] The direct
mapping to Pfam has proven useful to Pfam curators, who use the homology-level category to supplement their
"clan" grouping.[15]

See also
Structural alignment
CATH
FSSP
SUPERFAMILY
Pfam

References
1. Andreeva A, Howorth D, Chandonia JM, Brenner SE, Hubbard TJ, Chothia C, Murzin AG
(January 2008). "Data growth and its impact on the SCOP database: new developments"
(https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2238974). Nucleic Acids Research. 36
(Database issue): D419-25. doi:10.1093/nar/gkm993 (https://doi.org/10.1093%2Fnar%2Fgkm993).
PMC 2238974 (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2238974).
PMID 18000004 (https://pubmed.ncbi.nlm.nih.gov/18000004).
2. Chandonia JM, Fox NK, Brenner SE (January 2019). "SCOPe: classification of large
macromolecular structures in the structural classification of proteins-extended database"
(https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6323910). Nucleic Acids Research. 47 (D1):
D475–D481. doi:10.1093/nar/gky1134 (https://doi.org/10.1093%2Fnar%2Fgky1134).
PMC 6323910 (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6323910).
PMID 30500919 (https://pubmed.ncbi.nlm.nih.gov/30500919).
3. Murzin AG, Brenner SE, Hubbard T, Chothia C (April 1995). "SCOP: a structural
classification of proteins database for the investigation of sequences and structures".
Journal of Molecular Biology. 247 (4): 536–40. doi:10.1016/S0022-2836(05)80134-2
(https://doi.org/10.1016%2FS0022-2836%2805%2980134-2).
PMID 7723011 (https://pubmed.ncbi.nlm.nih.gov/7723011).
4. Hubbard TJ, Ailey B, Brenner SE, Murzin AG, Chothia C (January 1999). "SCOP: a
Structural Classification of Proteins database"
(https://www.ncbi.nlm.nih.gov/pmc/articles/PMC148149). Nucleic Acids Research. 27 (1): 254–6.
doi:10.1093/nar/27.1.254 (https://doi.org/10.1093%2Fnar%2F27.1.254).
PMC 148149 (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC148149).
PMID 9847194 (https://pubmed.ncbi.nlm.nih.gov/9847194).
5. Lo Conte L, Ailey B, Hubbard TJ, Brenner SE, Murzin AG, Chothia C (January 2000).
"SCOP: a structural classification of proteins database"
(https://www.ncbi.nlm.nih.gov/pmc/articles/PMC102479). Nucleic Acids Research. 28 (1): 257–9.
doi:10.1093/nar/28.1.257 (https://doi.org/10.1093%2Fnar%2F28.1.257).
PMC 102479 (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC102479).
PMID 10592240 (https://pubmed.ncbi.nlm.nih.gov/10592240).
6. Andreeva A, Howorth D, Brenner SE, Hubbard TJ, Chothia C, Murzin AG (January 2004).
"SCOP database in 2004: refinements integrate structure and sequence family data"
(https://www.ncbi.nlm.nih.gov/pmc/articles/PMC308773). Nucleic Acids Research. 32 (Database
issue): D226-9. doi:10.1093/nar/gkh039 (https://doi.org/10.1093%2Fnar%2Fgkh039).
PMC 308773 (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC308773).
PMID 14681400 (https://pubmed.ncbi.nlm.nih.gov/14681400).
7. Andreeva A, Kulesha E, Gough J, Murzin AG (January 2020). "SCOP database in 2020:
expanded classification of representative family and superfamily domains of known protein
structures" (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7139981). Nucleic Acids
Research. 48 (Database issue): D376–D382. doi:10.1093/nar/gkz1064
(https://doi.org/10.1093%2Fnar%2Fgkz1064).
PMC 7139981 (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7139981).
PMID 31724711 (https://pubmed.ncbi.nlm.nih.gov/31724711).
8. Murzin AG, Brenner SE, Hubbard T, Chothia C (April 1995). "SCOP: a structural
classification of proteins database for the investigation of sequences and structures"
(https://web.archive.org/web/20120426170732/http://scop.mrc-lmb.cam.ac.uk/scop/ref/1995-jmb-scop.pdf)
(PDF). Journal of Molecular Biology. 247 (4): 536–40. doi:10.1016/S0022-2836(05)80134-2
(https://doi.org/10.1016%2FS0022-2836%2805%2980134-2).
PMID 7723011 (https://pubmed.ncbi.nlm.nih.gov/7723011). Archived from the original
(http://scop.mrc-lmb.cam.ac.uk/scop/ref/1995-jmb-scop.pdf) (PDF) on 2012-04-26.
9. PDB: 2DN1 (https://www.rcsb.org/structure/2DN1); Park SY, Yokoyama T, Shibayama N,
Shiro Y, Tame JR (July 2006). "1.25 A resolution crystal structures of human haemoglobin in
the oxy, deoxy and carbonmonoxy forms". Journal of Molecular Biology. 360 (3): 690–701.
doi:10.1016/j.jmb.2006.05.036 (https://doi.org/10.1016%2Fj.jmb.2006.05.036).
PMID 16765986 (https://pubmed.ncbi.nlm.nih.gov/16765986).
10. Lo Conte L, Brenner SE, Hubbard TJ, Chothia C, Murzin AG (January 2002). "SCOP
database in 2002: refinements accommodate structural genomics"
(https://www.ncbi.nlm.nih.gov/pmc/articles/PMC99154). Nucleic Acids Research. 30 (1): 264–7.
doi:10.1093/nar/30.1.264 (https://doi.org/10.1093%2Fnar%2F30.1.264).
PMC 99154 (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC99154).
PMID 11752311 (https://pubmed.ncbi.nlm.nih.gov/11752311).
11. "What is the relationship between SCOP, SCOPe, and SCOP2"
(http://scop.berkeley.edu/help/ver=2.05#scopchanges). scop.berkeley.edu. Retrieved 2015-08-22.
12. Andreeva A, Howorth D, Chothia C, Kulesha E, Murzin AG (January 2014). "SCOP2
prototype: a new approach to protein structure mining"
(https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3964979). Nucleic Acids Research. 42 (Database
issue): D310-4. doi:10.1093/nar/gkt1242 (https://doi.org/10.1093%2Fnar%2Fgkt1242).
PMC 3964979 (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3964979).
PMID 24293656 (https://pubmed.ncbi.nlm.nih.gov/24293656).
13. Cheng H, Schaeffer RD, Liao Y, Kinch LN, Pei J, Shi S, Kim BH, Grishin NV (December
2014). "ECOD: an evolutionary classification of protein domains"
(https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4256011). PLOS Computational Biology. 10 (12):
e1003926. Bibcode:2014PLSCB..10E3926C (https://ui.adsabs.harvard.edu/abs/2014PLSCB..10E3926C).
doi:10.1371/journal.pcbi.1003926 (https://doi.org/10.1371%2Fjournal.pcbi.1003926).
PMC 4256011 (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4256011).
PMID 25474468 (https://pubmed.ncbi.nlm.nih.gov/25474468).
14. "Evolutionary Classification of Protein Domains" (http://prodata.swmed.edu/ecod/).
prodata.swmed.edu. Retrieved 18 May 2019.
15. El-Gebali S, Mistry J, Bateman A, Eddy SR, Luciani A, Potter SC, Qureshi M, Richardson LJ,
Salazar GA, Smart A, Sonnhammer EL, Hirsh L, Paladin L, Piovesan D, Tosatto SC, Finn
RD (January 2019). "The Pfam protein families database in 2019"
(https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6324024). Nucleic Acids Research. 47 (D1):
D427–D432. doi:10.1093/nar/gky995 (https://doi.org/10.1093%2Fnar%2Fgky995).
PMC 6324024 (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6324024).
PMID 30357350 (https://pubmed.ncbi.nlm.nih.gov/30357350).

External links
Structural Classification of Proteins (http://scop.mrc-lmb.cam.ac.uk/) (SCOP 2) - Manual
classification of representative domains, regularly updated by the SCOP authors
Structural Classification of Proteins (http://scop.mrc-lmb.cam.ac.uk/legacy/) (SCOP 1.75) -
Legacy SCOP 1.75 site, no longer updated
Structural Classification of Proteins extended (http://scop.berkeley.edu/) (SCOPe) - The
more automated successor of SCOP version 1.75
Evolutionary Classification of Protein Domains (http://prodata.swmed.edu/ecod/) (ECOD) -
Evolutionary classification based on SCOP version 1.75 and Pfam
Structural Classification of Proteins 2 (http://scop.mrc-lmb.cam.ac.uk/scop2/) (SCOP2
prototype) - Legacy site of the SCOP 2 prototype, no longer updated
SUPERFAMILY (http://supfam.org/SUPERFAMILY/) - Library of HMMs representing SCOP
superfamilies and database of (superfamily and family) annotations for all completely
sequenced organisms
Protein Structure Classification
(http://media.wiley.com/product_data/excerpt/85/04717793/0471779385.pdf) - a book chapter that
discusses different protein classifications in detail.

