Protein STR


Protein structure databases

• Contain the spatial coordinates of macromolecules whose 3D structure has
been determined by X-ray crystallography or NMR studies

• Proteins represent more than 90% of available structures (others are DNA,
RNA, sugars, viruses, protein/DNA complexes…)

• PDB (Protein Data Bank), SCOP (Structural Classification of Proteins,
based on secondary structure), BMRB (BioMagResBank; NMR results)

Challenges

• Fast growth in number of structures
• Growing complexity of depositions
• Greatly expanded user community

[Chart not reproduced; x-axis: Year]

State of the PDB

• ~25,000 released structures in the PDB archive
• Over 7,000 new structures deposited in 2003
• ~200,000 file downloads per day

Depositions by Macromolecule Type:
• 92.5% Protein
• 3.5% Nucleic acid
• 4% Protein-nucleic acid complexes

Depositions by Experiment:
• 85.7% X-ray
• 13.7% NMR
• 0.4% CryoEM
• 0.2% Other

Integrated Deposition, Processing, and
Validation

• Built on top of the mmCIF dictionary
• Web interface: ADIT
– Customization:
• Level of annotation (deposition/annotation)
• Experimental type (X-ray/NMR)
• Integrated validation server: MAXIT, ADIT, validation
• The whole PDB functionality is available from the web
• Data are validated by curators (syntactic correctness, scientific correctness,
e.g. checking of sequences); they use an internal mmCIF with special
categories
• Official mmCIF and PDB files are generated and uploaded to the designated
directories
• At the same time, the mmCIF is converted into SQL statements that are sent to
the DB2 database system
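
The final step, turning mmCIF items into SQL, can be sketched roughly as follows. This is only an illustration under stated assumptions, not the PDB's actual loader: it handles single-line `_category.item value` pairs only (no loop_ blocks, no multi-line values), it uses Python's built-in sqlite3 in place of DB2, and the function names and the entry ID 1ABC are hypothetical placeholders.

```python
import shlex
import sqlite3
from collections import defaultdict


def parse_simple_mmcif(text):
    """Collect `_category.item value` pairs into {category: {item: value}}.

    Assumption: only single-line key-value items are handled; loop_ blocks
    and semicolon-delimited multi-line values are skipped in this sketch.
    """
    categories = defaultdict(dict)
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("_"):
            continue  # skip data_ headers, comments and loop_ rows
        tokens = shlex.split(line)  # keeps quoted values together
        if len(tokens) != 2:
            continue  # loop_ column headers have no value on the same line
        category, item = tokens[0][1:].split(".", 1)
        categories[category][item] = tokens[1]
    return categories


def to_insert_statements(categories):
    """Yield (sql, params): one parameterized INSERT per mmCIF category."""
    for category, items in categories.items():
        columns = ", ".join(f'"{name}"' for name in items)
        placeholders = ", ".join("?" for _ in items)
        sql = f'INSERT INTO "{category}" ({columns}) VALUES ({placeholders})'
        yield sql, list(items.values())


# Placeholder entry; 1ABC is not meant to refer to a real structure.
example = """\
data_1ABC
_entry.id 1ABC
_exptl.entry_id 1ABC
_exptl.method 'X-RAY DIFFRACTION'
"""

conn = sqlite3.connect(":memory:")  # stand-in for the DB2 system
conn.execute('CREATE TABLE "entry" ("id" TEXT)')
conn.execute('CREATE TABLE "exptl" ("entry_id" TEXT, "method" TEXT)')
for sql, params in to_insert_statements(parse_simple_mmcif(example)):
    conn.execute(sql, params)

rows = conn.execute("SELECT entry_id, method FROM exptl").fetchall()
print(rows)
```

Mapping each mmCIF category to a table and each item to a column is one natural design, since mmCIF categories are already table-like; whether the real schema follows this layout is not stated in the slides.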

How the Data Flow in and Out of PDB

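[Diagram: flow of an entry between the depositor and the PDB]
1. Deposit – the depositor submits data through ADIT and receives a PDB ID
2. Annotate – the PDB annotates and validates the entry and returns a validation report
3. Correct – the depositor sends corrections
4. Approve – after depositor approval, the PDB entry goes to the core DB, the data archive, and the distribution site
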
• PDB ID – model – chain – residue – atom
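
The hierarchy above maps naturally onto nested containers. The sketch below is one possible in-memory representation; the class and field names are illustrative assumptions rather than any particular library's API, and 1ABC is a placeholder ID.

```python
from dataclasses import dataclass, field


@dataclass
class Atom:
    name: str
    x: float
    y: float
    z: float


@dataclass
class Residue:
    name: str
    number: int
    atoms: list = field(default_factory=list)


@dataclass
class Chain:
    chain_id: str
    residues: list = field(default_factory=list)


@dataclass
class Model:
    number: int
    chains: list = field(default_factory=list)


@dataclass
class Entry:
    pdb_id: str
    models: list = field(default_factory=list)

    def atoms(self):
        """Walk the hierarchy top-down, yielding (model, chain, residue, atom)."""
        for model in self.models:
            for chain in model.chains:
                for residue in chain.residues:
                    for atom in residue.atoms:
                        yield model, chain, residue, atom


# Placeholder entry: one model, one chain, one residue, two atoms.
entry = Entry("1ABC", [Model(1, [Chain("A", [
    Residue("GLY", 1, [Atom("N", 0.0, 0.0, 0.0), Atom("CA", 1.46, 0.0, 0.0)]),
])])])

paths = [f"{entry.pdb_id}/{m.number}/{c.chain_id}/{r.name}{r.number}/{a.name}"
         for m, c, r, a in entry.atoms()]
print(paths)
```

Each atom is then addressable by the full path from PDB ID down to atom name, which is the same addressing scheme the slide lists.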

Data formats

• PDB
– model, chain, residue, atom
– http://www.pdb.org/docs.html
• mmCIF
– archival format of PDB
– http://mmcif.pdb.org/
• PDBML
– direct transcription of mmCIF into XML, mainly to follow the XML trend

PDB Format

• a legacy format;
• incomplete and not structured enough to describe objects as
complicated as molecules;
• its limits have been exceeded several times;
• understood by most programs.
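
To make the fixed-column nature of the legacy format concrete, here is a minimal sketch of reading one ATOM record by column position. The column ranges follow the published PDB format description; the function name, the record type, and the sample line (with made-up coordinates) are illustrative assumptions. The fixed widths also hint at why limits get hit: the serial number has only five columns and the chain ID only one.

```python
from typing import NamedTuple


class AtomRecord(NamedTuple):
    serial: int
    name: str
    res_name: str
    chain_id: str
    res_seq: int
    x: float
    y: float
    z: float


def parse_atom_line(line):
    """Parse one ATOM/HETATM line using fixed PDB column positions."""
    if not line.startswith(("ATOM", "HETATM")):
        raise ValueError("not an ATOM/HETATM record")
    return AtomRecord(
        serial=int(line[6:11]),        # columns 7-11
        name=line[12:16].strip(),      # columns 13-16
        res_name=line[17:20].strip(),  # columns 18-20
        chain_id=line[21],             # column 22
        res_seq=int(line[22:26]),      # columns 23-26
        x=float(line[30:38]),          # columns 31-38
        y=float(line[38:46]),          # columns 39-46
        z=float(line[46:54]),          # columns 47-54
    )


# Illustrative record; the values are sample data, not taken from the slides.
sample = "ATOM      1  N   MET A   1      27.340  24.430   2.614  1.00  9.67           N"

record = parse_atom_line(sample)
print(record)
```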
Integration of Other Resources
[Diagram: external resources integrated into the re-engineered database through data curation]
• Primary references; derived references; 3D domain/structure assignments; CATH; SCOP
• Source organism; SWISS-PROT/GenBank; PubMed IDs; Enzyme Classification
• NCBI Taxonomy; KEGG Pathways; Enzyme
• Gene Ontology; Genomes (NCBI LocusLink); Structural Genomics Targets
• OMIM/Disease; GeneCards; SNPs

Site Navigation

• Persistent keyword search box
• New user help
• PDB static site search
• Context-sensitive help
• Persistent navigation bar with user session history

Browse Tools
Structure data categorized and presented through several browsers:
• Gene Ontology
• Enzyme Classification
• Taxonomy
• Disease
• Ligands
• CATH/SCOP

PDB Archive Search Tools
• SearchLite, StatusSearch & QuickSearch
• SearchFields – query specific mmCIF fields
• Query Review and Refinement
• Title, Deposition/Release Dates
• X-Ray & NMR Experimental Details
• Secondary Structure & Geometry
• Citation, GO, EC, Source, Ligands, Sequence
• Database References

Query Results Page
• Order Results by Resolution/Deposition Date/PDB ID
• Downloads (PDB/mmCIF/XML)
• Navigation to Query/Results
• Detailed Tabular Reports
• Save Report as Excel
• Order Results by Column Values

Structure Summary Page
Customized Summary for X-Ray and NMR Structures
• Navigation ‘Bread Crumbs’ to Results Page
• Print PDF
• Toggle – Asymmetric/Biological Unit Images
• Downloads, Reports, Viewers, External links
• Simple, User-Friendly 3D Viewers
• Pubmed Abstract Search
• SCOP, CATH and GO – Listing and Search
• Ligand Viewer
• Ligand-Structure Interaction Viewer

New Detailed Reports

• Biology & Chemistry
• Materials & Methods
• Structural Features

MMDB

• NCBI's structure database is called MMDB (Molecular
Modeling DataBase); it is a subset of the Protein Data
Bank (PDB) that excludes theoretical models.

• MMDB is a database of ASN.1-formatted records. It was
designed for flexibility and is therefore capable of archiving
conventional structural data as well as future descriptions of
biomolecules, such as those generated by electron
microscopy (surface models).

Searching MMDB

• The structure database may be queried directly, using specific fields
such as author names, or text terms occurring anywhere in the
structure description.

• Alternatively, you can use a PDB 4-character code or a numerical
MMDB-Id to retrieve structure summary pages directly.
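
The three kinds of query described above (field or text search, PDB code, MMDB-Id) could be told apart and turned into a search request along these lines. This is a sketch under assumptions: the helper names are hypothetical, the PDB-code pattern (a digit followed by three alphanumerics) is a simplification, and the field tag in the usage example is only illustrative. The code builds a URL for NCBI's E-utilities ESearch service against the `structure` database; it does not fetch anything and does not reproduce the summary pages themselves.

```python
import re
from urllib.parse import urlencode

# NCBI E-utilities search endpoint; "structure" is the Entrez database for MMDB.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def classify_query(query):
    """Guess whether a query is an MMDB-Id, a PDB code, or free text."""
    query = query.strip()
    if query.isdigit():
        return "mmdb_id"    # assumption: purely numeric input is an MMDB-Id
    if re.fullmatch(r"[0-9][A-Za-z0-9]{3}", query):
        return "pdb_code"   # 4 characters, starting with a digit
    return "text"


def esearch_url(query, field=None):
    """Build (but do not send) an ESearch URL for the structure database."""
    term = f"{query}[{field}]" if field else query
    return ESEARCH + "?" + urlencode({"db": "structure", "term": term})


for q in ["1HHO", "12345", "hemoglobin"]:
    print(q, classify_query(q), esearch_url(q))

# A field-restricted search; the field tag here is an illustrative assumption.
print(esearch_url("Smith", field="Author"))
```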

Molecular Visualization

• Cn3D: uses MMDB, Entrez’s structure database
http://www.ncbi.nlm.nih.gov/Structure/CN3D/cn3d.shtml
• RasMol http://www.umass.edu/microbio/rasmol/
• Protein Explorer
http://www.umass.edu/microbio/rasmol/rotating.htm
• OpenRasMol http://www.openrasmol.org/
• MolviZ.org http://www.umass.edu/microbio/chime
• World Index of Molecular Visualization
http://molvis.sdsc.edu/visres/index.html

Molecular Visualization

• SimpleViewer, built using the Java 3D based Molecular Biology Toolkit
(MBT: mbt.sdsc.edu)

• Ligand Interaction Viewer, built using MBT – view hydrophilic and
hydrophobic interactions between the ligand and neighboring
residues in the structure
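
How "neighboring residues" might be selected can be sketched with a simple distance cutoff. This is not how MBT or the Ligand Interaction Viewer is implemented (the slides do not say); the 4.0 Å cutoff, the function name, and the residue labels and coordinates are all illustrative assumptions, and the sketch does not distinguish hydrophilic from hydrophobic contacts.

```python
import math


def neighboring_residues(ligand_atoms, protein_atoms, cutoff=4.0):
    """Return residues with at least one atom within `cutoff` angstroms of the ligand.

    ligand_atoms:  list of (x, y, z) tuples
    protein_atoms: list of (residue_label, (x, y, z)) tuples
    """
    neighbors = set()
    for label, coords in protein_atoms:
        if any(math.dist(coords, lig) <= cutoff for lig in ligand_atoms):
            neighbors.add(label)
    return sorted(neighbors)


# Made-up coordinates: two residues close to the ligand, one far away.
ligand = [(0.0, 0.0, 0.0)]
protein = [
    ("ASP25", (3.0, 0.0, 0.0)),
    ("ILE50", (0.0, 3.5, 0.0)),
    ("GLY100", (10.0, 0.0, 0.0)),
]

close = neighboring_residues(ligand, protein)
print(close)
```

The all-pairs loop is fine for a single ligand; for whole-structure searches a spatial index would be the usual choice.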