Biopolymer Manual
SYBYL®-X 2.1
Mid 2013
This material contains confidential and proprietary information of Certara, L.P. and third parties furnished under the
Tripos Software License Agreement. This material may be copied only as necessary for a Licensee’s internal use
consistent with the Agreement. The allowed use includes printing of hardcopy versions hereof as minimally necessary
for Licensee’s internal use. Neither Certara, L.P., nor any person acting on its behalf, makes any warranty or
representation, expressed or implied, with respect to the accuracy, completeness, or usefulness of the material
contained in this manual or in the corresponding electronic documentation, nor in the programs or data described
herein. Certara, L.P. assumes no responsibility nor liability with respect to the use of this manual, any materials
contained herein, or programs described herein, or for any damages resulting from the use of any of the above. Except
for printing of hardcopy versions as stated, no part of this manual may be reproduced in any form or by any means
without permission in writing from Tripos (DE), Inc., 1699 South Hanley Road, Suite 200, St. Louis, Missouri 63144-
2917, USA (314-647-1099).
Selected software programs for methodologies contained or documented herein are covered by one or more of the
following patents: AllChem: US 7,860,657; Comparative Molecular Field Analysis (CoMFA): US 5,025,388; US
5,307,287; US 5,751,605; AT E150883; BE 0592421; CH 0592421; DE 691 25 300 T2; FR 0592421; GB 0592421;
IT 0592421; NL 0592421; SE 0592421. HQSAR: US 6,208,942. Embedded NLM: US 6,675,103. Topomers: US
6,185,506; US 6,240,374; US 7,184,893; US 7,212,951. TopCoMFA: US 7,329,222. DBTop: US 7,330,793. OptiSim:
US 6,535,819. Surflex software programs for chemical analysis by morphological similarity: US 6,470,305 B1.
SYBYL, UNITY, CoMFA, CombiFlexX, Concord, DiverseSolutions, GALAHAD, LeapFrog, OptDesign, StereoPlex,
and Alchemy are registered trademarks of Certara, L.P.
AUSPYX, Benchware, CScore, DISCOtech, Distill, GASP, HQSAR, Legion, MOLCAD, Molecular Spreadsheet,
Muse, OptiDock, OptiSim, Pantheon, ProTable, ProtoPlex, Selector, SiteID, Topomer CoMFA, Topomer Search,
Tuplets, and Tripos Bookshelf are trademarks of Certara, L.P.
RACHEL is a trademark of Drug Design Methodologies.
Surflex, Surflex-Dock, and Surflex-Sim are trademarks of BioPharmics LLC.
“FairCom” and “c-tree Plus” are trademarks of FairCom Corporation and are registered in the United States and other
countries.
All other trademarks are the sole property of their respective owners.
Biopolymer Table of Contents
1. Introduction to Biopolymer
1.1 What is New with Biopolymer
1.2 License Requirements for Biopolymer
2. Biopolymer Tutorials
2.1 Protein Preparation Tutorial
2.2 Peptide Building Tutorial
2.3 Protein Loop Search Tutorial
2.4 Monomer Definition Tutorial
5. Biopolymer Display
5.1 Define and Apply Protein View Settings
5.2 Simple Biopolymer Displays
5.3 Color Schemes for Biopolymers
5.4 Biopolymer Ribbons
5.5 Label Biopolymer Atoms
5.6 Ramachandran Graphs
Much of our knowledge of the structure of proteins and nucleic acids comes
from X-ray diffraction studies. The repository of this information has always
been the Protein Data Bank [Ref. 1]. SYBYL/Biopolymer includes the
capability of reading and writing the standard PDB format. Sequence data for
proteins and polynucleotides are maintained by a number of groups. SYBYL/
Biopolymer currently reads and writes the PIR format of the National
Biomedical Research Foundation [Ref. 2]. Examples of both these file formats
may be found in the TA_DEMO directory. Comprehensive reviews of
biopolymer structure can be found in the books by Schulz and Schirmer for
proteins [Ref. 3], Saenger for nucleic acids [Ref. 4], and Aspinall for
polysaccharides [Ref. 5].
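As a rough illustration of the PDB format mentioned above, the following Python sketch reads and writes the coordinate fields of an ATOM or HETATM record using the format's fixed columns. It is not SYBYL code: the names `Atom`, `parse_atom_line`, and `format_atom_line` and the sample values are assumptions made for this example, and the occupancy, temperature factor, and element fields are ignored.

```python
from typing import NamedTuple


class Atom(NamedTuple):
    serial: int
    name: str
    res_name: str
    chain: str
    res_seq: int
    x: float
    y: float
    z: float


def parse_atom_line(line: str) -> Atom:
    """Parse one ATOM/HETATM record using the PDB fixed-column layout."""
    if not line.startswith(("ATOM", "HETATM")):
        raise ValueError("not a coordinate record")
    return Atom(
        serial=int(line[6:11]),        # columns 7-11
        name=line[12:16].strip(),      # columns 13-16
        res_name=line[17:20].strip(),  # columns 18-20
        chain=line[21].strip(),        # column 22
        res_seq=int(line[22:26]),      # columns 23-26
        x=float(line[30:38]),          # columns 31-38
        y=float(line[38:46]),          # columns 39-46
        z=float(line[46:54]),          # columns 47-54
    )


def format_atom_line(a: Atom) -> str:
    """Write an ATOM record up to the z coordinate (later fields omitted)."""
    # Names shorter than four characters are padded to start in column 14.
    name = a.name if len(a.name) == 4 else " " + a.name
    return (f"ATOM  {a.serial:>5} {name:<4} {a.res_name:>3} {a.chain:1}"
            f"{a.res_seq:>4}    {a.x:8.3f}{a.y:8.3f}{a.z:8.3f}")


# Made-up atom for illustration (not taken from a real PDB entry).
atom = Atom(1, "N", "THR", "A", 1, 17.047, 14.099, 3.625)
line = format_atom_line(atom)
round_trip = parse_atom_line(line)
```

Writing the record and parsing it back returns the same `Atom`, which is a quick way to check that the column boundaries are consistent.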
Biomolecular systems tend to be large and complex. To study and understand
many aspects of these structures, it is helpful to have tools capable of
highlighting selected features of a biopolymer’s three-dimensional
representation. The utility of color computer graphics in this regard has been widely
acknowledged [Ref. 6]. Visual enhancement by the use of ribbon displays and
by the capacity to formulate general and flexible coloring schemes contributes
significantly to the goal of understanding biopolymer structure.
Adding Sidechains
An issue that was causing SYBYL to become non-responsive after sidechains
were added has been resolved.
Module-Based Licensing
SYBYL continues to run with a license file issued before the SYBYL-X release.
In that context:
• A “BioPolymer” license provides access to the biopolymer functionality.
This license is also required to assign and label AMBER and Kollman
atom types and to use any of the AMBER and Kollman force fields, and
to perform a staged minimization.
• A “MOLCAD” license is required to make full use of the Protein View
dialog.
This tool allows the user to rename unrecognized atoms, repair incomplete
sidechains and the backbone, add or modify termini, add hydrogens, assign
KOLLMAN/AMBER types and charges, find the best hydrogen bonding
arrangement for sidechains containing amide groups, and fix sidechain van der
Waals overlap with the option of using rotamer libraries to set a sidechain
conformation.
Note: If you do not have Internet access, you can retrieve the file from
$TA_DEMO. File > Import File, select [$TA_DEMO], then 3dfr.pdb and
press OK.
When you retrieve a PDB file and read it into SYBYL for the first time, it is worth
looking at the messages in the console.
! Scroll back in the console (or increase the window’s size) and look at
the lines starting with “Adding...”
4. If you have a personal default view defined, reset the view of the protein so that
all atoms are shown and colored by atom type, and all bonds are displayed as lines.
! View > Protein View > Reset View
The Prepare Protein Structure dialog opens (dialog description on page 94).
The analysis takes a few seconds to complete. When it is done the fields in the
dialog are populated as shown below.
All the residues are found to have the correct atom names and the correct
number of backbone atoms when compared to the residue files in the macromol
dictionary.
The sidechain atoms are retrieved from the lysine residue file in the dictionary.
! Click the first set of chi values in the list and press Set Selected.
The sidechain in LYS51 adopts the conformation with the highest probability in
the Lovell rotamer library.
The value represents, for the residue selected at the top of the dialog, the
difference between the energy of the applied rotamer and the energy of the
Initial conformer. Energy values are computed using the Tripos force field.
! Click the Set Next buttons to scroll through the Lovell rotamers for
this residue type.
If minor steric clashes occur with other residues or water molecules, they are
indicated by yellow dashed lines. Likewise, major steric clashes are indicated
by red dashed lines.
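The distinction between minor and major clashes can be pictured with a small Python sketch. The manual does not state the criteria SYBYL uses, so the two cutoffs below (fractions of the summed van der Waals radii) and the function name `classify_contact` are purely illustrative assumptions.

```python
import math

# Hypothetical cutoffs, expressed as fractions of the summed vdW radii.
MINOR_FRACTION = 0.90
MAJOR_FRACTION = 0.75


def classify_contact(p, q, radius_p, radius_q):
    """Return 'major', 'minor', or None for one pair of atoms."""
    ratio = math.dist(p, q) / (radius_p + radius_q)
    if ratio < MAJOR_FRACTION:
        return "major"  # corresponds to a red dashed line in the display
    if ratio < MINOR_FRACTION:
        return "minor"  # corresponds to a yellow dashed line in the display
    return None


# Two carbon-like atoms with an assumed radius of 1.7 Å each.
close = classify_contact((0.0, 0.0, 0.0), (2.0, 0.0, 0.0), 1.7, 1.7)
near = classify_contact((0.0, 0.0, 0.0), (2.9, 0.0, 0.0), 1.7, 1.7)
apart = classify_contact((0.0, 0.0, 0.0), (3.4, 0.0, 0.0), 1.7, 1.7)
```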
Note: To reset the LYS51 sidechain to its original conformation, change the
Rotamer Source to Initial.
! Select the highest probability conformer in the list and press Set
Selected.
! Press Close to exit and continue with the protein preparation.
The Prepare Protein Structure dialog reappears with the Repair Sidechain line
greyed out. A partial analysis of the protein is conducted for residues with
missing hydrogens, invalid atom types, and sidechain bumps.
The analysis found that two chain termini need to be fixed or modified.
! Press Fix.
The Edit Termini dialog appears (dialog description on page 108) showing the
two terminal residues highlighted.
! Change one of the New Block menus to Charged. Note that the
other changes automatically.
! Press Apply to Selected Protein.
The Prepare Protein Structure dialog reappears with the Termini Treatment line
greyed out. A partial analysis of the protein is conducted for residues with
missing hydrogens, invalid atom types, and sidechain bumps.
The analysis reports that 426 residues are without hydrogens. One of these
residues is the cofactor, NADPH, which is missing its hydrogens. Also included in this
number are the co-crystallized waters.
The Add Hydrogens dialog appears (dialog description on page 99) with the
molecule already selected in the list.
! Make sure that All hydrogens will be added and that hydrogens will
be added to the water molecules in a Random orientation.
! Press OK to add the hydrogens.
The Prepare Protein Structure dialog reappears with the Add Hydrogens line
greyed out. A partial analysis of the protein is conducted for residues with
invalid atom types, and sidechain bumps.
11. Set the protonation type of a residue to favor hydrogen bonds with the ligand.
The Set Protonation Type dialog appears (dialog description on page 101) with
the molecule already selected in the list.
The list contains only the residues that are within 6 Å of any ligand or cofactor
atom and that may have more than one protonation state at near neutral pH.
The list contains four ASP residues, and the first one is selected.
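The selection rule described above (residues within 6 Å of any ligand or cofactor atom that can adopt more than one protonation state) can be sketched in Python as follows. The set `TITRATABLE`, the function name, and the coordinates are assumptions for illustration; the manual does not list which residue types SYBYL treats as having multiple protonation states.

```python
import math

# Assumed subset of residue types, for illustration only.
TITRATABLE = {"ASP", "GLU", "HIS"}


def residues_near_ligand(protein_atoms, ligand_coords, cutoff=6.0):
    """protein_atoms: iterable of (residue_label, residue_type, (x, y, z))."""
    hits = []
    for label, res_type, xyz in protein_atoms:
        if res_type not in TITRATABLE or label in hits:
            continue
        # One atom within the cutoff is enough to list the residue.
        if any(math.dist(xyz, lig) <= cutoff for lig in ligand_coords):
            hits.append(label)
    return hits


# Made-up coordinates; only ASP26 is both titratable and close to the ligand.
protein = [
    ("ASP26", "ASP", (1.0, 0.0, 0.0)),
    ("ASP26", "ASP", (2.0, 0.0, 0.0)),
    ("LEU27", "LEU", (1.0, 1.0, 0.0)),
    ("GLU90", "GLU", (20.0, 0.0, 0.0)),
]
ligand = [(0.0, 0.0, 0.0)]
nearby = residues_near_ligand(protein, ligand)
```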
! Activate Auto Center to show in the middle of the screen only the
residues in the region of interest.
! Select ASP26.
The carboxylate oxygens are close enough to the ligand to form hydrogen
bonds.
The list reflects your selection: ASP26 is in the ASZ state and the orientation of
the COOH group has been flipped.
! Close the Set Protonation Type dialog.
The analysis reports that 53 atoms do not have the proper AMBER/Kollman
atom types.
! On the Type Atoms line press Show.
All the atoms in methotrexate are highlighted. They have the correct SYBYL
atom types, but this structure does not match any of the monomers defined in
the dictionary. Therefore, its AMBER and Kollman atom types are unknown.
The missing atom types can be assigned via an SLN atom typer based on a
fragment library.
The Assign AMBER Atom Types dialog appears (dialog description on page 117)
with the molecule already selected in the list.
The Assign AMBER Atom Types dialog reports that four atoms could not be
typed.
! Click the icon so you can hide the protein part of the molecule.
The four atoms are aromatic nitrogens. Their AMBER7 FF99 atom types must
now be assigned manually. Conveniently, an atom set called
UNK_AMBER7_FF99 contains these four atoms.
! Press Manual.
Note all the “UNK” sets for the AMBER7_FF99, AMBER7 FF02,
AMBER95_ALL, and KOLL_ALL force fields.
The Assign AMBER Atom Types dialog now reports that all atoms have been
assigned AMBER7 FF99 atom types.
! Close the Assign AMBER Atom Types dialog and continue with the
protein preparation.
The Prepare Protein Structure dialog reappears, and the Type Atoms line still
reports that 53 atoms do not have the proper types. This is because the other sets
of AMBER and Kollman atom types have not yet been loaded on the ligand. If
you want to use force fields other than AMBER7 FF99, you will need to repeat
this operation for the appropriate sets of atom types. See Force Fields for
Biopolymer in the Force Field Manual for lists of atom types.
The Load Charges dialog appears (dialog description on page 104) with the
molecule already selected in the list.
! Make sure that the Water check box is on to assign AMBER7 FF99
charges to the water molecules.
! Press OK to assign the atom charges.
The Ligand Charges dialog appears, listing only methotrexate. Because the
cofactor matches a template in the dictionary, it was treated along with the
standard protein residues.
! Click A/MTX164 and click OK.
14. Orient the sidechain amides in all ASN and GLN residues to maximize
hydrogen bonding.
! On the Fix Sidechain Amides line press Fix.
The Prepare Protein Structure dialog reappears after checking for sidechain
bumps.
The analysis still reports that the sidechains are not involved in steric clashes.
However, in your own work you may find that some of the operations in this
dialog result in a few sidechain bumps. These can be taken care of easily via the
Fix button. The Set Sidechain Conformation dialog will then appear (dialog
description on page 185) with the appropriate residues already loaded. You can
then resolve the bad steric interactions quickly with the Scan Selected
Residue option.
16. Minimize the protein-ligand complex in progressive stages, using the AMBER7
FF99 force field.
The Minimize dialog is displayed (dialog description in the Force Field Manual).
The Energy dialog is displayed (dialog description in the Force Field Manual).
! Back in the Staged Minimization dialog, below the list of steps, change
Reset Steps To 10 and press Apply.
! Toggle off step 2. Minimize Waters.
17. Save the molecule. Use the Mol2 format as this will store the atomic charges.
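To see why the Mol2 format preserves the charges, note that each line of its @<TRIPOS>ATOM section can carry a per-atom charge after the substructure name. The Python sketch below reads those values from a made-up fragment; the function name `read_mol2_charges` and the sample data are assumptions for illustration, and real files may contain additional sections and fields.

```python
def read_mol2_charges(text):
    """Return {atom_id: charge} from the @<TRIPOS>ATOM section."""
    charges = {}
    in_atoms = False
    for line in text.splitlines():
        if line.startswith("@<TRIPOS>"):
            in_atoms = line.strip() == "@<TRIPOS>ATOM"
            continue
        fields = line.split()
        # Fields: id name x y z type subst_id subst_name charge [status]
        if in_atoms and len(fields) >= 9:
            charges[int(fields[0])] = float(fields[8])
    return charges


# Made-up two-atom fragment, not taken from the tutorial structure.
sample = """
@<TRIPOS>ATOM
      1 N           1.0000    2.0000    3.0000 N.am    1  ALA1   -0.4157
      2 CA          2.0000    2.5000    3.5000 C.3     1  ALA1    0.0337
@<TRIPOS>BOND
     1     1     2    1
"""
charges = read_mol2_charges(sample)
```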
2. To build the peptide sequence, bring up the Build Biopolymer dialog and use it
to define the sequence.
The macromol dictionary is opened automatically and the Build Protein dialog
is displayed (dialog description on page 142).
! Click TYR LYS CYS GLY LEU CYS GLU ARG SER PHE.
! Set the C-terminus to None. You will add more residues to the
sequence later.
By default, the dialog is set up to add all the hydrogens to the sequence being
built. For this first biopolymer tutorial you will switch off that option.
! Press Build.
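The residues selected in this step are identified by three-letter codes, whereas sequence files commonly use one-letter codes. A minimal Python sketch of the conversion for the 20 standard amino acids is shown below; the function name `to_one_letter` is an assumption for this example and is not a SYBYL command.

```python
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def to_one_letter(sequence):
    """Convert a space-separated three-letter sequence to one-letter codes."""
    return "".join(THREE_TO_ONE[res] for res in sequence.split())


# The ten residues entered in the Build Protein dialog in this step.
first_ten = "TYR LYS CYS GLY LEU CYS GLU ARG SER PHE"
one_letter = to_one_letter(first_ten)
```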
3. Label the Cα with the residue types and sequence numbers. These are called
substructure labels.
4. NMR data indicate that there is a hairpin turn at residues 5-6. Modify the
structure to reflect this conformation.
! Press Set.
5. Add a few more residues to the peptide, connecting the new sequence to the
C-terminus.
! Biopolymer > Build > Build Protein
! Click VAL GLU LYS SER ALA LEU SER ARG HIS GLN.
! Press Build.
The nine residues from GLU12 through GLN20 are selected in the dialog and
highlighted on the screen.
! Press OK.
! Press Set.
7. Notice that the molecule has been built off center and rotates in an awkward
manner. Center the display of the molecule on the screen.
8. The molecule was given the generic name of builder_protein. Give it a more
specific name.
10. Restrict the scan to only those bonds involved in the alpha-helical regions.
! Press Sets.
Even though some of the atoms highlighted are enclosed in a ring (HIS19), the
SCAN operation recognizes them and simply eliminates the corresponding
bonds from the computation.
! Press OK and watch the molecule as the scan is performed.
When the scan is finished, the sidechains in the alpha-helical region are in a
reasonable conformation with no close contacts. The scan is an iterative
process. A message in the console reports that no bad contacts were found at the
end of the torsion scan:
Iteration 1 finished, fixed 28 bonds this iteration, 0 to go.
! Next to the list of States to Find press the icon (Select All).
The information found by this operation can be stored for reuse by creating
substructure sets for the sequences that match defined conformational states.
! Toggle Create Sets from Results on.
! Press Find.
The secondary structures along with their associated sequences are listed in the
console.
Finding conformations for molecule in m1 using Dictionary method...
Creating sets...
ALPHA_HELIX_A_DICT
BETA_SHEET_A_DICT
TURNI_A_DICT
-EEEE--EEEE----------- :beta_sheet
-----11--------------- :turnI
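The per-position strings printed in the console can be turned into segments with a short Python sketch. This is only an illustration: the function name `segments` is hypothetical, and positions are counted along the string starting at 1, which may not coincide with residue numbers if blocking groups are included in the string.

```python
from itertools import groupby


def segments(states, gap="-"):
    """Return (symbol, first_position, last_position) runs, counted from 1."""
    runs, pos = [], 1
    for symbol, group in groupby(states):
        length = len(list(group))
        if symbol != gap:
            runs.append((symbol, pos, pos + length - 1))
        pos += length
    return runs


# The two state strings reported in the console output above.
beta = segments("-EEEE--EEEE-----------")
turn = segments("-----11---------------")
```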
! Press Find.
The rendered image reflects the protein’s secondary structure: alpha helical
regions are rendered as magenta ribbons, beta strands as yellow curved arrows,
and the remainder of the protein as a cyan curved tube.
13. Another way to visualize the secondary structure of a peptide is to display only
the backbone and color it to highlight the regions of different secondary
structures.
Atoms in the alpha helix are colored red and those in beta sheet conformation
are colored blue. If you had not created sets in the Find Secondary Structure
dialog, the backbone color would be entirely white.
14. Another color scheme of interest is that of acidic, basic and polar residues.
Instead of the menubar you can use the color icon.
15. The basic and polar residues of the helix are proposed to be involved in nucleic
acid binding. The Atom Expression dialog provides powerful selection mechanisms
that can make use of built-in sets and conformational states to select
exactly those residues.
The first part of the selection retrieves all atoms in the Basic and Polar global
sets.
! Click .
You will now combine the initial selection with residues in alpha-helical
conformation.
In the field at the bottom of the Atom Expression dialog the expression reads
M1(({BASIC}+{POLAR})&({FINDCONF(alpha_helix,*)})).
! Press OK.
! Set to Magenta.
The alpha helical residues that are either basic or polar are colored magenta.
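The logic of this atom expression, a union of two sets followed by an intersection with a conformational state, can be mirrored with Python sets. The membership of each set below is an assumption made for illustration and is not read from SYBYL; only the way the sets are combined follows the expression shown above.

```python
def evaluate(basic, polar, alpha_helix):
    """({BASIC}+{POLAR})&({FINDCONF(alpha_helix,*)}) as set operations."""
    return (basic | polar) & alpha_helix


# Hypothetical set contents for the peptide built in this tutorial.
basic = {"LYS2", "ARG8", "LYS13", "ARG18", "HIS19"}
polar = {"SER9", "SER14", "SER17"}
alpha_helix = {"GLU12", "LYS13", "SER14", "ALA15",
               "LEU16", "SER17", "ARG18", "HIS19"}

selected = evaluate(basic, polar, alpha_helix)
```

Residues that are basic or polar but lie outside the helix (LYS2, ARG8, and SER9 in this made-up data) drop out at the intersection step.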
! Try the various coloring options of the icon or View > Color By
Scheme menu.
17. Before you go on, reset the color scheme to atom types and redisplay the
sidechains.
! View > Protein View > Whole Molecule (or > All Atoms).
! Click .
20. Before you continue, save this sequence as a .mol2 file, to be recalled later.
! Press Save.
! Press Types.
! Select SER from the residue types list and press Add.
When the modifications are complete, notice that the backbone conformation is
unchanged. In addition, the mutate operation preserves the sidechain
conformations to the extent possible. Each residue is analyzed, and the values of the
conformational angles defined in the dictionary are recorded for the
current residue. These angles are then applied to the new residue as closely as
possible to preserve the conformational similarity between sequences.
22. The Excise Monomer option is used to remove a residue from the peptide
chain and join its neighbors to close the gap. As before, the conformational state
of the removed residue is preserved in the neighbor which replaces it. However,
there are times when it is not desirable to rejoin the residues to close the gap,
for example, when such action would destroy favorable interactions
elsewhere in the peptide. The Delete Monomers option is provided for
this situation. It performs the same action but does not re-form the chain.
! Biopolymer > Composition > Excise Monomer
Note that GLU12 and CYS14 have been joined by a long bond. The geometry
of the rest of the peptide has been preserved.
Note the gap between ASN10 and GLU12. That is the difference between the
Excise and Delete operations.
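The difference between the two operations can be modeled on a simple list of residues and a set of inter-residue bonds. This Python sketch is a conceptual illustration only: the function names are hypothetical, no coordinates are involved, and `X13` is a placeholder label because the excised residue is not named in the text above.

```python
def delete_monomer(residues, bonds, target):
    """Remove target and its bonds; the chain is left with a gap."""
    kept = [r for r in residues if r != target]
    return kept, {b for b in bonds if target not in b}


def excise_monomer(residues, bonds, target):
    """Remove target, then bond its former neighbors to close the gap."""
    i = residues.index(target)
    kept, new_bonds = delete_monomer(residues, bonds, target)
    if 0 < i < len(residues) - 1:
        new_bonds.add((residues[i - 1], residues[i + 1]))
    return kept, new_bonds


chain = ["GLU12", "X13", "CYS14"]
links = {("GLU12", "X13"), ("X13", "CYS14")}

excised = excise_monomer(chain, links, "X13")
deleted = delete_monomer(chain, links, "X13")
```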
23. Recall the .mol2 file that you saved earlier and add the remaining five residues
of the alpha-helical region.
! Select M1:znf as the molecule area to overwrite its content and press
OK.
! Type HIS19 to insert after this monomer and press OK. You could,
instead, select any atom in HIS19.
! Click GLN ARG VAL HIS LYS.
! Activate Adjust Geometry so that the GLN20 and the end of the
chain move to make way for the five residues you are inserting.
If you do not check Adjust Geometry, the existing structure of the peptide
will be maintained, resulting in bad local geometry.
! Press Insert.
25. Renumber the entire sequence to give consecutive numbers to the substructures.
! Type 1 as the number for the first monomer and press OK.
Note that the ACE blocking group at the beginning of the chain kept the
sequence number 0.
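The renumbering behavior described here, consecutive numbers from a chosen start with a leading blocking group keeping its own number, can be sketched as follows. The function name `renumber` and the sample chain are assumptions for illustration; how SYBYL treats other blocking groups is not covered by this sketch.

```python
def renumber(chain, first=1, blocking=("ACE",)):
    """chain: list of (residue_type, old_number) in sequence order."""
    result, number = [], first
    for index, (res_type, old_number) in enumerate(chain):
        if index == 0 and res_type in blocking:
            # A leading blocking group keeps the number it already has.
            result.append((res_type, old_number))
            continue
        result.append((res_type, number))
        number += 1
    return result


# Made-up chain with a gap in the numbering after an excision.
before = [("ACE", 0), ("TYR", 1), ("LYS", 2), ("GLY", 4), ("LEU", 5)]
after = renumber(before, first=1)
```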
! Click .
28. All the hydrogens were included while you built the peptide. You can now
toggle some or all of them off.
Only the polar atoms remain visible. These are flagged in the dictionary’s
residue files as the essential hydrogens because they can be involved in
hydrogen bonds.
! Press HBonding.
Only hydrogens currently involved in hydrogen bonds are visible. All are in the
helical section of the peptide. These are some, but not all, of the polar
hydrogens.
31. To change the ribbon’s appearance, right-click it and specify the color and style
of your choice.
The exercise you will perform consists of modeling a portion of the three-
dimensional structure of the insect-directed scorpion neurotoxin of A. australis
based on its sequence homology to the variant 3 neurotoxin of C. sculpturatus.
The sequences of these two proteins are quite similar, and you will use the
sequence alignment published by J.C. Fontecilla-Camps (J. Mol. Evol., 63-67
(1989)) for this exercise.
The sequence of the target or unknown protein is given first; the sequence of the
template or known protein is given second.
       1               5                       10
Model: LYS LYS ASN GLY TYR ALA VAL ASP --- SER SER GLY LYS ALA PRO
sn3:       LYS GLU GLY TYR LEU VAL LYS LYS SER ASP GLY CYS LYS TYR
           1               5                   10

       15                              20                  25
Model: GLU CYS LEU LEU --- --- --- SER ASN TYR CYS ASN ASN GLN CYS
sn3:   GLY CYS LEU LYS LEU GLY GLU ASN GLU GLY CYS ASP THR GLU CYS
       15                  20                  25

                       30                  35                  40
Model: THR LYS VAL --- HIS TYR ALA ASP LYS GLY TYR CYS CYS LEU LEU
sn3:   LYS ALA LYS ASN GLN GLY GLY SER TYR GLY TYR CYS TYR ALA PHE
       30                  35                  40

                       45                  50                  55
Model: SER CYS TYR CYS PHE GLY LEU ASN ASP ASP LYS LYS VAL LEU GLU
sn3:   ALA CYS TRP CYS GLU GLY LEU PRO GLU SER THR PRO THR TYR PRO
       45                  50                  55

                           60                  65                  70
Model: ILE SER --- ASP THR ARG LYS SER TYR CYS ASP THR THR ILE ILE ASN
sn3:   LEU PRO ASN LYS SER CYS
       60                  65
Using the sequence alignment given above, you will carry out one deletion and
a few of the single residue mutations. You will then use the Biopolymer Loop
Search functionality to look for peptide fragments with the desired number of
residues and the required end-to-end geometry to preserve a continuous
polypeptide chain.
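The edits implied by an alignment can be listed mechanically. The Python sketch below compares the aligned region around model residues 5 to 11 column by column and reports mutations and deletions using the model numbering. The function name `classify_columns` and the tuple layout are assumptions for this example; SYBYL itself performs these edits through its dialogs.

```python
def classify_columns(model, template, first_model_number):
    """Compare aligned three-letter sequences column by column."""
    edits, number = [], first_model_number
    for m, t in zip(model.split(), template.split()):
        if m == "---":
            # Template residue with no counterpart in the model.
            edits.append(("delete", t, None))
            continue
        if t == "---":
            edits.append(("insert", m, number))
        elif m != t:
            edits.append(("mutate", f"{t}->{m}", number))
        number += 1
    return edits


# Excerpt of the alignment, starting at model residue TYR5.
model_row = "TYR ALA VAL ASP --- SER SER GLY"
template_row = "TYR LEU VAL LYS LYS SER ASP GLY"
edits = classify_columns(model_row, template_row, first_model_number=5)
```

For this excerpt the sketch reports three mutations (at model positions 6, 8, and 10) and one deleted LYS, the gap that the loop search later has to bridge.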
3. If you have a personal default view defined, reset the view of the protein so that
all atoms are shown and colored by atom type, and all bonds are displayed as lines.
The molecule is colored by atom types. The eight CYS residues are involved in
disulfide bridges.
4. Delete all non-peptide atoms: all the waters and a molecule of 2-methyl-2,4-
pentanediol used in crystallizing the protein. These are stored in the molecular
description as individual substructures.
! Click .
! In the Atom Expression dialog click on the lines for Other Substructures
and Waters.
All atoms in MDP66 and all water oxygens are highlighted. To these add all
hydrogens.
Note that a long bond fills the gap left by the deleted residue. You will replace
this gap with loop candidates later in the exercise.
2. The target protein is longer than the template by one residue on the N-terminus.
! Biopolymer > Build > Build Protein
! Press Build.
! Type 1 for the first monomer’s sequence number and press OK.
5. Perform one of the site mutations to convert the sequence of 2sn3 to the model
protein.
The Edit Protein Composition tool is displayed (dialog description on page 162).
6. Make the other mutations in this region of the protein. You can perform all
these within the same dialog.
! The Edit Protein Composition tool is still open. Use it to make the
following mutations (remember to press Apply to Selected
Sequence after each operation):
LEU6 :ALA
LYS8 :ASP
ASP10 :SER
More mutations would need to be performed to complete the full model sequence.
However, the tutorial is limited to finding loop replacements in this region only.
! Close the dialog.
! Press Close.
8. Perform a loop search to find suitable replacements for a region of the protein
where one of the deletions occurred. This search uses two N-terminal anchor
residues and one C-terminal anchor residue (the defaults). From the working
alignment, the deletion being modeled in this step is:
           5   6   7   8       9   10  11
Model: ... TYR ALA VAL ASP --- SER SER GLY ...
2sn3 : ... TYR LEU VAL LYS LYS SER ASP GLY ...
           4   5   6   7   8   9   10  11
! Biopolymer > Protein Loops > Search PRODAT Database
! Type 7 for the residue preceding the window region and press OK.
! Type 10 for the residue following the window region and press OK.
Ignore the messages in the console about “Tweak did not converge.” These are
indications that the process is still ongoing. When the menubar is active, you
are ready to proceed.
The results are reported in a spreadsheet named ASPSER. The loop fragment
that gave the best geometric fit was automatically melded into the model.
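The idea of screening fragments by length and end-to-end geometry can be illustrated with a deliberately simplified Python sketch. It compares only the distance between the first and last Cα atoms of each candidate with the distance between the two anchor positions. This is not SYBYL's algorithm, which fits the anchor residues and reports an RMS value; the function name, the tolerance of 0.5 Å, and all coordinates are assumptions.

```python
import math


def screen_fragments(fragments, n_residues, anchor_n, anchor_c, tolerance=0.5):
    """fragments: list of (name, [CA coordinates]); distances in Å."""
    target = math.dist(anchor_n, anchor_c)
    hits = []
    for name, ca_coords in fragments:
        if len(ca_coords) != n_residues:
            continue  # wrong number of residues for this gap
        deviation = abs(math.dist(ca_coords[0], ca_coords[-1]) - target)
        if deviation <= tolerance:
            hits.append((name, round(deviation, 3)))
    # Smallest end-to-end deviation first.
    return sorted(hits, key=lambda hit: hit[1])


# Made-up candidates: only frag_a has the right length and span.
candidates = [
    ("frag_a", [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.0, 0.0, 0.0)]),
    ("frag_b", [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (5.0, 0.0, 0.0)]),
    ("frag_c", [(0.0, 0.0, 0.0), (7.2, 0.0, 0.0)]),
]
hits = screen_fragments(candidates, 3, (0.0, 0.0, 0.0), (7.2, 0.0, 0.0))
```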
10. Use the spreadsheet’s Biopolymer menu to color the melded loop fragment in
the protein.
! MDE: Biopolymer > Color Loop > Green (or your favorite color)
! In the Examine Selected Rows dialog press Next to meld the next
loop into the protein.
! Examine the other loops.
12. Return to loop number 1 on the basis of good fit and high homology. Notice that
some of the other loops may be just as good and would have to be considered in
a serious modeling effort as possible alternatives.
! In the dialog press Previous to return to the first loop or Jump to
Row 1.
! Close the dialog.
The best candidate loops are those with (1) the smallest number of van der
Waals contacts or bumps, (2) the lowest value of RMS fit, and (3) the highest
value of homology score. Two of these pieces of information are automatically
built into the spreadsheet, and you will add the other.
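One way to combine the three criteria is a lexicographic sort: fewest contacts first, then lowest RMS fit, then highest homology. The Python sketch below shows this on made-up rows. The priority order among the criteria is an assumption, and the key `homology` is a hypothetical column name; `Id`, `Fit_RMS`, and `VDW_CONTACTS` follow the axis names used later in this tutorial.

```python
# Made-up spreadsheet rows for illustration.
loops = [
    {"Id": 1, "Fit_RMS": 0.35, "homology": 70.0, "VDW_CONTACTS": 0},
    {"Id": 2, "Fit_RMS": 0.30, "homology": 55.0, "VDW_CONTACTS": 2},
    {"Id": 3, "Fit_RMS": 0.35, "homology": 80.0, "VDW_CONTACTS": 0},
]


def rank_loops(rows):
    """Sort by contacts (ascending), RMS fit (ascending), homology (descending)."""
    return sorted(
        rows,
        key=lambda r: (r["VDW_CONTACTS"], r["Fit_RMS"], -r["homology"]),
    )


ranked_ids = [row["Id"] for row in rank_loops(loops)]
```

In a real study the criteria would more likely be weighed against each other than applied in strict order, so the ranking is a starting point for visual inspection.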
14. Add a column containing the number of van der Waals contacts.
15. Graphs offer another visual presentation of the results. Display a scatter plot of
Fit RMS deviation of the anchor regions vs. the loop ID and color the points by
the number of vdW contacts.
- X Axis: Id.
- Y Axis: Fit_RMS.
- Color Axis: VDW_CONTACTS.
! Press OK.
16. Graph the mutation-rate-based homology score vs. the loop ID and color the
points by the number of vdW contacts.
! Pick a point in either graph and note that the corresponding point is
highlighted in the other graph and that the row corresponding to that
point is selected in the spreadsheet.
! Pick other points.
Usage Note: To meld the loop corresponding to a selected point or row, use the
method used earlier in this tutorial: MDE: Biopolymer > Examine
Selected Loops. If only one row is selected, it is automatically melded into
the protein. If multiple rows are selected, the Examine Selected Rows dialog lets
you examine them one by one.
! Select Yes if you want to save the spreadsheet and scatter plots to a
table file. The default file name is taken from the spreadsheet name.
19. Add sidechains to residues in the loop (including the anchor regions) and scan
torsional values to eliminate bad interatomic contacts. The residues in the loop
are stored in a set called LOOP. You can make use of this to select them.
You are now ready to repair the geometry of the protein in the regions where
residues were excised. This involves loop searches for reasonable protein
fragments to paste into these regions. The fragments retrieved from the protein
database only contain the backbone atoms. After each search add sidechains and
fix their local geometry as above.
! Scroll down to the bottom of the Sets list and select LOOP then
press Add.
! Press OK to terminate the selection.
20. A preliminary model has been created. You may want to save it to a file before
exiting SYBYL.
22. Show only the backbone for both molecules for easier comparison.
! Clear all selection.
The colored loop in the model indicates where the changes were made.
The Current Dictionary Directory at the top of the dialog shows the location
of the dictionary directory that your SYBYL session is currently pointing to. If
the information is the same as what echo $TA_ROOT reported above followed
by /biopolymer/tables/dictionary, you are using the dictionaries distributed
with SYBYL. (See footnote 1.)
Footnote 1: If you already have a personalized dictionary directory, you know a lot about
dictionaries already. Before you proceed with this tutorial, be sure to open a dictionary that
contains a copy of the leu.res file as distributed by Tripos.
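The comparison described above, between the current dictionary directory and $TA_ROOT followed by /biopolymer/tables/dictionary, can also be expressed as a small Python check. The function name and the example installation path `/opt/sybyl` are assumptions; only the relative path comes from the text.

```python
import os


def uses_distributed_dictionary(current_dir, ta_root=None):
    """True if current_dir is $TA_ROOT/biopolymer/tables/dictionary."""
    if ta_root is None:
        ta_root = os.environ.get("TA_ROOT", "")
    expected = os.path.join(ta_root, "biopolymer", "tables", "dictionary")
    # normpath removes trailing separators and redundant path elements.
    return os.path.normpath(current_dir) == os.path.normpath(expected)


# Hypothetical installation root, for illustration only.
distributed = uses_distributed_dictionary(
    "/opt/sybyl/biopolymer/tables/dictionary/", ta_root="/opt/sybyl"
)
private = uses_distributed_dictionary(
    "/home/user/my_dictionary", ta_root="/opt/sybyl"
)
```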
! Press OK in the Success dialog reporting that all source files were
copied and that your private dictionary directory is now used.
6. Close the dialog so you can proceed and build the structure that will become
your new monomer.
7. Build the template for construction (based on leucine). Note that new monomers
must be built in the neutral, unblocked form.
! Biopolymer > Build > Build Protein
! Click Leu.
! Press Build.
You will now replace the H-labeled hydrogen by a carbon, not the other
hydrogen, which is the cap atom.
! Click on any atom to indicate that you want the sketcher to operate
on this molecule.
The three toolbars associated with the sketcher provide access to the sketching
tools, the essential atom types, and functional groups.
! Click the hydrogen on the terminal nitrogen whose location you noted
earlier.
The H label disappears, and the atom changes color. It is now a carbon.
9. Add the hydrogens to the methyl carbon and exit the Sketcher.
! Click EXIT.
This completes the creation of N-methyl leucine. Since leucine was used to
construct this new residue, it will be referred to below as the template.
When preparing a new monomer for addition to a dictionary you will need to
use a rigorous calculation method to compute the atomic charges. Consult the
information about Biopolymer Force Fields and Charges for Biopolymers (in
the Force Field Manual), Ref. 54 and Ref. 55. This, however, is not the focus of
this tutorial. For expediency, you will use the Del Re method because the values
it computes for amino acids are very close to those computed by AMBER.
The charge values are displayed on the molecule as atom labels. When you are
done viewing the charges, change the labels to atom names.
12. Select the molecule area containing the structure of the new monomer.
! Select M1:builder_protein as the molecule area and press OK.
14. To facilitate the monomer definition process, indicate which monomer already
in the dictionary most closely resembles the one you are adding. The properties
of the existing monomer will then be used as default for the new one.
The Create Monomer dialog appears prompting for information necessary for
creating the residue file (dialog description on page 241).
! In the Basename for Monomer File field, type mle. This will produce
a file called mle.res in the current directory.
! In the Complete Monomer Name field, type N-methyl_leucine
! In the 3-Letter Mnemonic field, enter MLE. This will be the acronym
for the new monomer.
! In the 1-Letter Code field, enter a period (.). This indicates that the
one-letter code is ignored for this residue.
16. Label the atoms by name. This will help you verify some of the assignments
that were made automatically based on the template.
Note that those atoms in MLE that are identical to those in the LEU template
have been named according to LEU. However, the additional atoms (those in
the N-methyl group) have not. SYBYL assigned unique names to these atoms:
C11, H1, H2 and H3. You can modify this if you want by clicking the Atom
Names button. Note, however, that every atom must have a unique name.
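The uniqueness requirement is easy to check outside SYBYL as well. The following Python sketch reports any atom name that occurs more than once; the function name `duplicate_names` is hypothetical, and the list contains only the atom names mentioned in this tutorial rather than the complete monomer.

```python
from collections import Counter


def duplicate_names(names):
    """Return the atom names that occur more than once, sorted."""
    return sorted(name for name, count in Counter(names).items() if count > 1)


# Subset of the MLE atom names mentioned in the text.
mle_names = ["N", "CA", "C", "O", "HA", "C11", "H1", "H2", "H3"]
ok = duplicate_names(mle_names)

# Renaming H2 to H1 by mistake would produce a conflict.
conflict = duplicate_names(["N", "CA", "C11", "H1", "H1", "H3"])
```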
17. Check the root atom, that is, the atom that will bear the substructure label. The
alpha carbon is the logical candidate.
18. Check the capping atoms, that is, the atoms that will be removed when the
residue is connected to adjacent residues.
The atoms currently corresponding to the backbone atoms in the template are
highlighted on the screen: N, CA, C, O, HA (hydrogen bonded to the Cα), and
C11 (the carbon of the N-methyl group).
! Hold the Ctrl key (Command on the Mac) and add to the atoms
already selected by clicking the three hydrogens in the N-methyl
group (H1, H2, H3).
! Press OK in the Atom Expression dialog.
All atoms other than those designated as backbone or capping atoms are
considered to be sidechain atoms.
20. Use the labels to check the known information about your new monomer.
The absence of highlights indicates that all atoms have been typed with
AMBER7 FF99 atom types, first using information from the dictionary
followed by an SLN typer that uses rules located in $TA_DICT/AMB_PARMS.
Examine the AMBER7 FF99 atom types closely, especially those on the newly
added group. One hydrogen was typed incorrectly as H1 (H attached to aliphatic
C with one electron-withdrawing substituent) and is better typed as HC (H
attached to aliphatic C with no electron-withdrawing substituents).
To change the AMBER7 FF99 atom type for the atom labeled H1:
The label for this atom confirms that its AMBER7 FF99 atom type was
changed.
Note that the same hydrogen was typed incorrectly for the other AMBER force
fields. For Kollman force fields, highlights provide visual clues that some atom
types and atomic charges could not be assigned.
22. You may modify atom names, force field atom types and atomic charges by
using a spreadsheet.
23. The various AMBER and Kollman atom types were taken directly from the
template in the dictionary followed by assignment using the SLN typer. If an
atom type cannot be assigned by either of these methods it is entered as “UNK”
in the spreadsheet.
! For the atom named C11, type C3 in the KU_T and KA_T columns.
! For atom named H1 (row 23), type HC in the A02_T, A95_T, and
KU_T columns.
For a list of valid atom types see Force Fields for Biopolymers (in the Force
Field Manual).
24. The charges were taken directly from the SYBYL screen. If charges are not
assigned to the molecule before entering the Create Monomer dialog, Pullman
charges are assigned by default.
Keywords define the monomer class. The choices were taken automatically
from the template and consist of:
• amino_acid—The monomer is a protein residue
• standard—The monomer is one of the 20 standard amino acids
• default—This information is no longer used.
! In the field delete ,standard,default.
The Monomer Information dialog informs you that the residue has been
successfully created and stored in the file mle.res within your
$HOME/dictionary directory.
Although the monomer is now available for immediate use because it has been
automatically added to the dictionary that is currently open (in memory), it has
not yet been saved to the dictionary file. For that reason, the information dialog
also asks you whether to add the newly created monomer permanently to your
macromol dictionary. It is safe to do so, because it is very easy to modify a
monomer definition once it is in the dictionary.
! Read the message then press OK.
Your copy of the file macromol.dic was updated to include the MLE residue.
Warning: Some properties of a new monomer are not accessible via the Create
Monomer dialog. In your own work you will need to inspect and edit your new
.res file to update properties such as molecular weight and improper torsions for
the AMBER and Kollman force fields. See Complete Verification of the
Residue File on page 245 for a list of recommendations.
! In the Build Protein dialog, click on ALA, then MLE, and then ALA.
The peptide is displayed with N-methyl leucine in the middle of the chain.
# Open my dictionary
setvar TAILOR!BIOPOLYMER!DIRECTORY $HOME/dictionary
setvar TAILOR!BIOPOLYMER!DEFAULT_DICT macromol
The steps above need only be done once. From then on, when you start SYBYL
you will be using your private dictionary.
You can modify these instructions to create a shared dictionary (as opposed to a
private one) by using an appropriate new directory location (outside of the
SYBYL tree but in some shared disk space) and setting the protections as
needed on the files and directories.
When a new version of SYBYL is released, changes and additions may have
been made to the dictionaries. We strongly encourage you to compare what you
are using with what is in any new release. This comparison would include the
*.res and *.dic files and a comparison of the energy values obtained while
using the different dictionaries.
Composition
Protein Composition Tool...
Replace Sequence...                 BIOPOLYMER REPLACE
Mutate Monomers...                  BIOPOLYMER CHANGE
Insert Monomers...                  BIOPOLYMER INSERT
Excise Monomers...                  BIOPOLYMER EXCISE
Delete Monomers                     BIOPOLYMER REMOVE
Conformation
Measure Conformation...             BIOPOLYMER MEASURE
Set Backbone Conformation...        BIOPOLYMER SET CONFORMATION
Find Secondary Structure...         BIOPOLYMER FIND SEC_STR
Predict Secondary Structure...      BIOPOLYMER PREDICT_SECONDARY
Set Sidechain Conformation...       BIOPOLYMER SET CONFORMATION
Scan Sidechain Torsions...          BIOPOLYMER FIX_SIDECHAINS
Copy Conformation...                BIOPOLYMER COPY_CONFORMATION
Protein Loops                       BIOPOLYMER LOOP
Search PRODAT Database...           BIOPOLYMER LOOPS SETUP
Tweak Conformational Search...      BIOPOLYMER TWEAK
Analyze Search Results...           BIOPOLYMER LOOPS ANALYZE
Compare Sequences
Align and Write MSA...              BIOPOLYMER ALIGN_SEQUENCES
                                    BIOPOLYMER MULT_ALIGN_SEQ
View/Edit Alignments...
List Sequence...                    BIOPOLYMER SEQUENCE
FUGUE (license requirement)
Compare Structures
Fit Monomers...                     BIOPOLYMER FIT
Align Structures By Homology...     BIOPOLYMER ALIGN_STRUCTURES
Local RMS Fits of Conformers...
Find and Fit Fixed Regions...       BIOPOLYMER RESIDUE_FIT
Model Proteins
FUGUE (license requirement)
ORCHESTRAR (license requirement)
Create Quick Model... (license requirement)
Analyze Protein
Create ProTable...
SiteID Find Pockets...
SiteID Create Table...
Search Database...
Dictionary & Database Admin
Open Dictionary...                  BIOPOLYMER DICTIONARY OPEN
Close Dictionary...                 BIOPOLYMER DICTIONARY CLOSE
Manage Custom Dictionary...         BIOPOLYMER DICTIONARY CREATE MONOMER
                                    BIOPOLYMER DICTIONARY CREATE DICTIONARY
                                    BIOPOLYMER DICTIONARY ADD MONOMER
List Dictionary...                  BIOPOLYMER DICTIONARY LIST
Create/Update PRODAT Database       mkprodat utility
Sequence Viewer...
Define View ( )
Reset View
Whole Molecule                      BIOPOLYMER DISPLAY
C Alpha Trace                       BIOPOLYMER DISPLAY
Sidechain Trace                     BIOPOLYMER DISPLAY
Backbone Only                       BIOPOLYMER DISPLAY
View > Surfaces and Ribbons         MOLCAD RIBBON
> Quick Ribbons ( )                 RENDER PROTEIN
View > Color by Scheme
Secondary Structure                 BIOPOLYMER COLOR BY_SECONDARY_STRUCT
Chain                               BIOPOLYMER COLOR BY_CHAIN
Property                            BIOPOLYMER COLOR BY_PROPERTY
Acid/Base                           BIOPOLYMER COLOR BY_ACID_BASE
Hydrophobicity                      BIOPOLYMER COLOR BY_HYDROPHOBICITY
B-Factors                           BIOPOLYMER BFACTORS
Items on the Selection menu do not map to any commands because there is no
command equivalent to a selection without any applied action.
Additional Information:
Retrieve PDB Coordinates from Other Sources on page 64
Protein View
If you have a default protein view defined it will be automatically applied to
any structure containing a protein component when it is read in from a file in
PDB format. See Define and Apply Protein View Settings on page 72.
Molecule Name
By default, the molecule is given the name of the PDB file. This behavior is
controlled by Tailor variable PDB MOLNAMERULE.
Substructure Names
A substructure’s name is derived from the chain name, the residue name, and
the residue number in the PDB file. For example: A/GLU4.
ALA 1 ALA1
ALA -1 ALA_1
ZN 2 ZN2
ZN 21 ZN21
ZN2 1 ZN2_1
ZN2 -1 ZN2__1
Z2A 1 Z2A1
Z2A -1 Z2A_1
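The examples above suggest a simple naming rule, which the following Python sketch reproduces. The rule is inferred from the table only and is not taken from SYBYL's implementation; the function name substructure_name is hypothetical. The assumption is that an underscore is inserted when the residue name ends in a digit, and that a minus sign in the residue number is written as an underscore.

```python
def substructure_name(residue_name: str, residue_number: int) -> str:
    """Combine a PDB residue name and number as the table above suggests."""
    # Assumed rule: a separator is added only when the residue name ends in a
    # digit, so that "ZN2" + 1 cannot be confused with "ZN" + 21.
    separator = "_" if residue_name[-1].isdigit() else ""
    # Assumed rule: a minus sign in the residue number becomes an underscore.
    number = str(residue_number).replace("-", "_")
    return f"{residue_name}{separator}{number}"


if __name__ == "__main__":
    for name, number in [("ALA", 1), ("ALA", -1), ("ZN2", 1), ("ZN2", -1)]:
        print(name, number, substructure_name(name, number))
```

The chain prefix (as in A/GLU4) is left out of this sketch; it would simply be prepended with a slash.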
By default, residue information found in the PDB file is stored. This
information is used to convert between PDB, SYBYL, and FlexX substructure
naming conventions. This behavior is controlled by Tailor variable PDB
RETAIN_PDB_SUBSTINFO.
HETATM Residues
HETATM records for a modified amino acid with identifiable backbone atoms
are reported (in the console) as “modified residue” if the residue name in the
PDB file does not match any of the residues in the dictionary. These modified
residues are stored in the {HETATM} sets and appear as “X” in the single-letter
sequence. Modified residues are treated as regular residues in the biopolymer
chain and behave as all known residues in biopolymer operations such as ribbon
display and residue mutations.
Cofactors
Cofactors are treated the same way as regular monomers if the residue name in
the PDB file’s HETATM record matches the name of a cofactor residue file in
the open dictionary. For a list, see Cofactors in the macromol Dictionary on
page 154. Cofactors recognized by SYBYL are stored in the {HETATM} set,
not in the {UNK_ATOMS} set.
Ligands
The macromol dictionary includes a ligand database (ligand_db.def) that is
based on information retrieved from the Ligand Depot site, a service associated
with RCSB. The ligand database and an additional database of chemical groups
greatly improve the SYBYL PDB reader's ability to assign correct atom and
bond types to most ligands. Atoms typed by the ligand database are stored in the
{LIGDB} set, which is included in the {HETATM} set. These atoms are not
stored in the {UNK_ATOMS} set.
In the set names above, x is taken from the information in columns 12-14 of the
PDB file’s HELIX, SHEET, TURN, and SITE records.
and stored with the molecular description in a local substructure set named
HELIX_H1_PDB.
Similarly, the following lines in 4ins.pdb produce the single set name
SHEET_B_PDB:
SHEET 1 B 2 PHE B 24 TYR B 26 0
SHEET 2 B 2 PHE D 24 TYR D 26 -1 N PHE B 24 O TYR D 26
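As an illustration of how such set names could be derived, the Python sketch below reads the identifier from columns 12-14 of a fixed-column record and builds a name of the form RECORD_ID_PDB. The function name is hypothetical, and the naming pattern is an assumption based on the two examples (HELIX_H1_PDB and SHEET_B_PDB); the sketch needs the original fixed-column record, not the whitespace-collapsed lines printed above.

```python
def secondary_structure_set_name(record: str) -> str:
    """Build a set name such as SHEET_B_PDB from a fixed-column PDB record."""
    record_type = record[:6].strip()
    if record_type not in {"HELIX", "SHEET", "TURN", "SITE"}:
        raise ValueError(f"unsupported record type: {record_type!r}")
    # Columns 12-14 (1-based) hold the helix, sheet, turn, or site identifier.
    identifier = record[11:14].strip()
    return f"{record_type}_{identifier}_PDB"


if __name__ == "__main__":
    # Sheet identifier "B" right-justified in columns 12-14.
    sheet_record = "SHEET" + " " * 4 + "1" + " " * 3 + "B 2 PHE B  24"
    print(secondary_structure_set_name(sheet_record))
```

Because both strands of the sheet carry the same identifier, both records map to the same name, which is consistent with the single set described above.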
Water Molecules
Upon reading a PDB file, all atoms named HOH, H2O, WAT, and WTR are
treated as waters, given substructure names that begin with the string HOH, and
stored in the local set {WATER}.
Atom Names
All spaces in atom names encountered while reading in a PDB file are
automatically converted to “_”.
Bonds
In a few PDB files, the interatomic distances in the backbone differ
substantially from standard values, causing SYBYL’s PDB reading functionality
to miss some connectivities and break the molecule into multiple chains. Use
Tailor variable PDB INTER_TOLERANCE to allow some deviation from standard
backbone bond lengths when assigning the connectivity.
After processing the content of the PDB file through the dictionary rules and
ligand database, the PDB reader adds bonds between atoms whose interatomic
distance is within a range of values (by default 1.0 to 1.8 Å). This operation
applies to all atoms in the PDB file and may result in extraneous bonds in
regions of poor geometry. The behavior is controlled by Tailor variable PDB
ADD_BONDS_BASED_ON_DISTANCE. To suppress this operation add the
following line in your $HOME/sybyl.ini file (sample sybyl.ini file in the
Toolkit Utilities Manual):
setvar TAILOR!PDB!ADD_BONDS_BASED_ON_DISTANCE NO
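To make the distance criterion concrete, here is a minimal Python sketch of a distance-based bond search. It is not SYBYL's algorithm: the function name is hypothetical, the range limits are treated as inclusive by assumption, and the sketch ignores bonds already assigned from the dictionary or ligand database.

```python
from itertools import combinations
from math import dist


def bonds_by_distance(coords, min_dist=1.0, max_dist=1.8):
    """Return index pairs of atoms whose separation lies in [min_dist, max_dist].

    coords is a list of (x, y, z) tuples in angstroms. The defaults mirror the
    1.0 to 1.8 range quoted above; inclusive bounds are an assumption.
    """
    bonds = []
    for (i, a), (j, b) in combinations(enumerate(coords), 2):
        if min_dist <= dist(a, b) <= max_dist:
            bonds.append((i, j))
    return bonds


if __name__ == "__main__":
    atoms = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (5.0, 0.0, 0.0)]
    print(bonds_by_distance(atoms))
```

A purely geometric test like this explains why regions of poor geometry can pick up extraneous bonds: any two atoms that happen to sit within the range are connected, whatever their chemistry.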
Atomic Charges
PDB files only store formal charges on individual atoms. ATOM records in
PDB files include a field for temperature factors, which SYBYL reuses to store
atomic charges when writing out PDB files. However, these values are
interpreted as temperature factors when a file is read into SYBYL. To convert
these numbers into atomic charges, use the command CHARGE mol_area VALIDATE
YES. This must be done every time the file is read. A permanent alternative is to
save the molecule in the Mol2 file format.
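For readers who post-process such files outside SYBYL, the sketch below shows where the reused field sits. In the standard PDB layout the temperature factor occupies columns 61-66 of ATOM and HETATM records; the function name and the sample record are illustrative only.

```python
def temperature_factor_field(atom_record: str) -> float:
    """Return the number stored in columns 61-66 of an ATOM/HETATM record."""
    return float(atom_record[60:66])


# Sample record assembled field by field so the column positions are explicit:
# record name (1-6), serial (7-11), atom name (13-16), residue name (18-20),
# chain (22), residue number (23-26), x/y/z (31-54), occupancy (55-60),
# temperature factor (61-66), here carrying a partial charge of -0.42.
example = (
    f"{'ATOM':<6}{1:>5} {' N':<4} {'ALA':>3} {'A'}{1:>4}    "
    f"{11.104:8.3f}{6.134:8.3f}{-6.504:8.3f}{1.00:6.2f}{-0.42:6.2f}"
)

if __name__ == "__main__":
    print(temperature_factor_field(example))
```

Nothing in the record itself says whether the number is a temperature factor or a charge, which is why the interpretation has to be supplied after the file is read.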
Large Molecules
The standard PDB format supports up to 99,999 atoms: atom serial numbers
(which must be unique) are stored in columns 7-11. SYBYL’s PDB reader can
interpret PDB files with 100,000 or more atoms under the following conditions:
• The atom serial numbers have been shifted left (columns 6-11) or right
(columns 7-12).
• The atom name and subsequent fields in the ATOM and HETATM
records are in their standard positions (columns 13-80).
• CONECT records involving atoms above 99,999 have been deleted from
the input file.
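The column conventions above can be illustrated with a small Python sketch that accepts a serial number in its standard position or shifted by one column. This is an illustration of the layout, not SYBYL's reader, and the function name is hypothetical. Since the record name HETATM fills columns 1-6, the sketch assumes that only the right shift is possible for HETATM records.

```python
def read_atom_serial(record: str) -> int:
    """Read an atom serial number that may be shifted by one column."""
    if record.startswith("HETATM"):
        # Columns 7-12: standard position plus a possible right shift.
        field = record[6:12]
    elif record.startswith("ATOM"):
        # Columns 5-12: covers the standard position and both shifts.
        field = record[4:12]
    else:
        raise ValueError("not an ATOM or HETATM record")
    # int() ignores the surrounding blanks.
    return int(field)


if __name__ == "__main__":
    print(read_atom_serial("ATOM  " + "99999" + "  CA  ALA"))   # standard
    print(read_atom_serial("ATOM " + "100000" + "  CA  ALA"))   # shifted left
    print(read_atom_serial("ATOM  " + "100000" + " CA  ALA"))   # shifted right
```

The sketch relies on the atom name starting no earlier than column 13, which is the second condition in the list above.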
To conform with the PDB format, SYBYL includes the element type in columns
77-78 of the ATOM and HETATM records.
Use Tailor variable PDB to alter the parameters used when writing PDB files.
One of the variables allows you to write files in various PDB flavors, such as
PDB v.2.3, PDB v.3.1, or AMBER. Another variable determines whether to
include the chain names (by default, SYBYL does not write a chain designator
when a molecule has only one chain).
Warning: If the molecule was built with small molecule tools and includes a
PHENYL group, saving it to a PDB file shortens the phenyl’s substructure name
to PHE. When the file is read back in, the dictionary misinterprets this phenyl
group as a phenylalanine because of the substructure name. To avoid this
problem, edit the PDB file and change the substructure name.
Large Molecules
The writing of PDB files with more than 99,999 atoms is not standard and is not
supported in SYBYL. RCSB’s recommendation is to split such molecules into
multiple PDB files. Alternatively, the .mol2 format can be used to store a large
molecule as a single unit.
RCSB Server Details           Access the RCSB Server Details dialog where you can
                              specify:
                              • RCSB Server Address: ftp.wwpdb.org (PDB format
                                3.1; default) or ftp.rcsb.org (PDB format 2.3).
                              • Path to Files at RCSB: Both formats use the same
                                path: pub/pdb/data/structures/divided/pdb/
                              • FTP Idle Time: Idle connection time in seconds
                                (default = 30)
                              See Tailor variable PDB FTP for details.
Load Retrieved PDB File       Whether to load the retrieved PDB file(s) into SYBYL
                              (irrelevant when retrieving from PRODAT).
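The following Python sketch shows how the server address and path combine into a location for one entry. The subdirectory and file name pattern (the two middle characters of the ID, then pdbXXXX.ent.gz) follow the layout of the wwPDB "divided" archive and are not stated in this manual, so treat them as an assumption; the function name is hypothetical, and SYBYL builds its own request internally.

```python
def divided_archive_path(
    pdb_id: str,
    server: str = "ftp.wwpdb.org",
    base: str = "pub/pdb/data/structures/divided/pdb/",
) -> str:
    """Build the FTP location of one entry in the 'divided' PDB archive."""
    code = pdb_id.lower()
    if len(code) != 4 or not code.isalnum():
        raise ValueError(f"expected a 4-character PDB ID, got {pdb_id!r}")
    # Assumption: entries are grouped by the two middle characters of the ID.
    return f"ftp://{server}/{base}{code[1:3]}/pdb{code}.ent.gz"


if __name__ == "__main__":
    print(divided_archive_path("4ins"))
```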
Protein View
If you have a default protein view defined it will be automatically applied to the
retrieved structure if it contains a protein component. See Define and Apply
Protein View Settings on page 72.
The requested file is retrieved and placed in the current working directory. To
load the structure into SYBYL use the PDB IN command.
Additional Information:
• Activities Upon Reading a PDB File on page 59
• Tailor variable PDB FTP (in the Tailor Manual)
! Set the rest of the dialog as follows (these are the defaults):
- Retrieve From: RCSB Server
- Load Retrieved PDB File: on
! Click Retrieve.
4. If you encounter persistent failure, explore the RCSB Server Details option in
the dialog or contact your system administrator.
If you have any question about the PDB format, check the official PDB manuals
at:
http://www.wwpdb.org/docs.html
Literature reference:
H.M. Berman, J. Westbrook, Z. Feng, G. Gilliland, T.N. Bhat, H. Weissig,
I.N. Shindyalov, P.E. Bourne, The Protein Data Bank. Nucleic Acids
Research, 28, pp. 235-242 (2000).
The sequence found in the file is used to construct a protein. The secondary
structure is assigned using the MAXFIELD_SCHERAGA (Bayes Statistics) method
(see Secondary Structure Prediction on page 319.)
Blocking groups are not included in PIR or FASTA files written by SYBYL.
If multiple chains are present, a separate file is created for each chain, the chain
names are appended to the filename and the appropriate extension is added (.pir or
.fasta). For compatibility with the NBRF database, the filename may be the
unique retrieval key assigned to each entry in the PIR database.
The BIOPOLYMER PIR OUT command assumes that all structures or sequences
generate protein fragment files.
The file produced consists of three or more lines: a HEADER line, a TITLE
line, and one or more SEQUENCE lines. An example follows:
>F1;”file_base”
SEQUENCE NAME - ORIGIN NAME
AACDHEKLLLISSTRTLINEQWLLTTAKNLFL
VMPICLPSKDY*
The SEQUENCE NAME of the protein is assumed to be the name of the molecule
used as input, if any. Otherwise, this field and the ORIGIN NAME are either
NONE or UNKNOWN. For completeness you may want to edit the files later to
modify these entries.
The sequence itself is a string composed of the one letter symbols for amino
acids. The sequence may span several lines, and is terminated with a single
asterisk.
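The layout described above can be sketched in a few lines of Python. This is an illustration of the file structure, not the BIOPOLYMER PIR OUT implementation: the function name is hypothetical, the quotation marks shown around file_base in the example are omitted, and the line width of 32 is an assumption taken from the example.

```python
def format_pir(file_base, sequence, sequence_name="NONE",
               origin_name="NONE", width=32):
    """Format a one-letter sequence as HEADER, TITLE, and SEQUENCE lines."""
    lines = [
        f">F1;{file_base}",                   # HEADER line
        f"{sequence_name} - {origin_name}",   # TITLE line
    ]
    # The sequence is terminated by a single asterisk, then wrapped.
    body = sequence + "*"
    lines.extend(body[i:i + width] for i in range(0, len(body), width))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(format_pir("demo", "AACDHEKLLLISSTRTLINEQWLLTTAKNLFLVMPICLPSKDY"), end="")
```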
License Requirement:
See License Requirements for Biopolymer on page 8.
You may define your own settings and save them as named views, then select
one of those as your personal default view.
If you like these settings and would like to apply them to all PDB files when
you read them, save them in a named view and make it your default view.
In the absence of a user-defined default view you can apply these settings to any
protein complex: View > Protein View > Apply Default View.
Usage Notes
When accessing the Protein View dialog:
• If a named view had been applied to the molecule, the settings in the
dialog are those of that view.
• Under all other circumstances the dialog’s default settings are applied to
the molecule. In addition, any surfaces and/or ribbons that were present
on the molecule before invoking the Protein View dialog are temporarily
hidden then restored when you close the dialog.
View Sets
Auto Load Selected View       Whether to apply the view selected in the list upon
                              selecting it. This feature is on by default and is
                              used only within the dialog. Toggling it off makes
                              it easier to select a view in the list when the
                              purpose of the selection is to delete the view.
Defined Views                 List of all the view settings saved in your
                              $HOME/.sybyl/ProteinView directory. Select one to
                              apply it to the current protein complex. The
                              selected view will remain highlighted in the list
                              until you change any of the settings in the dialog.
Delete the selected view from the saved list. Its associ-
ated file will be deleted.
Apply                         This button is enabled only when Auto Load Selected
                              View above the list is off.
Reset                         Applies the dialog’s default settings.
Protein Color By Distance     Toggle this check box to color the Protein component
                              by distance ranges to the ligand as specified by the
                              region sliders and color options below. This option
                              is disabled if the structure does not include a
                              defined ligand.
Region Sliders All sliders range between 1 and 20 Å. They are active
only when the Protein is colored by distances.
• Near—All residues that have at least one atom
within the specified distance of any “Ligand” atom.
• Mid—All residues beyond those in the Near region
that have at least one atom within the specified
distance of any “Ligand” atom.
• Far—All residues beyond those in the Mid region
that have at least one atom within the specified
distance of any “Ligand” atom.
• Hide—Undisplay all residues that have at least one
atom beyond the specified distance of any “Ligand”
atom.
Color Color the “Protein” atoms within the region by any of
the 24 SYBYL colors or by atom type.
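The Near, Mid, and Far definitions amount to binning each residue by its closest approach to the ligand. The Python sketch below illustrates that reading; it is not SYBYL's code, the function name is hypothetical, and the Hide slider is deliberately left out because its criterion is worded differently.

```python
from math import dist


def distance_region(residue_atoms, ligand_atoms, near, mid, far):
    """Classify one residue by its closest approach to any ligand atom.

    residue_atoms and ligand_atoms are lists of (x, y, z) tuples in angstroms;
    near <= mid <= far are the slider values. Returns "Near", "Mid", "Far",
    or None when the residue lies beyond the Far distance.
    """
    closest = min(dist(a, b) for a in residue_atoms for b in ligand_atoms)
    if closest <= near:
        return "Near"
    if closest <= mid:
        return "Mid"
    if closest <= far:
        return "Far"
    return None


if __name__ == "__main__":
    ligand = [(0.0, 0.0, 0.0)]
    print(distance_region([(3.0, 0.0, 0.0), (9.0, 0.0, 0.0)], ligand, 4, 8, 12))
```

Using the minimum distance matches the "at least one atom within the specified distance" wording: a residue belongs to the innermost region that any of its atoms reaches.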
Surfaces
Ribbon
A cartoon-style MOLCAD ribbon is created for the “Protein” component. The
shape of the ribbon represents the secondary structure elements: helices for
helical regions; curved, directional arrows for beta strands; and tubes for the
remaining residues.
Label Substructures           Whether to label the residues in the Near and Mid
                              regions.
Bond Line Width               Thickness of the lines representing bonds in the
                              “Protein.” The value ranges from 1 to 5, with a
                              default of 1.
In the absence of a named view identified as the default view, the default
settings in the Protein View dialog are applied.
View settings are defined in the Protein View Dialog on page 74.
Warning: Chain trace display is intended for viewing purposes only. Many
SYBYL commands will not work correctly in this display mode.
The visual effect is similar. However, the molecular display is unaffected, and
the trace is represented as a background image stored in a display file
(trace.dsp by default). Read about the handling of background images in the
Graphics Manual.
Known Limitation:
If the molecule has rotatable bonds defined, you may see artifacts in the C-alpha
display. If you see multiple C-alpha traces, type the following commands in the
console:
TWIST FREEZE mol_area
TWIST STATUS INACTIVE mol_area
You should now see the single C-alpha trace. After returning to the whole
molecule display, reactivate the rotatable bonds:
TWIST STATUS ACTIVE mol_area
Additional Information:
• Biopolymer Ribbons on page 84
• Line Ribbon Display of Biopolymers on page 88
Warning: Chain trace display is intended for viewing purposes only. Many
SYBYL commands will not work correctly in this display mode.
For all styles of ribbons DNA base pairs are symbolized by polygons.
Additional Information:
Working with Ribbons in the MOLCAD Manual
The shape of the rendered image reflects the protein’s secondary structure:
alpha helical regions are rendered as ribbons or cylinders, beta strands as curved
arrows, and the remainder of the biopolymer as a curved tube.
Usage Note: If no secondary structure sets are present, the menubar approach
will create them automatically by running the command BIOPOLYMER FIND
SEC_STR and using the Kabsch-Sander method (see Find Secondary Structure
via the Command Line on page 179).
Labels may be applied to individual atoms or to all relevant atoms in the whole
molecule.
The detailed atom name is displayed as a label on the atom, and further
information relating to that atom is listed in the console.
For example:
• The label A/ALA2.N means that the atom is a nitrogen (from the amide
backbone) from the second residue, an alanine, in the peptide chain
labelled A.
• The message appearing in the console provides additional information:
the molecule is in molecule area M1, has the SYBYL atom type N.am,
and the SYBYL atom ID 9:
Mol: M1 A/ALA2.N : N.am Atom id: 9
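When such labels are processed in scripts outside SYBYL, they can be split into their parts with a regular expression. The Python sketch below handles only the simple chain/RESnumber.atom form shown in the example; residue names containing digits and negative residue numbers are not covered, and the function name is hypothetical.

```python
import re

# chain "/" residue name (letters only) residue number "." atom name
LABEL_PATTERN = re.compile(
    r"(?P<chain>\w+)/(?P<residue>[A-Za-z]+)(?P<number>\d+)\.(?P<atom>\w+)"
)


def parse_label(label: str) -> dict:
    """Split a label such as A/ALA2.N into chain, residue, number, and atom."""
    match = LABEL_PATTERN.fullmatch(label)
    if match is None:
        raise ValueError(f"unrecognized label: {label!r}")
    parts = match.groupdict()
    parts["number"] = int(parts["number"])
    return parts


if __name__ == "__main__":
    print(parse_label("A/ALA2.N"))
```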
Note: You may find it more convenient to display Ramachandran graphs from a
ProTable spreadsheet.
Additional Information:
Tailor subject RAMACHANDRAN
Caveat: If the structure does not contain at least one standard amino acid, the
analysis cannot proceed.
Analysis
If water molecules are included in the selection, their hydrogens are placed in a
random orientation.
Lone pairs on sulfur atoms are added or removed as required by the chosen
AMBER/Kollman force field.
Select Residue
Orientation
Procedure
• Any request to compute charges starts by deleting existing charges on
the selected atoms and setting new values to 0.0.
• Lone pairs on sulfur atoms are added or removed as required by the
chosen AMBER/Kollman force field.
• Charges for the biopolymer component of the molecule can be taken
from the dictionary or computed by the SLN typer or a combination of
both.
• All atoms for which charges cannot be assigned are stored in the atom
set {ZERO_CHARGE}. This examination is always done for the entire
molecule.
Loads the selected charge set according to the procedure described above by:
• First performing a dictionary look up and reporting the atoms for which
charges could not be found.
• Then invoking the SLN typer to assign charges to the atoms (in the
selection) which still have zero charges. If terminal residues and
blocking groups are among the selected atoms, the SLN method is used
to assign charges to these atoms as well.
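The two-stage procedure can be summarized in a short Python sketch. It models only the control flow described above (reset, dictionary lookup, then the typer for atoms still at zero); the function, the dictionary_charges mapping, and the sln_typer callable are hypothetical stand-ins, not SYBYL interfaces. The special handling of terminal residues and blocking groups is omitted, and the zero-charge set is computed for the selection only, whereas SYBYL examines the entire molecule.

```python
def load_charges(atoms, dictionary_charges, sln_typer):
    """Assign charges by dictionary lookup, then by a fallback typer.

    atoms: iterable of atom identifiers (the selection).
    dictionary_charges: mapping from atom identifier to charge.
    sln_typer: callable returning a charge, or None if it cannot assign one.
    Returns (charges, zero_charge) where zero_charge mimics {ZERO_CHARGE}.
    """
    # Existing charges are discarded and every atom starts at 0.0.
    charges = {atom: 0.0 for atom in atoms}

    # Stage 1: dictionary lookup.
    for atom in charges:
        if atom in dictionary_charges:
            charges[atom] = dictionary_charges[atom]

    # Stage 2: fallback typer for atoms that still have a zero charge.
    for atom, value in charges.items():
        if value == 0.0:
            typed = sln_typer(atom)
            if typed is not None:
                charges[atom] = typed

    zero_charge = {atom for atom, value in charges.items() if value == 0.0}
    return charges, zero_charge


if __name__ == "__main__":
    result = load_charges(
        ["N", "CA", "X"],
        {"N": -0.4},
        lambda atom: 0.1 if atom == "CA" else None,
    )
    print(result)
```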
Loads the selected charge set according to the procedure described above, but
uses only the definitions in $TA_DICT/AMB_PARMS. The atom type set
KOLL_UNI and the matching charge sets are not available for this option.
The drawback of this method is that the SLN typer cannot assign charges to a
substructure that is incomplete (even missing a hydrogen) or bonded to a ligand.
Because the charges of all selected atoms are set to zero before the charge
lookup, if SLN typing fails for a substructure the charges are set to zero for all its
atoms.
Loads the selected charge set according to the procedure described above, but
uses only information in the open dictionary’s residue files. The drawback of
this method is that charges may not be assigned properly to atoms belonging to
terminal residues.
Looks over the complete molecule and (re)defines the set of atoms with zero
charges {ZERO_CHARGE}. If all atoms have a charge other than 0.0 the set is
deleted. This command is automatically called by all the charge commands
described above and by the Load Charges dialog.
Additional Information:
• Charge Derivation for Biopolymers in the Force Field Manual
• How to Change a Single Atom’s Charge in the SYBYL Basics Manual
Remarks:
• A block can only be added to the terminal residues in a biopolymer
chain.
• Adding a blocking group causes the deletion of some atoms in the
residue to which it attaches.
• Deleting a blocking group from a molecule reconstructs any missing
pieces of the adjoining residue.
• Blocking groups have a residue number. The blocking group at the
beginning of a chain is numbered 0 (or 1 less than the number of the first
residue in the chain).
• The modified terminal residues and capping/blocking groups are stored
in the substructure set {FIXED_TERMINI}.
Additional Information:
• Define a New Blocking Group on page 252
• Fix End Groups on page 112 to modify terminal residues with correct
atom geometry
• See Partial Charges for Blocking Residues in the Biopolymer Dictionary
in the Force Field Manual for a discussion of derivation of charges for
blocking groups
For Proteins:
• The N terminus is capped by AMN (Charged; -NH3+), AMI (Neutral;
-NH2) or one of the following blocking groups (see the Force Field
Manual for partial charges on N-terminal groups).
• ACE: N-acetyl
• PYR: N-pyroglutamyl
• FOR: N-formyl
• NMT: N-methyl
• BOC: N-t-butyloxycarbonyl
• The C terminus is capped by CXL (Charged; -COO-), CXC (Neutral;
-COOH) or one of the following blocking groups (see the Force Field
Manual for partial charges on C-terminal groups).
• NME: N-methyl amide
• AMD: amide
• NMM: N,N-dimethyl amide
• CME: methyl
Blocking groups and caps at the beginning of a chain are given the sequence
number 0.
Additional Information:
• BIOPOLYMER POLY_BLOCK, a more powerful command to edit chain
termini
• Define a New Blocking Group on page 252
• Fix End Groups on page 112 to modify terminal residues with correct
atom geometry
• See Partial Charges for Blocking Residues in the Biopolymer Dictionary
in the Force Field Manual for a discussion of derivation of charges for
blocking groups
Caps will be added only to legitimate connection atoms that have unfilled
valences. The cap fragments are defined in the open dictionary. Biopolymers
built using one of the dictionaries already have caps. Typically, caps need to be
added when the biopolymer is read in from an outside source.
Additional Information:
BIOPOLYMER POLY_BLOCK, a more powerful command to edit chain termini
There are two terminal residues per uniquely named chain in a molecule. These
terminal residues are stored in the substructure sets {CHAIN_HEAD} and
{CHAIN_TAIL}. Fixing end groups updates the chain termini sets.
Additional Information:
• Edit Terminal Residues via the Command Line on page 110
• Chain Termini Sets on page 131
• Biopolymer End Group Modeling on page 317 for information about end
group modeling
• Biopolymer Charges in the Force Field Manual for a discussion of
derivation of charges for blocking groups
• The Kollman Force Field in the Force Field Manual for information
about the implementation of the Kollman force field in SYBYL
Biopolymer > Prepare Structure > Fix SYBYL Atom Types in Cofactor
Molecule                      Select from this list the molecule to which you want
                              to apply the atom expression.
Cofactor                      Select the type of cofactor present in the protein:
                              cAMP, cGMP, ADP, GDP, ATP, GTP, Porphin,
                              ProtoporphyrinIX, HEME-FeII, NAP, NADP, FAD. If you
                              select the entire list (click the icon), SYBYL will
                              find the largest matching cofactor.
Override Dictionary Types     Use this option to ignore any cofactor already
                              identified based on definitions in the dictionary
                              and to assign SYBYL atom and bond types based on the
                              specified template(s) instead.
The HEME-FeII template contains a bound iron atom and will set the bond
types to the iron to NC (non-chemical). NC bonds are invisible on the display
and are ignored during minimizations. Note that, to match this template, the iron
atom must have four bonds to the porphyrin ring in the PDB file.
A side effect of modifying the atom types is that SYBYL may prompt you for
the appropriate type for one or more bonds. If this happens, press OK after
entering the bond type. Once all the atom types are correct, the bond types will
be adjusted automatically.
Assigning atom types to a PDB file is, in most cases, handled completely by
SYBYL’s PDB Reader. If you need to assign atom types manually we
recommend that you proceed in the following sequence:
1. Assign SYBYL atom types to the cofactor (if any)
2. Assign SYBYL atom types to the ligand.
3. Assign AMBER atom types to all selected atoms. This operation relies
on information in the dictionary and SLN-based file about standard
residues and general functional groups. For that reason it is better to
assign correct SYBYL types to ligands and cofactors before proceeding
with the AMBER type assignment.
Upon reading a PDB file into SYBYL check the console for the presence of the
following line:
NOTE: Check atom and bond types for atoms in local set UNK_ATOMS.
Biopolymer > Prepare Structure > Fix SYBYL Atom Types in Ligand
• Specify the molecule area containing the molecule of interest.
• SYBYL attempts to set the ligand atom types automatically by using the
SLN definitions for ligand templates (in $TA_DICT/ligand_db.def)
and chemical groups (in $TA_DICT/group_db.def). Press OK to start
this operation.
• If the ligand contains unconnected atoms SYBYL attempts to add the
bonds with the command QUICKBOND and to determine the bond types
with the command MODIFY BOND AUTO_TYPE. Press OK to start this
operation.
• The ligand is displayed in the center of the graphics screen with its
updated SYBYL atom types.
• A dialog offers an opportunity to make manual adjustments if necessary.
• Press Cancel if the automatic typer produces the correct results.
• Press OK to proceed with manual corrections. You can then select
specific atoms and assign them the proper SYBYL atom types. Note
that if you need to change an atom to a different chemical element
you will also need to change the atom’s name to reflect that.
Assigning atom types to a PDB file is, in most cases, handled completely by
SYBYL’s PDB Reader. If you need to assign atom types manually we
recommend that you proceed in the following sequence:
1. Assign SYBYL atom types to the cofactor (if any)
2. Assign SYBYL atom types to the ligand.
3. Assign AMBER atom types to all selected atoms. This operation relies
on information in the dictionary and SLN-based files about standard
residues and general functional groups. For that reason it is better to
assign correct SYBYL types to ligands and cofactors before proceeding
with the AMBER type assignment.
Selection
Number of Missing Atom Types  Reports the number of atoms for which the specified
                              atom types could not be found in the open dictionary
                              (typically, belonging to ligands, cofactors, and
                              metals). Note that the number reported does not
                              include atoms that may have been typed incorrectly
                              by the dictionary (typically for terminal residues).
                              The missing atoms are stored in atom sets named
                              according to the selected type set:
                              {UNK_AMBER7_FF99}, {UNK_AMBER7_FF02},
                              {UNK_AMBER95_ALL}, {UNK_KOLL_ALL}, {UNK_MMFF94}.
                              The corresponding command is BIOPOLYMER LOAD
                              DEFINE_UNKSET.
Assign Atom Types             Loads the specified set of atom types onto all the
                              selected atoms and marks them as user-defined. If
                              this operation fails for a few atoms (such as lone
                              pairs or unrecognized functional groups), the Number
                              of Missing Atom Types indicates unassigned atoms.
                              Use the expert options below to assign them. Lone
                              pairs on sulfur atoms are added or removed as
                              required by the chosen AMBER/Kollman force field.
                              The corresponding commands are BIOPOLYMER LOAD
                              OTHER_ATOM_TYPES followed by BIOPOLYMER LOAD
                              DICT_TO_USER.
Expert Options
Loads the specified atom type set according to the procedure described above
by:
• First performing a dictionary look up and attempting to assign types to
atoms not marked as user-defined for the same type set.
• Then invoking the SLN typer to assign atom types to the remaining
atoms (typically terminal residues, blocking groups, and unknown/
unparametrized atoms in ligands and cofactors). The treatment of
Assigns the specified atom types to the selected atoms according to the
procedure described above, but uses only the definitions in $TA_DICT/
AMB_PARMS. Assigned atom types are marked as user-defined.
Tailor variable
The Tailor variable BIOPOLYMER SLN_TYPER_MODE controls whether existing
atom types for the specified AMBER/Kollman type set are overwritten by the
SLN typer:
• KEEP_USERDEF (default)—Prevents changes to atom types already
marked as user-defined. See Marking Atom Types as User-Defined
below for the command that marks atom types as “user-defined” for
safekeeping.
• ASSIGN_UNKNOWN—Processes only atoms that do not have atom type
definitions in the dictionary .res files and no definitions marked as user-
defined.
• ASSIGN_ALL—Retypes all atoms using the SLN atom typer, overwriting
existing atom types.
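The three modes can be read as a per-atom decision rule. The sketch below is one interpretation of the descriptions above, not SYBYL code: the function name and the two boolean flags (user_defined, in_dictionary) are hypothetical and only illustrate which atoms the SLN typer would touch in each mode.

```python
def sln_typer_processes(mode, user_defined, in_dictionary):
    """Return True if the SLN typer would (re)type an atom under `mode`.

    user_defined  -- the atom's type is already marked as user-defined
    in_dictionary -- the dictionary .res files define a type for the atom
    (Both flags are illustrative assumptions, not SYBYL data structures.)
    """
    if mode == "ASSIGN_ALL":
        # Every atom is retyped, overwriting existing types.
        return True
    if mode == "KEEP_USERDEF":
        # Types marked as user-defined are never changed.
        return not user_defined
    if mode == "ASSIGN_UNKNOWN":
        # Only atoms with neither a dictionary nor a user-defined type.
        return not user_defined and not in_dictionary
    raise ValueError(f"unknown mode: {mode}")
```

Read this way, KEEP_USERDEF and ASSIGN_UNKNOWN differ only for atoms that have a dictionary definition but no user-defined mark.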
Assigns the specified atom types to the selected atoms according to the
procedure described above, but uses only information in the open dictionary’s
residue files. This operation does not overwrite atom types already marked as
user-defined.
Marks all atom types assigned from the dictionary as user-defined types. This
has the following advantages:
• User-defined atom types are not overwritten by the SLN typer when
using the default set by Tailor variable BIOPOLYMER SLN_TYPER_MODE
KEEP_USERDEF (see Tailor variable above).
• User-defined atom types are stored in the molecular description and are,
therefore, written out when the molecule is saved to a .mol2 file.
• User-defined atom types are protected from overwriting by the automatic
and mandatory checking of terminal residues.
In many instances the atom types stored in the dictionary are identical to those
produced by the SLN typer. This is the case, for example, for standard amino
acids that are not terminal residues. This command compares, for the selected
atoms and the specified type set, the atom types marked as user-defined and
those in the open dictionary. Those found to be identical are no longer marked
as user-defined types, and the corresponding atoms are removed from the
appropriate atom set.
This option is useful to identify only those atoms that were typed by any
method other than the dictionary. Such methods include the SLN typer, manual
assignment, and import from an external source via a .mol2 file. After this
operation for the AMBER7-FF99 force field, for example, you can use the
command MODIFY ATOM OTHER_TYPES AMBER7_FF99 LIST USER to list
the minimal set of user-defined atom types for the selected atoms. Additional
user-defined atom types can be removed manually via MODIFY ATOM
OTHER_TYPES AMBER7_FF99 UNASSIGN.
Looks over the complete molecule and (re)defines the set of atoms with
unknown atom types for the specified type set. The set names match the
available force fields: {UNK_AMBER7_ALL}, {UNK_AMBER95_ALL},
{UNK_KOLL_ALL}, {UNK_KOLL_UNI}. If all atoms have types for the
specified type set the corresponding set is deleted.
Residue Selection
Sidechain Positioning
Initial Sidechain Position
    Select the source of the initial conformation for each sidechain being
    added:
    • SYBYL: The conformation of the matching residue in the open dictionary.
    • Lovell: The rotamer in the Lovell rotamer library that results in the
      fewest bumps with the rest of the molecule. (S.C. Lovell, J.M. Word,
      J.S. Richardson and D.C. Richardson, “The Penultimate Rotamer Library.”
      Proteins: Structure, Function, and Genetics, 40, 389-408 (2000).
      http://kinemage.biochem.duke.edu/databases/rotamer.php)
    Tailor variable BIOPOLYMER ROTAMER_DIRECTORY adds user-defined rotamer
    libraries to this list. In particular, see Dunbrack Rotamer Library on
    page 189.
Scan Sidechains
    Whether to attempt to remove steric interactions between the added
    sidechains and the rest of the molecule. Torsion angles in the new
    sidechains are scanned, through a full 360°, for positions that relieve
    bad steric interactions. Only one bond at a time is altered. After a
    position is found, that bond is removed from consideration. Scanning
    continues until all interactions dependent upon these bonds are relieved
    or until no progress is made from one iteration to the next.
Number of Increments (N)
    Number of angle steps used to rotate through 360°. The amount of rotation
    at each step is 360°/N.
    The default value is taken from Tailor variable SCAN NUMBER_INCREMENTS.
Scan vdW Factor
    Constant scaling factor to apply to all van der Waals radii.
    Tailor variable SCAN VDW_SCALE
Action Buttons
Add Sidechains Add the sidechains on the selected residues using the
desired method. The dialog remains open so you can
add more sidechains using a different method if
desired.
Add and Close Add the sidechain on the selected residues and close the
dialog.
Additional Information:
• Set Sidechain Conformation on page 185
• File Format for Rotamer Libraries on page 188
• Fix Prolines on page 130
The sidechains to be added are determined from the residue types. The
conformations are retrieved from the matching residues in the open dictionary.
If a selected residue already has a sidechain, nothing is done to the existing
sidechain.
Additional Information:
• Fix Prolines on page 130
• Scan Torsions to quickly find reasonable conformations for the newly
added sidechains (in the SYBYL Basics Manual)
Proline is the only standard amino acid containing a ring in its backbone, and its
preferred phi angle value is near -70°. Mutating a residue into a proline
results in a very poor geometry of the proline residue.
Fixing prolines does not reset the phi angle directly, since this would affect the
geometry of the rest of the chain. Careful energy minimization must be applied
after a residue has been mutated into a proline. See the Force Field Manual for a
description of local minimizations.
Additional Information:
Set Chain Names on page 132
Additional Information:
Chain Termini Sets on page 131
Blocking groups and caps at the beginning of a chain are given a sequence
number that precedes the number of the first residue. This means that if the first
residue is given the number 1 in the sequence, the blocking group or cap
connected to it is numbered 0.
Additional Information:
How to modify a single substructure name (in the SYBYL Basics Manual)
Bond angles and bond lengths are compared to their equilibrium value in the
KOLL_ALL (AMBER all-atom) force field. Deviations greater than those
specified by Tailor variable BIOPOLYMER CHECK_GEOMETRY are reported.
For peptide bonds, omega angles that deviate more than the specified threshold
from 180° are reported.
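As an illustration of the omega test, the deviation from the trans value of 180° has to account for angle wrap-around, since -175° and 175° are equally close to 180°. The sketch below is a minimal example under that reading; the function name is hypothetical, and the actual threshold comes from Tailor variable BIOPOLYMER CHECK_GEOMETRY.

```python
def omega_deviation(omega_deg):
    """Absolute deviation (degrees) of an omega torsion from trans (180)."""
    # Wrap the angle into [-180, 180) so that -175 and 185 compare equal.
    wrapped = (omega_deg + 180.0) % 360.0 - 180.0
    return 180.0 - abs(wrapped)


def is_reported(omega_deg, threshold_deg):
    """True if the deviation exceeds a user-supplied threshold."""
    return omega_deviation(omega_deg) > threshold_deg
```

With this definition a cis peptide bond (omega near 0°) has the maximum possible deviation of 180°.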
UIMS Variables:
• BIO_CHECK_NBAD_ANGLES = number of non standard angles found
• BIO_CHECK_NBAD_BONDS = number of non standard bonds found
• BIO_CHECK_NBAD_CHIRALS = number of non standard chiralities found
• BIO_CHECK_NBAD_OMEGAS = number of non standard omega angles
found
Additional Information:
• Measure Conformation on page 174 to measure omega and zeta angles
• ProTable Check Local Geometry (in the ProTable Manual)
For most cases you should choose an empty molecule area for the resulting
biopolymer. If, however, you choose a non-empty molecule, SYBYL assumes
that it already contains a molecule with the proper sequence, and performs the
conversion without prompting you for the residue sequence. In either case, the
small molecule must be topologically identical to or a super-structure of the
residue sequence or existing biopolymer.
Additional Information:
• MODIFY ATOM OTHER_TYPES to display Kollman atom types (SYBYL
Basics Manual)
• See the Force Field Manual for the list of Kollman atom types
Biopolymer > Build > Build Protein (also DNA Strand, DNA Double
Helix, RNA Strand, RNA Double Helix, or Carbohydrate).
Additional Information:
• Protein Modeling on page 309
• Nucleic Acid Modeling on page 328
• Polysaccharide Modeling on page 330
Residues
Addons
Remarks:
By default, the biopolymer building operation adds charged end groups to the
chain termini. These groups are taken from the open dictionary.
Additional Information:
• List Dictionary on page 237 to list the residues and conformational states
available in the current dictionary
• Biopolymer Charge in the Force Field Manual for a discussion of charge
derivation
For the selection of the attachment point, each of the atoms in the expression is
examined, and, if more than one is a legitimate attachment point, you are asked
to select one. If the chosen atom(s) is not a valid attachment point, SYBYL will
look for any atom in the same residue that is valid.
If the specified sequence includes residues with more than two connection
atoms (biopolymers such as polysaccharides can be non-linear), Tailor
variable BIOPOLYMER ASSIGN_ATTACH_MODE controls how often you are
prompted to resolve ambiguities in the connection.
Tailor Variables:
• Tailor variable BIOPOLYMER BUILD_HYDROGENS determines whether
hydrogens are added to the new residues and, if so, which ones.
• Tailor variable BIOPOLYMER ASSIGN_ATTACH_MODE
Additional Information:
• Biopolymer Dictionary on page 236
• A description of the syntax for residue sequences and conformation
specification in the SPL Manual
• Biopolymer Charge in the Force Field Manual for a discussion of charge
derivation
The positions of the sulfur lone pairs are adjusted after the addition of disulfide
bonds.
Method:
This command implements a generalized version of the procedure described in
the paper “Modelling the polypeptide backbone with ‘spare parts’ from known
protein structures” by Claessens et al. [Ref. 35].
length and a specified maximum length (by default, 10 residues). The process is
then repeated after moving N residues down the chain and looking for the next
fragment from the database. However, to avoid discontinuities at the junction of
two fragments, successive fragments are allowed to overlap. The number of
residues that overlap determines how many residues to trim from the end of
one fragment (by default, 2 residues) and from the beginning of the next
fragment (by default, 1 residue).
Tailor Variables:
• Tailor variable PDB CONNECT_SEQ
• Tailor subject CONSTRUCT_BACKBONE
• Tailor subject PROTEIN_SEARCH
Additional Information:
• Binary Protein Database: PRODAT on page 255
• Add Sidechains on page 125
Additional Information:
Nucleic Acid Modeling on page 328
Additional Information:
Nucleic Acid Modeling on page 328
Atomic charge sets are supplied only for ATP, ADP, GDP, and GTP and only
for the AMBER7 FF99 and AMBER95 force fields.
These charge values were obtained from: Meagher K.L., Redman L.T., Carlson
H.A., “Development of polyphosphate parameters for use with the AMBER
force field.” J. Comput. Chem., 24:1016-1025 (2003).
Because the disconnected residues are likely to be too close, you may want to
repair and optimize the local geometry through minimization; see Minimizing a
Subset of Residues (in the Force Field Manual).
Additional Information:
Join Chains on page 157 to connect two residue chains
Atom1 and atom2 must be valid connection points for the biopolymer, and their
cap fragments will be automatically discarded if necessary.
If the two atoms are in different molecule areas, all atoms attached to atom2
will be merged into atom1’s molecule area before the specified atoms are
connected.
Additional Information:
• BIOPOLYMER LOOP or BIOPOLYMER TWEAK to model insertions and
deletions in proteins
• Break a Chain on page 156 to break an inter-residue bond and append
cap fragments
The N of the first amino acid and the C of the last residue in the chain are
identified and connected to each other (see Join Chains on page 157).
Additional Information:
• MAXIMIN2 to optimize the geometry of the model (in the Force Field
Manual)
• BIOPOLYMER LOOP or BIOPOLYMER TWEAK to generate possible
conformations
• Disulfide Bridges on page 148
• Chain Termini Sets on page 131
An atom selection dialog prompts you for a hydroxyl or carbonyl oxygen, that
is, an oxygen with only a single bonded neighbor.
If the chosen atom(s) is not a valid attachment point, SYBYL will look for any
atom in the same residue that is valid.
Additional Information:
• Build Protein, DNA Strand, RNA Strand, Carbohydrate on page 142 to
build a specific (non-random) sequence
• List the complete sequence of a biopolymer
• A description of the syntax for residue sequences and conformation
specification in the SPL Manual
Sequence Selection
Initial Sidechain Position
    Select the source of the initial conformation for each sidechain being
    added or mutated:
    • SYBYL: The conformation of the matching residue in the open dictionary.
    • Lovell: The most probable rotamer in the Lovell rotamer library or the
      one that results in the fewest bumps with the rest of the molecule.
      (S.C. Lovell, J.M. Word, J.S. Richardson and D.C. Richardson, “The
      Penultimate Rotamer Library.” Proteins: Structure, Function, and
      Genetics, 40, 389-408 (2000).
      http://kinemage.biochem.duke.edu/databases/rotamer.php)
    Tailor variable BIOPOLYMER ROTAMER_DIRECTORY adds user-defined rotamer
    libraries to this list. In particular, see Dunbrack Rotamer Library on
    page 189.
Scan
    Whether to attempt to remove steric interactions between the added or
    mutated residues and the rest of the molecule. Torsion angles in the new
    sidechains are scanned, through a full 360°, for positions that relieve
    bad steric interactions. Only one bond at a time is altered. After a
    position is found, that bond is removed from consideration. Scanning
    continues until all interactions dependent upon these bonds are relieved
    or until no progress is made from one iteration to the next.
Number of Increments (N)
    Number of angle steps used to rotate through 360°. The amount of rotation
    at each step is 360°/N.
    The default value is taken from Tailor variable SCAN NUMBER_INCREMENTS.
vdW Factor
    Constant scaling factor to apply to all van der Waals radii.
    The default value is taken from Tailor variable SCAN VDW_SCALE.
Action Buttons
Apply and Create New Model
    Apply the specified changes and store the resulting molecule in a new
    area. By default the new model is stored in the next available molecule
    area. You may also use a browser to designate an alternative destination.
Apply to Selected Sequence
    Apply the modification to the current molecule.
Additional Information:
File Format for Rotamer Libraries on page 188
Minimization Setup
In the Edit Protein Composition dialog, activate Minimize Edited Sequence
and press Setup.
This operation replaces entire residues, not just sidechains, so the geometry of
the backbone could be altered drastically, particularly if a proline is involved.
The backbone conformation is taken from the dictionary. However, any
sidechain conformational angles in the residues will be retained, as far down the
sidechain as possible.
Mutating monomers is similar, but replaces only the sidechains. It is much more
efficient and is generally preferable as it maintains the backbone geometry of
the original sequence.
Additional Information:
A description of the syntax for residue sequences and conformation
specification in the SPL Manual
To mutate a residue involved in a disulfide bridge you must first delete the S-S
bond. To mutate a terminal residue attached to a blocking group you must first
remove the blocking group.
Additional Information:
• Replace Sequence on page 167 to replace entire residues (including
backbone)
• A description of the syntax for residue sequences and conformation
specification in the SPL Manual
Then:
• Select the residue after which the new residue(s) will be inserted.
• In the Insert Biopolymer Sequence dialog (very similar to the Build
Biopolymer dialog) select the desired residue(s).
• The Adjust Geometry check box determines whether to move the end
of the chain to retain proper bond geometry, if possible.
• From the Conformation pull-down select a conformation state for the
backbone of the inserted residue(s). Sidechain conformations are taken
from the open dictionary.
• Press Angles to access a dialog where you can specify the value of
specific conformational angles.
• Tailor variable BIOPOLYMER BUILD_HYDROGENS determines whether
hydrogens are added to the new residues and, if so, which ones.
Additional Information:
A description of the syntax for residue sequences and conformation
specification in the SPL Manual
Additional Information:
• Delete Monomers on page 172 to delete residues without closing the
gaps
• A description of the syntax for residue sequences and conformation
specification in the SPL Manual
Cap fragments, as defined in the open dictionary, are added to the residues
adjacent to the gap(s). Tailor variable BIOPOLYMER BUILD_HYDROGENS
determines whether hydrogens are added to the cap atoms and, if so, which ones.
Deleting residues in the middle of a sequence results in multiple chains with the
same name. It is then recommended that you:
• Give all chains a unique name: see Set Chain Names on page 132.
• Update the termini set membership: see Chain Termini Sets on page 131.
Additional Information:
• Excise Monomers on page 171 to delete residues and close the gap in the
backbone
• A description of the syntax for residue sequences and conformation
specification in the SPL Manual
Remarks:
The omega torsion values reported by this functionality and by ProTable are
misaligned by one peptide.
• Biopolymer does not have an omega value associated with the first
peptide in a chain, but there is an assignment for the last peptide.
Omega is defined by 4-tuplet CA(i-1)-C(i-1)-N(i)-CA(i) in residue i.
Thus measurements can start only with the second residue in the chain.
• ProTable has an omega assignment for the first peptide in a chain, but
not for the last peptide:
Omega is defined by 4-tuplet CA(i)-C(i)-N(i+1)-CA(i+1) in residue i.
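The one-residue offset can be made concrete with a short sketch. The code below is illustrative only, not part of SYBYL: the residue representation (a dict of atom name to coordinates) and all function names are assumptions. It computes one omega per peptide bond and then shows how the same values are attached to residues under the two conventions described above.

```python
import math


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees defined by four 3D points."""
    def sub(a, b):
        return [x - y for x, y in zip(a, b)]

    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    def cross(a, b):
        return [a[1] * b[2] - a[2] * b[1],
                a[2] * b[0] - a[0] * b[2],
                a[0] * b[1] - a[1] * b[0]]

    b0, b1, b2 = sub(p0, p1), sub(p2, p1), sub(p3, p2)
    norm = math.sqrt(dot(b1, b1))
    b1 = [x / norm for x in b1]
    # Components of the outer bonds perpendicular to the central bond.
    v = [x - dot(b0, b1) * y for x, y in zip(b0, b1)]
    w = [x - dot(b2, b1) * y for x, y in zip(b2, b1)]
    return math.degrees(math.atan2(dot(cross(b1, v), w), dot(v, w)))


def peptide_omegas(residues):
    """One omega per consecutive pair: CA(i)-C(i)-N(i+1)-CA(i+1)."""
    return [dihedral(a["CA"], a["C"], b["N"], b["CA"])
            for a, b in zip(residues, residues[1:])]


def biopolymer_indexing(omegas):
    """Value stored on residue i describes the bond to residue i-1."""
    return [None] + list(omegas)


def protable_indexing(omegas):
    """Value stored on residue i describes the bond to residue i+1."""
    return list(omegas) + [None]
```

For a chain of n residues both lists have n entries; the Biopolymer-style list is empty at the first residue and the ProTable-style list is empty at the last, so entry i+1 of the former equals entry i of the latter.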
UIMS2 Variable:
BIO_MEASURE_VALUE = the value of the last angle measured by BIOPOLYMER
MEASURE.
Additional Information:
• In the Biopolymer Manual:
    • Find Secondary Structure Conformation on page 177 to identify
      sequences of designated conformational state
    • Ramachandran Graphs on page 91 to graph conformational angle values
    • Check Biopolymer Geometry on page 135 to report deviations from
      standard geometry
• In the SYBYL Basics Manual:
    • Built-in set FINDCONF
• A description of the syntax for residue sequences and conformation
  specification in the SPL Manual
Conformational State
    Activate the check box, then select from the list of conformational
    states in the current dictionary.
Angle Name
    Enter the angle value (in degrees) for any of the backbone conformational
    angles. These fields are active only if the Conformational State check
    box is off.
Set
    Apply the defined angles to the selected residues.
Some torsion angles may need to be modified with the SCAN command.
Additional Information:
• Set Backbone Conformation Via the Menubar on page 175
• Set Sidechain Conformation Via the Menubar on page 185
• A description of the syntax for residue sequences and conformation
specification in the SPL Manual
Additional Information:
• Measure Conformation on page 174 to measure conformational angles in
the biopolymer
• Built-in set FINDCONF (in the SYBYL Basics Manual)
• A description of the syntax for residue sequences and conformation
specification in the SPL Manual
The names of the substructure sets consist of the conformational state followed
by an ID number or a chain identifier, followed by _KS for the Kabsch-Sander
method or _DICT for the dictionary method. If the molecule already has
secondary structure sets by the same name as those created by this command,
they are deleted and regenerated.
The FINDCONF method compares Phi and Psi angles in the protein to those
stored in the dictionary for the following conformational states: alpha_helix,
beta_sheet, and turnI. The set names created using this method for 1crn.pdb
are:
ALPHA_HELIX_1_DICT A/ILE7 A/CYS16 IVARSNFNVC
ALPHA_HELIX_2_DICT A/GLU23 A/TYR29 EAICATY
ALPHA_HELIX_A_DICT
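The naming rule can be summarized in a few lines. This is a sketch inferred from the description and the example names above; the function name and the method keywords are hypothetical, and the upper-casing of the conformational state is an assumption based on the 1crn.pdb example.

```python
def secondary_structure_set_name(state, ident, method):
    """Build a set name: <STATE>_<id or chain>_<KS|DICT>."""
    # Assumed keywords for the two methods named in the text.
    suffix = {"kabsch_sander": "KS", "dictionary": "DICT"}[method]
    return f"{state.upper()}_{ident}_{suffix}"
```

For example, the dictionary state alpha_helix with ID number 1 gives ALPHA_HELIX_1_DICT, and the same state with chain identifier A gives ALPHA_HELIX_A_DICT.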
Files for READ_FROM_FILE can be either PIR or FASTA formatted files (see
Read PIR and FASTA Files on page 67), or a file with the following format:
• first line = number of residues,
• following lines (80 chars/line) = one letter residue codes.
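The simple format described above can be produced and checked with a few lines of Python. This is a minimal sketch, not a SYBYL utility: the function names are hypothetical, and it assumes that the last line may be shorter than 80 characters and that surrounding whitespace is ignored when reading.

```python
def format_sequence_file(sequence, width=80):
    """Return file text: residue count, then one-letter codes, 80 per line."""
    lines = [str(len(sequence))]
    lines += [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join(lines) + "\n"


def parse_sequence_file(text):
    """Read the format back and verify the declared residue count."""
    lines = text.splitlines()
    count = int(lines[0].strip())
    sequence = "".join(line.strip() for line in lines[1:])
    if len(sequence) != count:
        raise ValueError(
            f"header declares {count} residues, found {len(sequence)}")
    return sequence
```

Checking the count against the first line catches truncated files before they are passed to a prediction method.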
You may use your own prediction program (USER_SPECIFIED option). The file
interface requirements for your program are described in Secondary Structure
Prediction on page 319.
Files Created:
• .pred = file containing the predicted sequence
• .prob = file containing the probabilities of each state
Additional Information:
• See Secondary Structure Prediction on page 319 for more details on
these methods
• See Secondary Structure Prediction Files on page 298 for the format of
the input and output files
See also the File Format for Rotamer Libraries on page 188.
Residue Selection
Backbone
Phi, Psi, Omega
    The values of the backbone angles for the selected residue.
Sidechain
Action Buttons
Dashed lines display bumps between the selected residue’s sidechain and other
atoms in the molecule area, including water molecules (see distance monitoring
in the Graphics Manual).
Delta Energy
    Reports, for the residue selected at the top of the dialog, the
    difference between the energy of the applied rotamer and the energy of
    the Initial conformer. Energy values are computed using the Tripos force
    field.
    Use the check box to toggle off this feature.
Scan Sidechain
Number of Increments (N)
    Number of angle steps used to rotate through 360°. The amount of rotation
    at each step is 360°/N.
    The default value is taken from Tailor variable SCAN NUMBER_INCREMENTS.
vdW Factor
    Constant scaling factor to apply to all van der Waals radii.
    The default value is taken from Tailor variable SCAN VDW_SCALE.
Scan Selected Residue
    Attempt to remove steric interactions between the selected residue and
    the rest of the molecule. Torsion angles in the residue’s sidechain are
    scanned, through a full 360°, for positions that relieve bad steric
    interactions. Only one bond at a time is altered. After a position is
    found, that bond is removed from consideration. Scanning continues until
    all interactions dependent upon these bonds are relieved or until no
    progress is made from one iteration to the next.
Minimization
Minimize Selected Residue
    Access the Minimize dialog (see the Force Field Manual for details) to
    optimize the geometry of the selected residue.
Additional Information:
• Set Backbone Conformation Via the Menubar on page 175
• Set Biopolymer Conformation via the Command Line on page 175
The SCAN command is executed on all rotatable bonds in the selected residues’
sidechains with a fixed angle increment of 3° so that minimal changes from
starting geometries will be made. The “hardness” of the vdW spheres can be
adjusted using Tailor variable SCAN VDW_SCALE.
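Because each scan step rotates by 360°/N, a fixed increment of 3° corresponds to N = 120 steps per rotatable bond. The sketch below only illustrates that arithmetic; the function name is hypothetical and the starting angle of 0° is an assumption.

```python
def scan_angles(n_increments):
    """Torsion offsets (degrees) visited when 360 degrees is split into N steps."""
    if n_increments <= 0:
        raise ValueError("n_increments must be positive")
    step = 360.0 / n_increments
    return [i * step for i in range(n_increments)]


# A 3-degree increment means 360 / 3 = 120 positions per bond.
fine_scan = scan_angles(120)
```

The cost of a scan grows with both N and the number of rotatable bonds, which is why selecting a whole large protein can be slow.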
In the case of large proteins, if the whole molecule is selected this operation can
be time consuming.
Additional Information:
Built-in sets (in the SYBYL Basics Manual)
Note that only conformational angles defined in the dictionary can be copied
with this command.
When the copying operation encounters a proline in the target sequence the
option of distorting the proline geometry to match the conformation found in
the source sequence is determined by the status of Tailor variable BIOPOLYMER
PROLINE_GEOMETRY.
Loop Fragment:
When the loop search is complete, SYBYL writes a file (filename.loop)
containing the parameters of the loop search and the loop fragments. For each
loop fragment, the following information is stored:
• source of the fragment;
• amino acid sequence;
• sequence homology score of the window region, using the homology
matrix specified with Tailor variable BIOPOLYMER
SIMILARITY_MATRIX;
• RMS fit to anchor regions;
• coordinates of the backbone atoms in the loop, transformed to fit the
reference molecule.
Normally only backbone atoms are produced. Sidechains can be added later (see
Add Sidechains on page 125).
Once the loop search is completed, you can analyze the results immediately (see
Loop Search Results in a Spreadsheet on page 201 for a description of available
analysis tools). Since results are saved in a file, you can also analyze the results
Homology Score
The homology score is calculated as follows:
• First the current homology matrix is examined to determine the
similarity between each residue in the two sequences.
• These scores are summed over the window region (excluding the anchor
region). This is the “target vs. database fragment score”.
• The similarity score of the target sequence vs. itself is calculated (the
“target vs. target score”). The final score is then calculated.
\[
\text{IdentityScore} = \frac{\text{TargetVsTargetScore}}{\text{WindowLength}} - \text{MeanOfHomologyMatrix}
\]
\[
\text{LoopScore} = \frac{\text{TargetVsDatabaseFragmentScore}}{\text{WindowLength}} - \text{MeanOfHomologyMatrix}
\]
\[
\text{FinalScore} = 100 \times \frac{\text{LoopScore}}{\text{IdentityScore}}
\]
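The three formulas above can be worked through in a short sketch. The code is illustrative, not the SYBYL implementation: the function name, the dictionary representation of the homology matrix, and the toy two-letter matrix in the usage note are assumptions, and the matrix mean is passed in explicitly because the text does not say how it is computed.

```python
def homology_final_score(target, fragment, matrix, matrix_mean):
    """Final homology score (percent) of a database fragment vs. the target.

    target, fragment -- window-region sequences of equal length
    matrix           -- dict mapping (residue_a, residue_b) to a similarity
    matrix_mean      -- MeanOfHomologyMatrix, supplied by the caller
    """
    if len(target) != len(fragment):
        raise ValueError("window sequences must have the same length")
    window_length = len(target)
    target_vs_target = sum(matrix[(a, a)] for a in target)
    target_vs_fragment = sum(matrix[(a, b)] for a, b in zip(target, fragment))
    identity_score = target_vs_target / window_length - matrix_mean
    loop_score = target_vs_fragment / window_length - matrix_mean
    return 100.0 * loop_score / identity_score
```

With a toy identity matrix over two residue types (1 on the diagonal, 0 elsewhere, mean 0.5), a fragment identical to the target scores 100, a fragment matching at half of the positions scores 0, and a fragment matching nowhere scores -100. The score is therefore a percentage relative to a perfect match, with 0 corresponding to the average similarity in the matrix.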
References:
[3] Jones, T. A. and Thirup, S. (1986) EMBO Journal, 5:4, 819-822.
[4] Claessens, M.; Van Cutsem, E.; Lasters, I.; Wodak, S. Protein
Engineering (1989) 2:5, 335-345.
Additional Information:
• Biopolymer Tweak for a computational method for generating loop
conformations
• Excise Monomers on page 171 to delete residues from a chain and join
the adjacent residues to close the gap in the backbone
• Insert Monomers on page 169 to insert residue(s) in a chain
• Join Chains on page 157
• Protein Loop Searching on page 320 for a detailed description of the
methods used in loop searching
• Biopolymer Loop Files on page 294 for a description of the file format
• Binary Protein Database: PRODAT on page 255 and its graphical user
interface on page 256
The protein database is searched for fragments of the indicated length that fit
well between the 2 flanking residues. This may be used to model local changes
in the protein’s conformation introduced by the insertion or deletion of residues.
For the remainder of this discussion, the residues between the two flanking
residues you specify are called the window region of the loop. The two regions
containing the flanking residues are called the anchor regions of the loop. The
total number of residues in the loop fragments found by this procedure will be
the number of residues in the window region plus the number of residues in the
2 anchor regions. As a special case, you can search for terminal loops by
specifying a non-existent preceding_res or following_res. In this case, there is
only a single anchor region.
You can perform a loop search on any protein molecule in SYBYL. While the
anchor regions must exist in the protein, the window region can either be
already present in the molecule (with any number of residues) or missing; in
either case, the biopolymer loop analysis will fill in the entire loop region with
the proper residues.
Tailor Variables:
• Tailor subject PROTEIN_LOOP
• Tailor subject PROTEIN_SEARCH
Additional Information:
• Biopolymer Tweak to perform an ab initio loop generation
• See Protein Folding And Model Generation on page 319 for information
on the method used to derive loop conformations
Biopolymer Tweak and Biopolymer PRODAT Search use the same definitions
for window region and anchor region with the exception that a Tweak anchor
region always consists of a single residue. Thus the total number of residues in
the loop fragments found by Tweak will be the number of residues in the
window region plus 2 (one for each anchor region).
When the loop search is complete, SYBYL writes a file (extension is .loop)
containing the parameters of the loop search and the loop fragments. For each
loop fragment, the following information is stored:
• the source of the fragment,
• amino acid sequence,
• RMS fit to anchor regions,
Note that the source of the fragment is always TWEAK_XX (where XX is the
loop number), the amino acid sequence for all Biopolymer Tweak-generated
loops is constant per run, and the RMS fit to the anchor regions will always be
very close to zero.
Only backbone atoms are supplied. You can add sidechains later (see Add
Sidechains on page 125).
Method:
Biopolymer Tweak initially defines four distance constraints and their target
values between the CA/N atoms of one anchor region and the CA/C atoms of
the other anchor region.
The distance constraints are measured and a difference vector between the
actual and target distance constraints is computed, along with a matrix
containing the derivatives of each distance constraint with respect to each
torsion angle. A set of optimal corrections to the torsion angles is calculated
from a 4x4 linear system defined by the difference vector and the derivative
matrix. Optimal corrections are then limited in magnitude by Tailor variable
TWEAK MAX_TORSIONAL_CHANGE. The final torsional corrections are applied
to the fragment to give a new set of atomic coordinates. This process is repeated
until either the number of iterations is exceeded (Tailor variable TWEAK
MAX_ITERATIONS) and the fragment is rejected, or the magnitude of the
difference vector is less than the value of Tailor variable TWEAK
TARGET_DISTANCE_TOLERANCE and the fragment is subjected to chirality tests
and optional bump checking. Loops rejected for exceeding the iteration limit are
written to the terminal with the symbol d.
Loop chirality is checked against the chirality defined in the anchor regions and
loops failing this test are rejected and written to the terminal with the symbol c.
If a loop fragment has passed all its screening tests it is finally accepted, fitted
to the original anchor region, and written to the file filename.loop for
upcoming analysis (see Loop Search Results in a Spreadsheet on page 201).
Accepted fragments are written to the terminal with the symbol + followed by a
line terminator.
This entire method is repeated until the number of accepted loop fragments
equals that specified by Tailor variable TWEAK NLOOPS. The resulting loop
fragments are then passed to the analysis functionality.
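The torsion-correction iteration in the Method above can be sketched in plain Python. This is a minimal illustration, not the SYBYL implementation, and it rests on several stated assumptions: derivatives are estimated by finite differences, the 4x4 system is taken to be (J Jᵀ)λ = d with the minimum-norm correction Jᵀλ (one common way to obtain a square system from four constraints and many torsions), and each correction is limited elementwise. All function and parameter names are hypothetical; max_change, tolerance, and max_iterations stand in for the TWEAK Tailor variables. Chirality tests and bump checking are not shown.

```python
import math


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting.

    Raises ZeroDivisionError if the matrix is singular.
    """
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        rest = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - rest) / m[r][r]
    return x


def tweak_torsions(constraints, targets, torsions, max_change,
                   tolerance, max_iterations, h=1e-6):
    """Adjust torsions until constraints(torsions) matches targets.

    Returns the adjusted torsions, or None if the fragment would be
    rejected for exceeding the iteration limit.
    """
    torsions = list(torsions)
    for _ in range(max_iterations):
        current = constraints(torsions)
        diff = [t - c for t, c in zip(targets, current)]
        if math.sqrt(sum(d * d for d in diff)) < tolerance:
            return torsions
        # Finite-difference derivative matrix: jac[k][j] = d c_k / d torsion_j
        jac = [[0.0] * len(torsions) for _ in diff]
        for j in range(len(torsions)):
            bumped = torsions[:]
            bumped[j] += h
            shifted = constraints(bumped)
            for k in range(len(diff)):
                jac[k][j] = (shifted[k] - current[k]) / h
        # Square system (4x4 for four constraints), assumed form J J^T.
        jjt = [[sum(p * q for p, q in zip(r1, r2)) for r2 in jac] for r1 in jac]
        lam = solve_linear(jjt, diff)
        step = [sum(jac[k][j] * lam[k] for k in range(len(diff)))
                for j in range(len(torsions))]
        # Limit the magnitude of each torsional correction.
        step = [max(-max_change, min(max_change, s)) for s in step]
        torsions = [t + s for t, s in zip(torsions, step)]
    return None


def toy_constraints(t):
    """Toy stand-in for the four anchor distances (linear in six torsions)."""
    return [t[0] + t[4], t[1] + t[5], t[2] + t[4], t[3] + t[5]]
```

A call such as tweak_torsions(toy_constraints, [1.0, 2.0, 3.0, 4.0], [0.0] * 6, max_change=5.0, tolerance=1e-6, max_iterations=10) returns torsions that satisfy the toy constraints. In a real loop closure the constraint function would measure interatomic distances after rebuilding coordinates from the torsions, and several iterations are normally needed because the distances are nonlinear in the torsions.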
UIMS2 Variables:
• MNDL_TWEAK_SEED = the user-supplied seed for the random number
generator used by BIOPOLYMER TWEAK.
• BIO_LOOP_NLOOPS = the number of loop conformations being analyzed.
Additional Information:
• Biopolymer PRODAT Search for more information on loop generation
by database searching
• Excise Monomers on page 171 to delete residues from a chain and join
the adjacent residues to close the gap in the backbone
• Insert Monomers on page 169 to insert one or more residues in a chain
• Join Chains on page 157
• See Random Tweak Loop Generation on page 322 for more detail on the
tweak algorithm
Access:
1. The protein for which the loop search was performed must be present.
2. Biopolymer > Protein Loops > Analyze Search Results and retrieve
a .loop file.
3. The loop candidates are loaded in a spreadsheet within the molecular data
explorer.
Upon opening a biopolymer loop spreadsheet the loop in the first row is
automatically melded into the protein.
Biopolymer Menu
MDE: Biopolymer
• Examine Selected Loops
• Display All Loops
• Color Loop
table_name Name of the data table to use in the analysis. The next
two arguments will be skipped if a table with this name
is already open.
mol_area Molecule on which a loop search was previously run.
loop_file File containing the loop results (default file extension:
.loop)
At the conclusion of LOOP ANALYZE, the molecule will be left with the selected
loop fragment inserted. You can recover the original (pre-loop search) molecule
by reading in the Mol2 file associated with the BIOPOLYMER LOOP run name.
In addition to the data directly available for each loop fragment (atomic coordi-
nates, source of the fragment, residue sequence of the fragment, RMS deviation
of the least-squares fit to the anchor region), you can derive additional infor-
mation to help in selecting from among the candidate fragments.
In addition to inserting the selected loop in the molecule, this command will
print its source and original amino acid sequence. It prompts repeatedly until
you press the end-loop character (|) or abort character (^).
Remarks:
Use of this data provides an easy way of weeding out some unreasonable
conformations.
Additional Information:
Tailor subject GENERAL for additional bump monitoring adjustments
Tailor Variables:
Tailor variable BIOPOLYMER SIMILARITY_MATRIX
After exiting from BIOPOLYMER LOOP ANALYZE, you can return and continue
analyzing the same table any time during the SYBYL session. If you have
added columns to your table, you can save it using TABLE SAVE. If you have
not added any columns, there is no need to save the table, since it can be
immediately recreated from the loop search results.
Additional Information:
• The alignment procedure uses the Needleman and Wunsch algorithm
(J. Mol. Biol. 1970, 48, 443) for the pairwise alignment of sequences
• Read and Write PIR and FASTA Files on page 67 for a description of
.pir formatted files and how to create them
• Sequence Alignment on page 311 for a discussion of sequence alignment
methods and implementations
Similarity Matrix Select the type of similarity matrix: apg, greer, iden-
tity, mutation, physprop, pmutation (default),
swiss, or swiss2. You may also define your own sim-
ilarity matrix and use the file browser to specify its
location.
Tailor variable BIOPOLYMER SIMILARITY_MATRIX
Gap penalty Enter a positive number used to penalize gaps in
aligned sequences. Large values discourage insertion of
gaps in the alignment. The default gap penalty is auto-
matically adjusted when you select another Similarity
Matrix.
Tailor variable BIOPOLYMER GAP_PENALTY
Output Format Specify the format of the output file(s): MSF, PIR, or
FASTA.
Output Name Enter a base name for the output files (do not provide
an extension). A multiple sequence format file with the
extension .msf will contain the alignment.
Edit in Check this box if you intend to view and edit the
Sequence sequence alignment with the Sequence Viewer. The nec-
Viewer essary set of files will be created based on the Run
name. See View/Edit Alignments on page 212.
Information about the alignment (PID, confidence, etc.) is listed in the console.
If more than two sequences are aligned, the pairwise sequence identity matrix is
also presented.
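The pairwise alignment step can be illustrated with a minimal Python sketch of
Needleman and Wunsch global alignment. This is not SYBYL's implementation: the
function name needleman_wunsch, the simple match/mismatch scores that stand in
for a similarity matrix, and the linear gap penalty of 2.0 are all assumptions
made for illustration.

```python
def needleman_wunsch(seq_a, seq_b, gap_penalty=2.0, match=1.0, mismatch=0.0):
    """Globally align two one-letter sequences with a linear gap penalty.

    match/mismatch are a stand-in for a real similarity matrix (assumption).
    Returns (aligned_a, aligned_b, score); gaps are written as '-'.
    """
    n, m = len(seq_a), len(seq_b)
    # score[i][j] = best score for aligning seq_a[:i] with seq_b[:j]
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = score[i - 1][0] - gap_penalty
    for j in range(1, m + 1):
        score[0][j] = score[0][j - 1] - gap_penalty

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            pair = match if seq_a[i - 1] == seq_b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + pair,     # align the two residues
                score[i - 1][j] - gap_penalty,  # gap in seq_b
                score[i][j - 1] - gap_penalty,  # gap in seq_a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    aligned_a, aligned_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            pair = match if seq_a[i - 1] == seq_b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + pair:
                aligned_a.append(seq_a[i - 1])
                aligned_b.append(seq_b[j - 1])
                i -= 1
                j -= 1
                continue
        if i > 0 and (j == 0 or score[i][j] == score[i - 1][j] - gap_penalty):
            aligned_a.append(seq_a[i - 1])
            aligned_b.append("-")
            i -= 1
        else:
            aligned_a.append("-")
            aligned_b.append(seq_b[j - 1])
            j -= 1
    return "".join(reversed(aligned_a)), "".join(reversed(aligned_b)), score[n][m]


if __name__ == "__main__":
    print(needleman_wunsch("GATTACA", "GATACA"))
```

Raising gap_penalty in this sketch makes gapped paths score worse, which
mirrors the statement that large values discourage insertion of gaps.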
11.1.2 Align Sequences via the Command Line & Write MSA
Find an optimal alignment between two or more sequences and produce a
multiple sequence alignment (MSA) file.
pir1, pir2, ... Names of the files in .pir format (one-letter residue
strings) containing the sequences of the molecules. At
least two sequences must be provided; end the list with
the end loop character (|). By default a maximum of 40
sequences can be aligned.
Tailor variable BIOPOLYMER MAX_SEQUENCES
run_name Base name for the output files (do not provide an exten-
sion). A multiple sequence format file by the name of
run_name.msf will contain the alignment.
Tailor Variables:
• Tailor variable BIOPOLYMER GAP_PENALTY
• Tailor variable BIOPOLYMER SIMILARITY_MATRIX
Additional Information:
• Read and Write PIR and FASTA Files on page 67 for a description of
.pir formatted files and how to create them
• Sequence Alignment on page 311 for a discussion of sequence alignment
methods and implementations
Tailor Variables:
• Tailor variable BIOPOLYMER GAP_PENALTY
• Tailor variable BIOPOLYMER IDENT_MODE
Additional Information:
• Read and Write PIR and FASTA Files on page 67 for a description of
.pir formatted files and how to create them
• Sequence Alignment on page 311 for a discussion of sequence alignment
methods and implementations
Multiple chains named A are occasionally found in PDB files; listing the
biopolymer’s sequence will reveal them. We recommend that you:
• give all chains unique names: Set Chain Names on page 132
• edit the chain termini: Chain Termini Sets on page 131
Additional Information:
LIST SEQUENCE to list partial sequences
The atom_expr argument allows you to specify a subset of the sequence atoms
to be used in the fit (e.g. only backbone atoms). The sequences to fit must have
the same length.
UIMS2 Variable:
• FIT_RMS = the RMS deviation computed by the least squares fit.
Additional Information:
• Find and Fit Fixed Regions on page 223 to perform a fit for multiple
conformations of a biopolymer
• A description of the syntax for residue sequences and conformation
specification in the SPL Manual
• Fit Atoms to perform a least squares fit between any atoms
• Match Atoms to automatically find a match and fit 2 molecules
Fixed Reference Select the reference molecule. It will remain fixed dur-
Structure ing the alignment process.
Movable Struc- Select one, several or all other molecules to be fitted to
ture(s) the reference molecule.
Buttons to assist in the selection of movable proteins.
One protein must be selected in the list for the action
buttons at the bottom of the dialog to be active.
Atoms to Use Specify the type of atoms to be used for the least-
for Fit squares fit: C-Alpha, Backbone, Sidechain or All.
Action Buttons
Without an MSA:
• %ID—% of identical residues; gaps in the alignment are ignored
• Score—The sum of the homology scores for the two sequences
• Sig.—Significance score (read about Jumbling and Significance on page
315); affected by Tailor variable BIOPOLYMER NUMBER_JUMBLES
• RMSD—Root Mean Square Distance for the specified types of atoms
With an MSA:
• Identity—% of identical residues; gaps in the alignment are ignored
• Score—The sum of the homology scores for all the sequences
Note: Sequence alignment cannot be used reliably for structural alignment if:
• The sequence identity is lower than 30%.
• The significance score is lower than 4.
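As a rough illustration of the %ID value and the reliability note above, the
following Python sketch computes percent identity over the ungapped columns of
two aligned sequences. The function names and the choice of denominator (the
number of columns in which neither sequence has a gap) are assumptions; the
manual states only that gaps are ignored.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of identical residues, skipping any column that holds a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    identical = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == gap or res_b == gap:
            continue  # gaps in the alignment are ignored
        compared += 1
        if res_a == res_b:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


def alignment_looks_reliable(identity_percent, significance):
    """Apply the two thresholds from the note: 30% identity, significance 4."""
    return identity_percent >= 30.0 and significance >= 4.0


if __name__ == "__main__":
    pid = percent_identity("ACDE", "ACDF")
    print(pid, alignment_looks_reliable(pid, significance=5.2))
```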
Database Name Select one of the opened databases or open a new one.
• Open—This option brings up a file browser.
• xxx.mdb—If you already have a database open, its
name will appear here.
Fitting Options Activate one or more of the following fitting options.
There will be at least one column in the resulting
ProTable spreadsheet for each of the selections made
here.
• Backbone Atoms—backbone atoms, as defined
in the dictionary
• Heavy Atoms—all non-hydrogen atoms
• All Atoms—all atoms
• Alpha Carbons—alpha carbons only
Segment Length Activate one of the following mutually exclusive
options. This selection is combined with the Fitting
Options, and the results are reported in separate col-
umns in a ProTable spreadsheet.
• One Residue—calculates one residue fits
• Three Residues—calculates three residue fits.
Only three-residue fits are reported for Alpha
Carbons.
• Report Both—reports the results of both one
residue and three residue fits
Because the application will be loading all database molecules into SYBYL, it
will first delete (zap) all molecules already present. A warning message is
issued before anything is done, giving you an option to exit.
Additional Information:
• The ProTable Manual
• Find and Fit Fixed Regions on page 223
The molecules must be of identical sequence and number of atoms and they
must contain all hydrogens and lone pairs. A minimum of 3 molecules is needed
to perform the fit.
where:
• Func = RMS deviation or distance variance
• ⟨Func⟩ = the mean value of the function over all residues in the fixed
region
• u = a scaling constant
The value of u is set to 1.5 at the start of the command. If this value
eliminates more than 3 residues in the first iteration, then u is reset to a
value such that it eliminates only 3 residues from the fixed region.
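The first-iteration rule for u can be sketched in Python as follows. The
elimination criterion used here (a residue is dropped when its Func value
exceeds u times the mean) is an assumption, since the defining equation is not
reproduced in this text; the function name and the sample values are likewise
hypothetical.

```python
def first_iteration_cutoff(func_values, u=1.5, max_removed=3):
    """Return (u, removed_indices) for the first elimination pass.

    Assumed criterion: residue i is eliminated when
    func_values[i] > u * mean(func_values).
    """
    mean = sum(func_values) / len(func_values)
    threshold = u * mean
    removed = [i for i, value in enumerate(func_values) if value > threshold]
    if len(removed) > max_removed:
        # Too many residues would be lost: raise the cutoff to the value of
        # the (max_removed + 1)-th largest residue so that only the
        # max_removed largest values lie strictly above it (fewer if tied).
        ranked = sorted(func_values, reverse=True)
        threshold = ranked[max_removed]
        removed = [i for i, value in enumerate(func_values) if value > threshold]
        u = threshold / mean
    return u, removed


if __name__ == "__main__":
    # Hypothetical per-residue RMS deviations for a ten-residue fixed region.
    values = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 10.0, 11.0, 12.0, 13.0]
    print(first_iteration_cutoff(values))
```

In a full procedure the Func values would be recomputed after each refit and
the test repeated; this sketch covers only the initial adjustment of u.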
UIMS2 Variables:
SUBST_FIXED contains the fixed region identified by the BIOPOLYMER
RESIDUE_FIT command. The SUBST_FIXED variable is a space separated list
of substructure names identified as in the fixed region of the molecule. Please
note that this variable is not defined until the first execution of the command.
Reference:
“A simple method for delineating well-defined and variable regions in protein
structures determined from inter-proton distance data,” M. Nilges, G.M. Clore
and A.M. Gronenborn, FEBS Lett., 219(1), 11-16 (1987).
Additional Information:
Fit Monomers on page 216 to perform a least squares fit between two
biopolymer sequences of identical lengths
The Protein Data Bank contains a wealth of structural and sequence infor-
mation. SYBYL includes PRODAT, a binary compilation of the better struc-
tures from the PDB, for use in protein loop searches. Search results can be
retrieved as molecular fragments, PDB source information, or ID indices from
the binary database itself.
References:
[1] F.C. Bernstein, T.F. Koetzle, G.J.B. Williams, E.F. Meyer, Jr., M.D.
Brice, J.R. Rodgers, O. Kennard, T. Shimanouchi, and M. Tasumi, J.
Mol. Biol., 1977, 112, 535-42.
[2] E.E. Abola, F. C. Bernstein, S.H. Bryant, T.F. Koetzle, and J. Weng in
Crystallographic Databases – Information Content, Software Systems,
Scientific Applications, eds. F. H. Allen, G. Bergerhoff, and R. Sievers,
Data Commission of the International Union of Crystallography, Bonn/
Cambridge/Chester, 107-132 (1987).
Additional Information:
• Search PRODAT Database for Loops on page 196 for replacing or
building loop fragments in proteins
• Tailor variable PROTEIN_SEARCH to specify preferences
• Loop Search Results in a Spreadsheet on page 201 for column types and
options
• See page 255 for a description of mkprodat and the binary database
Sequence The fixed length of the loop to which the distance con-
Length straint applies.
Inter-CA Dis- The lower and upper bounds of the distance constraint
tances on the loop.
Residue Offset Type in an integer. Positive and negative offsets are
allowed. That is, an offset of -3 will return the residue
ID of the residue 3 positions toward the N-terminus
from the residue that matches the query. This option is
useful in boolean operations for building complex
expressions.
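The effect of a residue offset in a Boolean expression can be illustrated with
a short Python sketch. The function name, the zero-based positions along a
single chain, and the decision to discard offsets that fall outside the chain
are assumptions for illustration only.

```python
def apply_residue_offset(match_positions, offset, chain_length):
    """Shift each matching position by offset (negative = toward N-terminus)."""
    shifted = []
    for position in match_positions:
        target = position + offset
        if 0 <= target < chain_length:  # assumption: drop out-of-chain hits
            shifted.append(target)
    return shifted


# Hypothetical hits of two queries on a 12-residue chain.
hits_a = [5, 9]
hits_b = [2, 7]

# Residues that match query B AND lie three positions before a query A match.
three_before_a = set(apply_residue_offset(hits_a, -3, chain_length=12))
both = sorted(three_before_a & set(hits_b))

if __name__ == "__main__":
    print(both)
```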
Press the Class Expression button on the Protein Database Searching dialog.
Protein Search The query from the Protein Database Searching dialog
Query which generated these results.
Number of Frag- Numerical field indicating how many fragments in the
ments database match the search query. If the search results
are empty, you will be returned to the main dialog
instead.
List Fragment Press this button if the desired output is one line of text
Source Informa- per fragment which gives the location of the fragment
tion in the corresponding PDB file, including the name of
the source protein.
Retrieve Molecu- Fragments that match the search query will be retrieved
lar Fragments from the binary protein database and displayed in mole-
cule areas.
Fileset Name Enter the base name for the search result files.
Perform RMS Fit Performs a least squares fit on the retrieved fragments
of Fragments (see Fit Monomers on page 216). This option is avail-
able only if all the fragments have the same length. The
RMS fit is done on the alpha carbons of each residue in
the fragment. The RMS fit values are stored in a .rms
file.
Additional Information:
Loop Search Results in a Spreadsheet on page 201
Dictionary files have a file extension of .dic, and reside in the directory
specified by Tailor variable BIOPOLYMER DIRECTORY.
The dictionaries currently provided are macromol, protein, bigpro, dna, rna,
sugar. The macromol dictionary is opened by default for all biopolymer opera-
tions unless another dictionary has been opened by the user.
Only one biopolymer dictionary can be open at a time. When a new dictionary
is opened, any previously opened dictionary is automatically closed. If the full
filename of the dictionary is the same as the one currently open, no action is
taken.
To build mixed complexes, start by opening the macromol dictionary, build the
pieces in separate work areas, then join (JOIN) or merge them (Edit > Merge).
The resulting molecule can be written as a .mol2 file. When this file is read
back into SYBYL, the appropriate dictionary (macromol) will be opened
automatically.
Additional Information:
Build Biopolymer on page 141 to build a biopolymer chain from information in
the dictionary
Current Dictio- Reports the full path of the directory containing the dic-
nary Directory tionary that is currently in memory.
The default location is determined by Tailor variable
BIOPOLYMER DIRECTORY.
Current Dictio- Reports the name of the dictionary that is currently in
nary memory.
The default dictionary is determined by Tailor variable
BIOPOLYMER DEFAULT_DICT.
Dictionary Management
Set Custom Specify the full path of the directory containing the dic-
Dictionary Direc- tionary of interest.
tory This operation:
• changes the location of the dictionary directory and
sets the value of the command Tailor variable
BIOPOLYMER DIRECTORY;
• closes the current dictionary;
• opens the dictionary if one is found in the new
location by the same name as the one that was open;
otherwise, prompts for the name of the dictionary to
open.
Note: A copy of the SYBYL 7.3 version of the dictio-
nary is available in $TA_ROOT/biopolymer/tables/
dictionary_73.
Use Default This operation:
Dictionary Direc- • resets the dictionary directory to the location deter-
tory mined by Tailor variable BIOPOLYMER
DIRECTORY,
• closes the dictionary that was open;
• opens the dictionary if one is found by the same
name as the one that was open; otherwise, prompts
for the name of the dictionary to open.
Create Custom This is the recommended first step before making any
Dictionary Direc- kind of modification to a dictionary.
tory Copy the contents of the current dictionary directory to
the specified path, which must be new or empty. The
Current Dictionary Directory at the top of the dialog
is changed automatically to this new location.
Change Dictio- Select a dictionary among those available in the direc-
nary tory and open it. The Current Dictionary at the top of
the dialog reflects the selection.
Save Dictionary Create a permanent (disk file) dictionary from the tem-
As porary (in-memory) dictionary. The dictionary name
can be either an existing dictionary, or a new one. The
default extension is .dic.
The dictionary will be created in the directory shown at
the top of the dialog.
Note: This operation leaves the original dictionary
open.
The corresponding command is:
BIOPOLYMER DICTIONARY CREATE DICTIONARY
dict_name
Monomer Management
Create New Access the Create Monomer dialog. The new monomer
Monomer must already be present in a molecule area, in the neu-
tral, unblocked form.
Change Exist- Select a monomer already defined in the dictionary in
ing Monomer memory and access the Create Monomer dialog.
Create AMBER Access the Monomer SLN Atom Typing Rules dialog.
SLN Typing You will be prompted to specify the type of monomer
Rules (PROTEIN, NUCLEIC_ACID, or OTHER) then to
select an existing monomer.
Create AMBER Access the Monomer SLN Atom Typing Rules dialog.
SLN Typing The molecule of interest must already be in a molecule
Rules area.
From Molecule This option is useful if the molecule of interest is not a
defined monomer. This would be the case for a ligand.
Preprocessing of this molecule with the tools in the
Biopolymer > Prepare Structure menu is recom-
mended, but not required.
Add Monomer Add the specified monomer to the dictionary currently
File in memory. Duplicate entries are not allowed.
to Dictionary The corresponding command is:
BIOPOLYMER DICTIONARY ADD MONOMER file-
name
The monomer is then immediately available for use
while the current dictionary is open. To include the
monomer permanently in a dictionary use Save Dic-
tionary As in this dialog.
Additional Information:
Define a New Blocking Group on page 252
You will be asked to select a monomer from the dictionary in memory. The
structure in the corresponding .res file will be read into the first available
molecule area. Some options in the dialog will be unavailable because they
cannot be changed once the initial .res file has been created.
Monomer Features
When you press the OK button, the new or modified monomer definition is
stored in the specified .res file.
Interactive Spreadsheet
A spreadsheet makes it easy to inspect and edit atom parameters. Each row
corresponds to an atom ID number, and the columns are:
• NAME—atom name
• A02_T—AMBER7 FF02 atom type
• A99_T—AMBER7 FF99 atom type
Additional Information:
Description of the Biopolymer Dictionaries on page 278
First build the monomer in a work area by itself, using any SYBYL commands.
You may find it easiest to start from an existing monomer then modify it. Add
all hydrogens to the model and ensure that all atom names are correct. Each
atom in the monomer must have a unique name. You should be consistent about
the way atoms are named across different residues in the same dictionary,
including cap atoms, particularly to allow proper operation of conformational
definitions and blocking groups. Make sure that the dictionary in which this
monomer will be included is open.
A .res file containing the new monomer is created. The new monomer is
automatically added to the dictionary currently in memory and is, therefore,
available for immediate use. If you want this monomer to be permanently
included in the dictionary see Create or Update a Dictionary on page 248.
Additional Information:
• Define Monomer via the Menubar on page 241
• See Biopolymer Dictionaries on page 278 for a description of the
biopolymer dictionary file format
The structure in the corresponding .res file will be read into the first available
molecule area. The attachment atoms will be shown in magenta. Their atom
types and charges cannot be changed.
For a Molecule:
A molecule must be present in a molecule area.
This option is useful if the molecule of interest is not a defined monomer. This
would be the case for a ligand. Preprocessing of this molecule with the tools in
the Biopolymer > Prepare Structure menu is recommended, but not
required.
First build the blocking group in a molecule area by itself, using any SYBYL
commands. Add all hydrogens to the molecule and ensure that all atom names
are correct. Each atom in the blocking group must have a unique name.
If you are just making a minor change to an existing blocking group, you will
be able to take the default values for most of the following prompts:
Additional Information:
• See Biopolymer Dictionaries on page 278 for a description of the
biopolymer dictionary file format
A list of the structures used to build the Tripos-supplied binary protein database
can be found in $TA_PDBTABLES/codeset (by default, $TA_ROOT/
biopolymer/tables/prodat/codeset).
The SYBYL software looks for PRODAT in the directory pointed to by the
environment variable TA_PDBTABLES. By default, this variable is set in
$TA_ROOT/lib/environment to $TA_ROOT/biopolymer/tables/prodat.
Ligands, cofactors, metals, and water molecules are discarded from the selected
entries.
Warning: We recommend that you modify a copy of the original binary protein
database then reset your local environment to point to the copy (see Location
and Environment).
Additional Information:
• mkprodat Utility on page 258
• Read about the file of file names
Warning: We recommend that you modify a copy of the original binary protein
database then reset your local environment to point to the copy (see Location
and Environment).
Syntax:
mkprodat determines the PDB code for a PDB file in one of two ways. First, it
looks in columns 63-66 of the HEADER record of the file. If these columns are
blank or if there is no HEADER record, mkprodat uses the file name given in
file_of_filenames, if it is 4 characters long. If both of these methods fail,
mkprodat assigns an arbitrary PDB code and issues a warning message.
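The lookup order described above can be sketched in Python. This is an
illustration of the documented order only, not the mkprodat source: the
function name, the placeholder fallback code, and the use of Python warnings
are assumptions, and the sketch does not reproduce mkprodat's handling of
unpadded HEADER lines shorter than 66 characters.

```python
import warnings


def determine_pdb_code(header_line, listed_file_name, fallback="XXXX"):
    """Pick a PDB code: HEADER columns 63-66, then a 4-character file name.

    header_line is the HEADER record of the file, or None if there is none.
    listed_file_name is the entry as written in the file of file names.
    fallback is a placeholder for the arbitrary code (assumption).
    """
    if header_line is not None and header_line.startswith("HEADER"):
        # Columns are 1-based in the PDB format, so 63-66 is the slice [62:66].
        code = header_line[62:66].strip()
        if code:
            return code
    if len(listed_file_name) == 4:
        return listed_file_name
    warnings.warn("no PDB code found; assigning an arbitrary code")
    return fallback


if __name__ == "__main__":
    header = "HEADER    PLANT SEED PROTEIN".ljust(62) + "1CRN"
    print(determine_pdb_code(header, "crambin.pdb"))
    print(determine_pdb_code(None, "1crn"))
```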
mkprodat creates many binary data files, plus the text file codeset. The
codeset file contains the name of each molecule in the database, with a “+” or
“-” to indicate whether it will be used or ignored in database and loop searches.
You may modify or copy this file to control which proteins are used
in searches. Tailor variable PROTEIN_SEARCH PDBCODE_SET controls which
codeset file is used in SYBYL.
Note: If the HEADER line in the PDB file is shorter than 66 characters and not
padded on the right side with blanks, mkprodat will not produce a PDB
refcode when adding the protein to the binary database. You may choose either
of the following solutions:
• Edit the PDB file and insert the desired PDB code in columns 63-66 of
the HEADER line.
• Change the file of file names used by mkprodat to specify 4-character
file names and change the script $TA_ROOT/bin/unix/pdbfname to
locate the exact file from the 4-character code.
Additional Information:
• Customize PRODAT via the Menubar on page 256
14.6.4 pdbfname
If a local copy of the Protein Data Bank is maintained at your site, ask your
system administrator to set $TA_PDB in the $TA_ROOT/lib/environment
file within your SYBYL-X installation to point to it. Refer to Environment
Variables in your SYBYL Installation in the SYBYL Administration Manual.
Within SYBYL, this script is accessed by mkprodat. You may also use it to
read a PDB file via a command.
For example, to retrieve the PDB file with the code 1crn from your
site’s copy of the PDB, enter the command:
PDB IN m1 %system("pdbfname 1crn")
Protein structures may be read in through the Sequence Viewer and displayed on
the SYBYL screen, where they can be annotated and colored by manipulating
the corresponding sequence in the viewer. Analyses of three-dimensional
properties, such as solvent accessibility, % accessibility, residues involved
in a certain type of H-bonding, and phi and psi angles, can all be
performed within the viewer.
In this chapter:
• Description of the Sequence Viewer on page 262
• Mouse and Keyboard Interactions in the Sequence Viewer on page 275
Biopolymer mode:
Biopolymer > Sequence Viewer
Or
Click the Sequence Viewer icon on the Biopolymer toolbar.
Or
Open a sequence file via File > Import File
ORCHESTRAR mode:
It is posted automatically by the following ORCHESTRAR dialogs:
• Model Conserved Regions
• Analyze Conserved Regions
• Search Loops
• Add/Analyze Loops
• Model Sidechains
• Analyze Sidechains
• Analyze Model
Edit Menu
Accessible if the Sequence Viewer is used in Biopolymer mode or if it is posted
by ORCHESTRAR’s Model Conserved Regions dialog.
Undo Sequence Restore the most recent sequence alignment from the
Change(s) backup stack. The stack contains up to 20 sequence
alignments. Each new alignment, created via the mouse
or an alignment function, is automatically added to the
stack. The original alignment is preserved as the oldest
item in the stack unless a File > Associate Molecule
was performed, which resets the stack.
Remove Delete the selected sequence(s) from the Sequence
Selected Viewer. However, any corresponding structure(s) will
Sequence(s) remain on the SYBYL screen.
Note: It is recommended that the remaining sequences
be realigned using Align Selected Sequences on
this menu.
Remove All Delete all sequences from the Sequence Viewer,
whether selected or not.
Remove All Remove all gaps from all sequences.
Gaps
Remove Empty If a gap occurs at the same ruler position (a column) in
Columns all sequences, the gap is removed from all sequences,
and the sequences are shifted to the left.
Align Selected Align the selected sequences using the Needleman and
Sequences Wunsch algorithm [Ref. 28]. If no sequences are
selected, all are used.
Note: The structures corresponding to the realigned
sequences are not affected by this operation.
Align Selected Align the structures associated with the selected (or all)
Structures sequences (whole or partial) based on the current
sequence alignment. The structural alignment is per-
formed on C-Alpha atoms.
Note: This functionality is not accessible from within
ORCHESTRAR.
View Menu
Text Style A radio button in the side menu identifies the style
applied to the text of the model and homolog
sequences:
• Plain Sequence—Black text.
• Color by Secondary Structure—Helices in red,
sheets in blue and turns in magenta. This is the
initial default style, as set by Tailor variable
BIOPOLYMER SEQUENCE_VIEWER
INITIAL_TEXT_STYLE.
• JOY Annotation—See JOY Annotation Key on
page 274.
Text Back- Select the color scheme to be applied to textual repre-
ground sentation of the sequences in the right panel.
See Color Schemes for the Text Background in the
Sequence Viewer on page 270.
Text Size Size of the text in the dialog: Tiny, Small, Normal,
Medium, Large, or Huge.
Consensus The consensus sequence is updated on the fly using all
Sequence homolog sequences as listed in the Sequence Viewer.
The number of identical residues (ni) is counted and
compared to the number of sequences (ns). For a given
ruler position, the consensus residue character is:
• Bold: ni/ns > 70%
• Normal: 35% < ni/ns <= 70%
• Blank: ni/ns <= 35%
Consensus JOY Display a consensus JOY secondary structure sequence,
SS which is updated on the fly using all homolog
sequences. Prerequisite is to have read in a JOY tem-
plate file (via the File menu). The secondary structure
element is shown if it occurs in more than 70% of the
homolog sequences.
• a = alpha helix
• b = beta sheet
• 3 = 3/10 helix
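The thresholds given above for the Consensus Sequence can be illustrated with
a small Python sketch. Several details are assumptions: ni is taken to be the
count of the most frequent residue in a column, gaps count toward ns but can
never be the consensus residue, and a ratio of exactly 70% is treated as
Normal. The function names are hypothetical.

```python
from collections import Counter


def consensus_column(column, gap="-"):
    """Return (character, style) for one ruler position of the alignment."""
    ns = len(column)
    counts = Counter(residue for residue in column if residue != gap)
    if ns == 0 or not counts:
        return " ", "blank"
    residue, ni = counts.most_common(1)[0]
    ratio = ni / ns
    if ratio > 0.70:
        return residue, "bold"
    if ratio > 0.35:
        return residue, "normal"
    return " ", "blank"


def consensus(sequences):
    """Apply consensus_column to every column of equal-length sequences."""
    return [consensus_column(column) for column in zip(*sequences)]


if __name__ == "__main__":
    homologs = ["ACDE", "ACDF", "ACGH", "AKLM"]
    for character, style in consensus(homologs):
        print(repr(character), style)
```

The same counting approach would apply to the consensus JOY secondary
structure, with the single 70% threshold in place of the two used here.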
Selection Menu
Options Menu
Help Menu
Access:
• At the bottom right of the Sequence Viewer click Color to post the Set
Color dialog.
• In the Sequence Viewer’s View menu, select Text Background.
You may then apply the color scheme to the corresponding molecule on the
SYBYL screen:
• At the bottom right of the Sequence Viewer click Color Molecule.
• In the Sequence Viewer’s View menu, select Color Molecule.
The color options below require a JOY template to be read in (via the File
menu).
The following table is taken from the online JOY manual
(http://www-cryst.bioc.cam.ac.uk/joy/joyman.htm).
Inserting Gaps
Deleting Gaps
• In a single sequence
• In multiple sequences, but only to the extent that gap(s) can be deleted in
any of the individual sequences.
Moving Residues
Use the middle mouse button to move one or more residue(s). This may be done
in one of two modes.
Abacus mode: Ctrl (Command on the Mac) + click and drag the middle button.
• This mode allows you to move a single residue within a gap.
• None of the residues can be pushed beyond the ruler’s extreme positions.
Each atom in a residue is specified by a line which contains the name of the
atom, the atom ID number, the SYBYL atom type, and the XYZ coordinates of
that atom (in an arbitrary coordinate system). Each atom entry also contains a
coded status indicator and may have a nickname for use in defining conforma-
tional angles (see Dictionary Files on page 286). Atom names must be uniquely
specified for each atom in a residue and for consistency should be similar to the
atom names in the other residue files pertaining to the same dictionary. SYBYL/
Biopolymer residues have atom names which follow as closely as possible the
IUPAC-IUB nomenclature conventions [Ref. 11].
The second digit is 4 if the atom is on a direct backbone path from one end of
the residue to the other, and 0 otherwise. Cap atoms are atoms deleted when
connecting this residue to a subsequent or previous residue in a chain. Backbone
atoms are those atoms which would be part of a continuous chain from one end
of the biopolymer to the other (backbone atoms are automatically added to the
built-in set {BACKBONE}). Essential hydrogens or lone pairs are those connected
to polar atoms and are likely to be involved in hydrogen bonds. When building
biopolymers you have three choices based on Tailor variable BIOPOLYMER
BUILD_HYDROGENS. Biopolymers can be built with no hydrogens, with only the
essential hydrogens, or with all possible hydrogens included. This is to allow
easy construction of biopolymer structures appropriate for a particular modeling
process.
Residue files contain explicit information on the connectivity within the residue.
Each bond in the molecule is described by a line specifying the two atom IDs
comprising the bond, and the bond type.
Various property values and sets are included in residue files. The two typical
properties included by Tripos are the molecular weight of the residue and its ∆G
or free energy of formation. Sets defined for residues include charge sets and
alternate atom types to be used with force fields other than the Tripos force
field.
File Format:
This section describes the format of the .res biopolymer dictionary files. Inden-
tation in the format description indicates that the indented section is repeated
the number of times specified on the preceding line. The only restriction on the
format of the files is that items must be separated by white space (one or more
spaces, tabs, or new lines). Refer to the Tripos-supplied dictionary files for
guidance in understanding the formats.
ATOM_NAME CHARGE
NALT_TYPE_SETS
TYPE_SET_NAME NATOMS
ATOM_NAME ATOM_TYPE
NCONN_BONDS
CONN_ATOM CAP_ATOM
NCONN_GROUPS
NGROUP_BONDS
CONN_ATOM CAP_ATOM
Legend:
Examples:
Examples of the first record line in the residue files for the standard amino acid
ALA and the blocking group BOC. Note the comma-separated lists for the
monomer class.
ALA alanine A amino_acid,standard,default
BOC N_Terminal_t-Butyloxycarbonyl . block,head,amino_acid
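The two example records can be split into fields with a few lines of Python.
The field meanings used below (monomer name, descriptive name, one-letter
code with "." meaning none, and a comma-separated class list) are inferred
from the examples and are assumptions, as are the type and function names.

```python
from typing import List, NamedTuple, Optional


class MonomerHeader(NamedTuple):
    name: str                  # e.g. ALA
    description: str           # e.g. alanine
    one_letter: Optional[str]  # None when the record holds "."
    classes: List[str]         # e.g. ["amino_acid", "standard", "default"]


def parse_first_record(line):
    """Split the first record line of a residue file on white space."""
    fields = line.split()
    if len(fields) != 4:
        raise ValueError("expected 4 white-space separated fields")
    name, description, code, class_list = fields
    one_letter = None if code == "." else code
    return MonomerHeader(name, description, one_letter, class_list.split(","))


if __name__ == "__main__":
    print(parse_first_record("ALA alanine A amino_acid,standard,default"))
    print(parse_first_record(
        "BOC N_Terminal_t-Butyloxycarbonyl . block,head,amino_acid"))
```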
Additional Information:
• Dictionary Files
• User Creation of Dictionaries and Residues
ABU amino_acid,modified
AIB amino_acid,modified
ANY amino_acid,modified
ARZ amino_acid,modified
BAL amino_acid,modified
CYM amino_acid,modified
CYX amino_acid,modified
DA amino_acid,modified
HCX amino_acid,modified
HCY amino_acid,modified
HPR amino_acid,modified,backbone_ring
HSE amino_acid,modified
HYP amino_acid,modified,backbone_ring
MAL amino_acid,modified
MBT amino_acid,modified
NLE amino_acid,modified
NMA amino_acid,modified
NML amino_acid,modified
NMS amino_acid,modified
NMV amino_acid,modified
NVA amino_acid,modified
ORN amino_acid,modified
ORZ amino_acid,modified
PHG amino_acid,modified
PSE amino_acid,modified
PSM amino_acid,modified
PSZ amino_acid,modified
PTM amino_acid,modified
PTY amino_acid,modified
PTZ amino_acid,modified
SAR amino_acid,modified
dA dna
dC dna
dG dna
dT dna
rA rna
rC rna
rG rna
rU rna
DRA carbohydrate
DRB carbohydrate
FRA carbohydrate
FRB carbohydrate
GAA carbohydrate
GAB carbohydrate
GLA carbohydrate
GLB carbohydrate
MAA carbohydrate
MAB carbohydrate
RBA carbohydrate
RBB carbohydrate
HOH other,water
SPC other,water
TIP other,water
WAT other,water
WTR other,water
AMN block,head,amino_acid,special_blk,charged_blk
AMI block,head,amino_acid,special_blk,neutral_blk
NMT block,head,amino_acid,special_blk
ACE block,head,amino_acid
BOC block,head,amino_acid
FOR block,head,amino_acid
PYR block,head,amino_acid
CXL block,tail,amino_acid,special_blk,charged_blk
CXC block,tail,amino_acid,special_blk,neutral_blk
AMD block,tail,amino_acid,special_blk
CME block,tail,amino_acid
EES block,tail,amino_acid
MES block,tail,amino_acid
NME block,tail,amino_acid
NMM block,tail,amino_acid
HB block,head,dna,rna
HE block,tail,dna,rna
OME block,carbohydrate
The next type of information stored in biopolymer dictionary files is the number
and names of atoms used to create inter-residue bonds. Entries also define the
number of possible connections for each residue and the bond types of the
inter-residue bonds. Finally, particular conformational states can be chosen as
the default settings for building inter-residue bonds.
The properties ascribed to residues must be defined within the dictionary file.
Entries describe the names and number of property sets, the names and number
of charge sets, and the names and number of alternate atom type sets given to
each residue. The most important property defined by the dictionary is a list of
all residue files available for modeling operations with this class of biopolymer.
Some biopolymer classes define torsion angles within rings. In order to allow
these angles to be set, ring closure bonds must be defined and a set of possible
values for all the conformational angles in the ring must be given. This is
because closure of a ring forces all the torsion angles in the ring to be interdependent.
SYBYL/Biopolymer handles this difficulty by defining one of the
torsion angles in the incipient ring as a master angle and the rest as dependent
on this master angle. This information is contained within the dictionary file.
The final entries in the dictionary files are used to create the built-in sets which
relate to biopolymers and are listed in the Expression dialogs (see the SYBYL
Basics Manual).
File Format:
This section describes the format of the .dic biopolymer dictionary files. Indentation
in the format description indicates that the indented section is repeated
the number of times specified on the preceding line. The only restriction on the
format of the files is that items must be separated by white space (one or more
spaces, tabs, or new lines). Refer to the Tripos-supplied dictionary files for
guidance in understanding the formats.
DICTTYPE MOLECULE_TYPE_CODE
NCONF_ANGLES
ANGLE_NAME ANGLE_TYPE_CODE NANGLE_ATOMS
ATOM_NAME MONOMER_OFFSET
NCONF_STATES
STATE_NAME NCONTIG NANGLES
ANGLE_NAME ANGLE_INTERP ANG_MONOMER_OFFSET
ANGLE_VALUE
DISCREPANCY
NCONNECT_ATOMS
ATOM_NAME MAX_BRANCHES NCAP_ATOMS NCAP_BONDS
NCONNECTIONS
ORIGIN_ATOM_NAME TARGET_ATOM_NAME BOND_TYPE
NENFORCE_STATES
STATE_NAME
NMONOMER_PROPS
PROP_NAME PROP_TYPE
NCHARGESETS
CHARGESET_NAME
NALT_TYPE_SETS
TYPE_SET_NAME ASSOC_CHARGESET
NMONOMERS
MON_FILE_NAME
NDEPENDENCE_STRUCTURES
MASTER_ANGLE_NAME
BREAK_ORIGIN_NAME MONOMER_OFFSET
BREAK_TARGET_NAME MONOMER_OFFSET
NDEPENDENT_ANGLES
ANGLE_NAME
NINTERP_VALUES
MASTER_ANGLE_VALUE DEP_ANGLE1_VALUE DEP_ANGLE2_VALUE
…
NGLOBAL_SETS
SET_NAME OBJECT_CLASS SET_DEFINITION
Legend:
Additional Information:
• Residue Files
• User Creation of Dictionaries and Residues
If you receive error messages when opening your own dictionary or residue file,
you will usually be given sufficient information to find and correct offending
lines. Residues are read in the order in which they are listed in dictionary files.
This means that the first occurrence of a particular three or one letter code read
by the program will take precedence over subsequent entries with duplicate
codes. This is significant when defining multi-class dictionaries which
recognize more than one type of biopolymer. For example, if you want to create
a combined DNA-protein dictionary, the entries for the DNA residues should
occur first in the list of residues. In this way, the sequence expression
A=C=G=T will be interpreted as a DNA sequence rather than as ala=cys=gly=thr,
which is what should be used if a protein sequence is intended. Since there
are only 26 possible one-letter codes, you are restricted to three-letter names
when creating large dictionaries of residues.
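The precedence rule described above (the first occurrence of a code wins) can be illustrated with a short Python sketch. This is not SYBYL code; the function name and the (residue, code) pair layout are assumptions made for the illustration only.

```python
def build_code_lookup(residues):
    """Map each code to the first residue that declares it.

    `residues` is a list of (residue_name, code) pairs in the order in
    which they appear in the dictionary file (hypothetical layout).
    """
    lookup = {}
    for name, code in residues:
        # setdefault keeps the earliest entry and ignores later duplicates.
        lookup.setdefault(code, name)
    return lookup


# Hypothetical combined dictionary: DNA residues listed before amino acids.
combined = [
    ("dA", "A"), ("dC", "C"), ("dG", "G"), ("dT", "T"),
    ("ALA", "A"), ("CYS", "C"), ("GLY", "G"), ("THR", "T"),
]
lookup = build_code_lookup(combined)
sequence = [lookup[code] for code in "A=C=G=T".split("=")]
```

With the DNA entries first, the one-letter expression resolves to DNA residues; reversing the order of the two groups would resolve the same expression to amino acids.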
Additional Information:
• Residue Files
• Dictionary Files
Ligand Database
The macromol dictionary includes a ligand database that is based on information
retrieved from the Ligand Depot site, a service associated with RCSB.
This database helps the SYBYL PDB reader assign correct atom and bond types
to most ligands. The atom set {LIGDB} contains all atoms that are typed using
the ligand database.
Additional Information:
• Residue Files
• Dictionary Files
• User Creation of Dictionaries and Residues
First line:
Number of amino acids (n)
Second line:
1-letter amino acid codes, with no spaces. Use the letter X for an amino acid
whose type is unknown.
Subsequent lines:
n x n matrix of floating point or integer scores. The entry in row i, column j
gives the score of changing from the ith to the jth amino acid, as listed in the
second line of the file.
Sample file:
This is the file used for the ALA_PRO_GLY homology method, in which
changes to or from a PRO or GLY have a score of 0, other changes, 1.
21
ACDEFHIKLMNQRSTVWYPGX
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
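The homology matrix layout described above (a count n, a line of n one-letter codes, then n x n scores) can be checked before use with a small reader. The Python sketch below is an illustration only, not the SYBYL reader; it assumes that all items are separated by white space and that every score is numeric.

```python
def parse_homology_matrix(text):
    """Parse the count / codes / n x n scores layout into a lookup table."""
    tokens = text.split()
    n = int(tokens[0])
    codes = tokens[1]
    if len(codes) != n:
        raise ValueError(f"expected {n} one-letter codes, found {len(codes)}")
    values = [float(token) for token in tokens[2:]]
    if len(values) != n * n:
        raise ValueError(f"expected {n * n} scores, found {len(values)}")
    scores = {}
    for i, row_code in enumerate(codes):
        for j, col_code in enumerate(codes):
            # Row i, column j: score of changing from the ith to the jth code.
            scores[(row_code, col_code)] = values[i * n + j]
    return codes, scores


# Small hypothetical matrix, not one of the supplied files.
example = """3
AGX
1 0 0
0 1 0
0 0 0
"""
codes, scores = parse_homology_matrix(example)
```

A file with a missing row or a mismatched code line raises an error instead of silently producing a shifted matrix.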
The file format has the flavor of Tripos mol2 files, with relevant information
introduced by keywords. White space in the file is generally treated all the
same, with the following restrictions: keywords must be the first non-blank
word on a line, and there must be a single value following the keyword on the
same line.
The loop search results are introduced by a line whose content is:
@<TRIPOS>LOOPSEARCH
16.3.1 Keywords
Anything in the file prior to this line will be ignored. Following this line are
various pieces of data introduced by keywords. Here is a complete list of the
possible keywords:
Keyword         Description
#total_len#     Number of residues in each fragment
#na1#           Number of residues in the N anchor region (*)
#na2#           Number of residues in the C anchor region (*)
#end_thresh#    End-to-end distance threshold used in the search (*)
#dist_thresh#   Inter-CA distance threshold used in the search (*)
#fit_thresh#    RMS deviation threshold used in the search (*)
#min_nloops#    Minimum number of loops specified to save from search (*)
Note: Several of the keywords require more than one piece of data. In these
cases, the keyword is followed by a count of how many data items follow, and
then the individual data items, separated by white space. See the sample file
below for an illustration. The keywords that require more than one piece of data
are:
See Also:
• Secondary Structure Prediction on page 319 for a list of methods
provided in SYBYL
• Predict Secondary Structure on page 183 for a description of the
command syntax
Two files are written to your directory after each prediction. A prediction file
contains the conformation assignments and will be listed in SYBYL. A probability
file contains the conformational probabilities for each residue.
http://www.compbio.ox.ac.uk/bioinformatics_faq/format_examples.shtml#fasta
DISPLAY
DISULFIDE Biopolymer > Build > Create Disulfide…
DNAHELIX Biopolymer > Build > DNA Double Helix...
ENDMODE Pick an item in another menu.
EXCISE Biopolymer > Composition > Excise Monomers…
FASTA File > Import File > Sequence
FIND
CONFORMATION
SEC_STR Biopolymer > Conformation > Find Secondary Structure...
FIT Biopolymer > Compare Structures > Fit Monomers…
FIX_ASN_GLN Biopolymer > Prepare Structure > Fix Sidechain Amides…
FIX_END_GROUPS Biopolymer > Prepare Structure > Fix End Groups…
FIX_MOLECULE
FIX_PROLINE Biopolymer > Prepare Structure > Fix Prolines…
FIX_SIDECHAINS Biopolymer > Conformation > Scan Sidechains Torsions…
FTP File > Retrieve PDB...
INSERT Biopolymer > Composition > Insert Monomers…
JOIN Biopolymer > Build > Join Chains…
LABEL_ATOMS
LOAD
CHARGES Biopolymer > Prepare Structure > Load Charges…
DEFINE_UNKSET
DEFINE_ZEROCHARGESET
DICT_CHARGES
DICT_TO_USER
DICT_TYPES
MINIMAL_USER_SET
OTHER_ATOM_TYPES Biopolymer > Prepare Structure > Assign AMBER Atom Types
SLN_AUTO_CHARGES
SLN_AUTO_TYPES
LOOP Biopolymer > Protein Loops
ANALYZE Biopolymer > Protein Loops > Analyze Results…
SETUP Biopolymer > Protein Loops > Search PRODAT Database…
MEASURE Biopolymer > Conformation > Measure…
When typing commands in the Command Console you can access the
biopolymer functions in either of two ways:
• Start each command with BIOPOLYMER
• Enter the BIOPOLYMER mode by issuing the command MODE
BIOPOLYMER. To exit this mode, type ENDMODE at the Biopolymer
prompt. While in BIOPOLYMER mode, you may execute a single top-level
SYBYL command by preceding it with the word COMMAND.
Note: As with all SYBYL commands, the BIOPOLYMER command can be abbreviated
to a unique initial string, such as bio. However, we strongly recommend
that you always spell out command names in SPL scripts.
This chapter describes the theoretical background of some of the more scientifically
complex commands within SYBYL. Discussions about the biopolymer
dictionaries, force field methods, and an overview of specific modeling applications
are included.
• Introduction on page 308
• Protein Modeling on page 309
• Binary Protein Database
• Sequence Alignment
• Needleman-Wunsch
• Homology Matrices
• Gap Penalty
• Alignment Evaluation
• Protein Completion on page 316
• Backbone Construction
• Sidechain Addition
• Biopolymer End Group Modeling
• Secondary Structure Prediction
• Protein Loop Searching
• Protein Folding And Model Generation on page 319
• Secondary Structure Prediction
• Protein Loop Searching
• Random Tweak Loop Generation
• Protein Loop Analysis
• Small Peptide Methodology
• Nucleic Acid Modeling on page 328
• Single Strand Nucleic Acids
• Nucleic Acid Double Helices
• Polysaccharide Modeling on page 330
18.1 Introduction
SYBYL provides a flexible environment for the display and manipulation of
large and small biomolecules. A ready-to-use capability for modeling of
polypeptides, polynucleotides (RNA and DNA) and polysaccharides is built-in.
SYBYL supports both small-molecule and biopolymer modeling, so processes such as
substrate or inhibitor binding to an enzyme, or hormone-receptor interactions,
can be studied.
Much of our knowledge of the structure of proteins and nucleic acids comes
from X-ray diffraction studies. The repository of this information has always
been the Protein Data Bank [Ref. 1]. SYBYL reads and writes files in standard
PDB format. Sequence data for proteins and polynucleotides are maintained by
a number of groups. SYBYL reads and writes the PIR format of the National
Biomedical Research Foundation [Ref. 2]. Examples of both these file formats
may be found in the TA_DEMO directory. Comprehensive reviews of
biopolymer structure can be found in the books by Schulz and Schirmer for
proteins [Ref. 3], Saenger for nucleic acids [Ref. 4], and Aspinall for polysaccharides
[Ref. 5].
Biomolecular systems tend to be complex and large in size. To study and understand
many aspects of these structures, it is helpful to have tools capable of
highlighting selected features of a biopolymer’s three-dimensional representation.
The utility of color computer graphics in this regard has been widely
acknowledged [Ref. 6]. Visual enhancement by the use of ribbon displays and
by the capacity to formulate general and flexible coloring schemes contributes
significantly to the goal of understanding biopolymer structure.
Many of the methods mentioned above may fail when confronted with large
insertions or deletions; these often occur in loop regions of proteins. Energy-
based methods are usually capable of making a single educated guess of what a
loop conformation may be like. However, the available data is often consistent
with a variety of conformations for these loop regions. Thus, there arise the
issues of (1) sampling the conformational space available for the loops and (2)
choosing from the produced sample one (or a few) that may be considered best.
For those purposes a variety of CPU intensive techniques have been proposed
[Ref. 18-Ref. 20]. These techniques (especially those described in Ref. 19 and
Ref. 20) put a lot of trust in the potential energy function or force field used.
However, they usually provide a good idea of the extent of geometric variability
expected in loop conformations, and for small enough loops, may actually
produce a systematic and exhaustive enumeration of attainable backbone
conformations.
Needleman-Wunsch
For sequences of length n, there are on the order of 2^(2n) possible alignments. To
search all these possibilities for the best alignment would be computationally
prohibitive. Needleman and Wunsch [Ref. 28] provided the classic solution to
this problem by developing a dynamic programming algorithm that aligns two
sequences of length n and m in order nm time. The original Needleman-Wunsch
method used a penalty for each gap. Our implementation of Needleman-Wunsch
is that of Fredman [Ref. 29] in which the gap penalty is independent of the size
of the gap. The beauty of the Needleman-Wunsch method is that it is guaranteed
to find an optimal alignment for the given homology matrix and gap penalty.
There may be several optimal alignments (alignments with the highest score). In
this case SYBYL will report only one of these.
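The dynamic programming idea can be sketched in Python as follows. This is a minimal illustration of a global alignment score with a gap penalty that is charged once per gap regardless of its length; it is not the SYBYL or Fredman implementation. The three-table bookkeeping, the treatment of leading and trailing gaps as ordinary gaps, and the function names are assumptions of the sketch.

```python
NEG_INF = float("-inf")


def align_score(a, b, score, gap):
    """Best global alignment score with a length-independent gap penalty.

    `score(x, y)` returns the homology matrix entry for residues x and y.
    `gap` is subtracted once for each gap, however many positions it spans.
    """
    n, m = len(a), len(b)
    # M: a[i-1] aligned with b[j-1]
    # X: a[i-1] aligned with a gap; Y: b[j-1] aligned with a gap
    M = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    X = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    Y = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0.0
    for i in range(1, n + 1):
        X[i][0] = -gap  # a single leading gap of length i
    for j in range(1, m + 1):
        Y[0][j] = -gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best_prev = max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1])
            M[i][j] = score(a[i - 1], b[j - 1]) + best_prev
            # Extending an existing gap is free; opening a new one costs `gap`.
            X[i][j] = max(M[i - 1][j] - gap, X[i - 1][j], Y[i - 1][j] - gap)
            Y[i][j] = max(M[i][j - 1] - gap, Y[i][j - 1], X[i][j - 1] - gap)
    return max(M[n][m], X[n][m], Y[n][m])


def identity(x, y):
    """Toy homology matrix: 1 for identical residues, 0 otherwise."""
    return 1.0 if x == y else 0.0
```

The work is proportional to the product of the two sequence lengths, which is the nm behavior mentioned above. Recovering the alignment itself would additionally require a traceback through the three tables.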
Homology Matrices
The protein homology matrices use the following standard single letter amino
acid codes:
SET RESIDUES
1 DEKR
2 GAV
3 AVLI
4 VLIM
5 FYW
6 ST
7 QN
8 GP
You can create your own homology matrix file with the format specified on
page 293. SYBYL will search for this file first in your current directory and
then, if it is not there, in the directory specified by Tailor variable BIOPOLYMER
SIMILARITY_MATRIX.
Gap Penalty
The quality of the alignment is heavily dependent on the gap penalty. The
higher the gap penalty, the greater the resistance to insertion of new gaps into
the alignment. It is important to select a gap penalty appropriate for the
particular homology matrix in use. The best penalty is typically the average of
all the values in the current homology matrix and must be a positive integer.
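As a starting point, the rule of thumb above can be computed directly from the matrix values. The Python sketch below is illustrative only; rounding to the nearest integer and enforcing a minimum of 1 are assumptions made to satisfy the positive integer requirement, not documented SYBYL behavior.

```python
def suggested_gap_penalty(matrix):
    """Average of all matrix values, as a positive integer."""
    values = [value for row in matrix for value in row]
    mean = sum(values) / len(values)
    # Assumption: round to the nearest integer and never go below 1.
    return max(1, round(mean))


# Hypothetical 2 x 2 matrix used only to show the arithmetic.
penalty = suggested_gap_penalty([[5, 1], [1, 5]])
```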
Alignment Evaluation
Doolittle [Ref. 26] defines some rules of thumb to determine if two sequences
are similar enough to be considered related. If they are longer than 100 residues
in length and are greater than 25% identical (with appropriate gaps) then they
are very likely related. If they are 15 to 25% identical, then they may still be
related and jumbling (see below) should be performed to determine the statistical
significance of the alignment. If they are less than 15% identical, they are
probably not related.
Identity Score
The identity score (% identity) reported by BIOPOLYMER ALIGN_SEQUENCES
is the number of identical residues in the two sequences divided by the length of
the shortest sequence (without gaps).
For example:
Biopolymer > Compare Sequences > Align and Write MSA and
BIOPOLYMER MULT_ALIGN_SEQ compute the identity score by dividing the
number of identical residues by the length of the first sequence in the list.
Therefore, the first sequence always has an identity score of 100%.
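The pairwise definition (identical residues divided by the length of the shortest sequence without gaps) can be written out as a short Python sketch. The use of "-" as the gap character and the function name are assumptions of the example.

```python
def identity_score(aligned_a, aligned_b, gap_char="-"):
    """Percent identity of two aligned sequences of equal aligned length."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    identical = sum(
        1 for x, y in zip(aligned_a, aligned_b) if x == y and x != gap_char
    )
    length_a = sum(1 for x in aligned_a if x != gap_char)
    length_b = sum(1 for y in aligned_b if y != gap_char)
    shortest = min(length_a, length_b)
    if shortest == 0:
        raise ValueError("at least one sequence contains no residues")
    return 100.0 * identical / shortest
```

Because the denominator is the shorter ungapped length, a short sequence that matches perfectly inside a longer one scores 100%.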
Alignment Score
The alignment score is a measure of the similarity of the aligned sequences. The
higher the score per given homology matrix and gap penalty, the better the
alignment.
The sequences are aligned using the method of Needleman & Wunsch [Ref. 28]
as implemented by Fredman [Ref. 29]. Gaps may be inserted into either
sequence to find an optimal alignment, based on the current length-independent
gap penalty and the homology matrix. For each aligned pair, the percentage of
residue positions having the same amino acid in both sequences is calculated.
Some proteins may be unusually rich in certain amino acids, and this can lead to
their appearing to be more similar than they really are. As a result, two such
proteins will exhibit a spurious homology (or false positive). In order to better
discriminate homologies, we apply a jumbling strategy to correct for such spuriousness
(see for example Ref. 25). This is implemented as follows. After
sequence alignment, the two sequences being compared are repeatedly
randomized a given number of times, until several jumbled sequences of each
are available. Then, each of the jumbled sequences of one is subjected to the
alignment procedure with each of the jumbled versions of the other. For
example, if each sequence is jumbled 5 times, then altogether 25 alignments of
jumbled pairs are made. Their alignment scores are averaged, the mean obtained
(S), and the standard deviation (D) calculated. The score of the original
alignment (S0) is compared with the mean of the randomized sets (thus
separating signal from noise), and the difference is expressed:
X = \frac{S - S_0}{D}    [EQ 2]
where
• S = mean of the alignment scores of the jumbled pairs
• S0 = score of the original alignment
• D = the standard deviation of the jumbled scores
• X = the significance score, a second filter (after the identity cutoff)
which is applied to filter false positives.
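The jumbling procedure can be sketched in Python as shown below. This is an illustration, not the SYBYL implementation: the `align` argument stands for any function that returns an alignment score, the use of the sample standard deviation is an assumption, and the sign convention simply follows EQ 2 as printed.

```python
import random
import statistics


def jumble_significance(seq_a, seq_b, align, n_jumbles=5, seed=0):
    """Return (X, S0, S, D) for two sequences.

    `align(a, b)` must return the alignment score of sequences a and b.
    Each sequence is shuffled `n_jumbles` times and every jumbled copy of
    one is aligned with every jumbled copy of the other.
    """
    rng = random.Random(seed)

    def jumbled(sequence):
        residues = list(sequence)
        rng.shuffle(residues)  # keeps composition, destroys order
        return "".join(residues)

    s0 = align(seq_a, seq_b)
    jumbled_a = [jumbled(seq_a) for _ in range(n_jumbles)]
    jumbled_b = [jumbled(seq_b) for _ in range(n_jumbles)]
    scores = [align(x, y) for x in jumbled_a for y in jumbled_b]
    s_mean = statistics.mean(scores)
    d = statistics.stdev(scores)  # assumption: sample standard deviation
    if d == 0:
        raise ValueError("jumbled scores have zero spread")
    x = (s_mean - s0) / d  # EQ 2 as printed
    return x, s0, s_mean, d
```

With five jumbles per sequence the function performs 25 alignments of jumbled pairs, matching the example in the text. Shuffling preserves the amino acid composition, which is what allows composition-driven similarity to be separated from sequence-driven similarity.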
Backbone Construction
This functionality enables you to generate plausible models for the backbone of
a protein or a polypeptide given only the coordinates for the α carbons [Ref. 35,
Ref. 36]. This command is useful in studying proteins for which only α carbon
coordinates have been deposited with the Protein Data Bank.
The method for backbone construction uses a 3-pass screen for finding each
fragment. First it measures the end-to-end distance of all fragments in the
database, saving those fragments whose distance is within a specified tolerance
from the reference fragment. Next, the retained fragments are screened by
comparing all inter-Cα distances within the fragment with the corresponding
distances in the reference fragment, and saving the M best fragments. Finally, it
performs a least-squares fit of each retained fragment onto the reference
fragment, and chooses the one with the lowest RMS deviation. If the RMS
deviation is below a threshold value and the fragment length is less than a
specified maximum, the fragment length is incremented by 1 and the procedure
is repeated.
To construct an entire chain, the procedure starts at the beginning of the chain
looking for a fragment of a given minimum length (e.g. 4 residues, the default
value). The length N of the actual fragment found may be anywhere between
this minimum length (4 in this example) and a specified maximum length. The
procedure in principle could then move down the chain by N residues and look
for the next fragment in the database. However, to avoid discontinuities at the
junction of two fragments, the method actually allows successive fragments to
overlap instead of advancing a full N residues down the chain. The number of
overlapping residues is determined by Tailor variable CONSTRUCT_BACKBONE
TRIM_C, which dictates the number of residues to trim off the end of one
fragment, and TRIM_N, which specifies how many residues to trim off the end
of the next fragment. Note that the sum of TRIM_C and TRIM_N must be less
than the minimum sequence length; otherwise the entire fragment could be
trimmed, and the procedure would never be able to advance down the
polypeptide chain.
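The effect of the two trim values on progress along the chain can be illustrated with a small Python sketch. It assumes, for simplicity, that every fragment found has the same length and that successive fragments overlap by exactly TRIM_C + TRIM_N residues; both are simplifications made for this example and the function name is hypothetical.

```python
def fragment_starts(chain_length, fragment_length, trim_c, trim_n, min_length=4):
    """Start positions (0-based) of successive overlapping fragments."""
    if trim_c + trim_n >= min_length:
        raise ValueError(
            "TRIM_C + TRIM_N must be less than the minimum fragment length"
        )
    if fragment_length < min_length:
        raise ValueError("fragment is shorter than the minimum length")
    # Assumed overlap: residues trimmed from the end of one fragment plus
    # residues trimmed from the next fragment.
    step = fragment_length - (trim_c + trim_n)
    starts = []
    start = 0
    while start + fragment_length <= chain_length:
        starts.append(start)
        start += step
    return starts


starts = fragment_starts(chain_length=12, fragment_length=4, trim_c=1, trim_n=1)
```

Because the trim sum is required to be smaller than the minimum fragment length, the step is always at least one residue, which is why the procedure is guaranteed to advance.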
Sidechain Addition
SYBYL proteins are built from residue files which contain atoms appropriate
for chain continuation in either direction. For this reason the N terminal
nitrogen atom is given an amide atom type suitable for amide functional groups
in the interior of proteins. Real proteins, however, have either blocking groups
or charged N terminal residues.
For proteins:
• The N terminus is capped by AMN (charged), AMI (neutral) or one of
the following blocking groups (see the Force Field Manual for partial
charges on N-terminal groups).
• ACE: N-acetyl
• PYR: N-pyroglutamyl
• FOR: N-formyl
• NMT: N-methyl
• BOC: N-t-butyloxycarbonyl
• The C terminus is capped by CXL (charged), CXC (neutral) or one of
the following blocking groups (see the Force Field Manual for partial
charges on C-terminal groups).
• NME: N-methyl amide
• AMD: amide
• NMM: N,N-dimethyl amide
• CME: methyl
• MES: methyl ester
• EES: ethyl ester
The best way to build a protein whose structure is not known is to base it on a
homologous protein whose structure is known; i.e. by homology modeling (see
the FUGUE Manual and the ORCHESTRAR Manual). However, if there is no
known homolog, one method to determine the structure is by predicting the
regions of regular secondary structure and then adjusting the intervening loop
regions. One should be aware that the reliability of this procedure is much lower
than that of homology modeling, thus it should be used with extreme caution
and only as a last resort [Ref. 37-Ref. 41]. SYBYL provides a way to predict
the secondary structure of a protein. This command makes it possible to read a
file containing only the primary amino acid sequence (see page 298 for the
format of this file). The sequence can also be read from a molecule area. The
command then lists the primary sequence and the predicted conformation for
each residue (α-helix, β-sheet, coil).
SYBYL provides three methods for assigning conformation. All three methods
work by first studying a database of proteins of known structure, from which a
set of parameters is derived. These parameters are then used in a formalism
(series of equations) that enables approximation of probability or
probability-like estimates of the tendencies of given amino acid sequences to attain
particular secondary structures. These methods differ from each other in the
techniques used to extract the information present in the database.
Additional Information:
• Ref. 41 for a review of secondary structure prediction methods
• Secondary Structure Prediction Files on page 298 for the format of the
input and output files
• Predict Secondary Structure on page 183 for a description of the
command syntax
The loop search facility of SYBYL (Protein Loop Search on page 193) enables
the use of fragments of proteins of known three-dimensional structure during
building of models of unknown protein structures. This functionality looks for
fragments of specified geometry in a protein fragment database constructed
from the Protein Data Bank. The specified geometry is given by distances and
coordinates involving the end residues of the loop or fragment.
In any loop search the size of the fragment to model depends in part on the
secondary structure of the surrounding regions; in general it is wise to leave
elements of regular secondary structure as unperturbed as possible.
For example, consider the target protein ABCDEFLMNOPQ. There may be any
number of additional residues between F and L, but as soon as EF and L are
chosen as the anchor region, those additional residues disappear from the final
protein model. When inserting the pentamer GHIJK between F and L, using EF
and L as the anchor region, the database is searched for fragments of the form
EFGHIJKL. Residues ABCD and MNOPQ represent the framework region, EF
and L the anchor region of the target protein, EF and L the anchor region of the
loop, and GHIJK the window region.
The approach used by SYBYL’s loop search is to find fragments in the database
of the proper residue length whose anchor regions have a good geometric fit to
the anchor regions of the modeled protein. Application of this procedure usually
generates several candidate loops that satisfy the geometrical requirements; i.e.
protein fragments of the specified length that close the gap in the polypeptide
chain while preserving nearly ideal covalent geometry. However, there are other
criteria, not explicitly used during the actual loop search, that can guide the
choice of one particular candidate loop over another. In order to invoke these
criteria, a LOOP ANALYSIS facility is available. This facility provides graphical
tools for analyzing and selecting from the retrieved fragments on the basis of
quality of fit to the anchor regions, sequence homology, steric interactions, and
other criteria.
To efficiently find the fragments with the best matching anchor regions,
SYBYL uses a 3-pass screen. First, it measures the end-to-end distance of all
fragments in the database of the proper number of residues and retains all whose
distance is within a given threshold of the corresponding distance in the target
protein. Second, it computes all distances between an α carbon in the N anchor
region and an α carbon in the C anchor region for each retained fragment. The
root mean square deviation of these inter-Cα distances is computed from the
corresponding distances in the reference protein, and those fragments whose
RMS distance deviation is less than a second threshold are retained. Third, it
performs a rigid body least-squares fit of the anchor regions in each retained
database fragment to the anchor regions of the target protein, and retains those
fragments whose least-squares fit RMS is below a third threshold. The values of
the various thresholds and other parameters of the search (see below), are
controlled by Tailor variable PROTEIN_LOOP.
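The first two passes of the screen can be sketched in plain Python. The sketch is illustrative only: the data layout (a list of named Cα coordinate lists), the function names, and the choice of the first N anchor Cα and the last C anchor Cα for the end-to-end distance are assumptions, and the third pass (the rigid body least-squares fit) is omitted.

```python
import math


def dist(p, q):
    """Euclidean distance between two 3D points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def screen(fragments, target_n, target_c, total_len, end_thresh, dist_thresh):
    """Return (name, rms) for fragments passing the first two screens.

    fragments: list of (name, [CA coordinates]) from a fragment database.
    target_n, target_c: CA coordinates of the target's N and C anchor regions.
    """
    na1, na2 = len(target_n), len(target_c)
    if na1 < 1 or na2 < 1:
        raise ValueError("both anchor regions need at least one residue")
    target_end = dist(target_n[0], target_c[-1])
    target_d = [dist(p, q) for p in target_n for q in target_c]
    kept = []
    for name, ca in fragments:
        if len(ca) != total_len:
            continue
        frag_n, frag_c = ca[:na1], ca[total_len - na2:]
        # Pass 1: end-to-end distance within the first threshold.
        if abs(dist(frag_n[0], frag_c[-1]) - target_end) > end_thresh:
            continue
        # Pass 2: RMS deviation of all N anchor to C anchor CA distances.
        frag_d = [dist(p, q) for p in frag_n for q in frag_c]
        rms = math.sqrt(
            sum((x - y) ** 2 for x, y in zip(frag_d, target_d)) / len(frag_d)
        )
        if rms < dist_thresh:
            kept.append((name, rms))
    return kept
```

Ordering the tests from cheapest to most expensive means that the costly least-squares fit only has to be run on the fragments that survive both distance screens.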
When a retrieved fragment is inserted into the target molecule, the coordinates
of the fragment’s atoms are transformed according to the fragment’s least-squares
fit to the anchor regions. While this gives unambiguous coordinates for
the window region of the loop, a question remains: how should the coordinates
in the anchor regions of the loop be assigned to avoid bad discontinuities? SYBYL
offers two choices for adjusting the coordinates, MELD_ANCHOR and
TWEAK_LOOP, controlled by Tailor variable PROTEIN_LOOP ADJUST_COORDS.
The TWEAK_LOOP option leaves the anchor residues of the reference protein
unperturbed, and makes small adjustments to the torsion angles of the window
residues to achieve an exact overlap of the anchor coordinates. This option uses
the TWEAK algorithm discussed in the next section.
where:
• xyz_frag are coordinates from the database fragment
• frag_wt equals LOW_LOOP_WEIGHT + (fraction * weight_diff)
• xyz_target are coordinates from the original target protein
• target_wt equals 1 - frag_wt
• fraction equals res_dist / (na + 1)
• weight_diff equals HIGH_LOOP_WEIGHT - LOW_LOOP_WEIGHT
• res_dist is how far removed a given anchor residue is from r1 (must be
between 1 and na)
• r1 is the residue preceding the anchor region of the loop
• na is the number of anchor residues in this anchor region
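The weight definitions listed above can be turned into a small Python sketch. The weights follow the definitions exactly; the final blending step, a weighted average of the fragment and target coordinates, is an assumption of this example, since the formula that combines them is not reproduced here.

```python
def meld_anchor(xyz_frag, xyz_target, res_dist, na, low_weight, high_weight):
    """Blend one anchor atom's coordinates from fragment and target.

    low_weight and high_weight stand for LOW_LOOP_WEIGHT and
    HIGH_LOOP_WEIGHT; res_dist must be between 1 and na.
    """
    if not 1 <= res_dist <= na:
        raise ValueError("res_dist must be between 1 and na")
    fraction = res_dist / (na + 1)
    weight_diff = high_weight - low_weight
    frag_wt = low_weight + fraction * weight_diff
    target_wt = 1 - frag_wt
    # Assumed combination: weighted average of the two coordinate sets.
    return tuple(
        frag_wt * f + target_wt * t for f, t in zip(xyz_frag, xyz_target)
    )


blended = meld_anchor((2.0, 0.0, 0.0), (0.0, 0.0, 0.0), 1, 1, 0.0, 1.0)
```

Under this reading the fragment weight changes linearly with res_dist, so the coordinates move gradually between the target and fragment positions across the anchor region instead of jumping at its boundary.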
Tweak initially defines four distance constraints and their target values between
the Cα and N atoms of the N terminal anchor residue and the Cα and C atoms
of the C terminal anchor residue (see Figure 1).
Tweak uses a random number generated successively from a user definable seed
value contained in the SPL variable MNDL_TWEAK_SEED. A protein fragment of
the required number of user specified residues is constructed with random φ/ψ
angles taken from a uniform distribution (proline φ angles are not modified by
the command).
The set of distances for the anchors in the generated loop is measured and a
difference vector between the actual and target distance constraints is computed.
A matrix [D] containing the derivatives of each distance with respect to each
torsion angle is computed. A set of optimal corrections to the torsion angles is
calculated from a 4x4 linear system defined by the difference vector and the
derivative matrix:
\Delta\Theta_i = [D]^{T} \left( [D][D]^{T} \right)^{-1} \Delta d_j    [EQ 4]
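One correction step of this kind can be sketched in pure Python. The sketch assumes that EQ 4 is read as the minimum-norm solution, in which the small square system built from the derivative matrix and its transpose is solved first and the result is mapped back to the torsion angles. The function names and the optional clamp on the change per torsion are illustrative and are not taken from the SYBYL source.

```python
def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination."""
    n = len(b)
    aug = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def torsion_corrections(d_matrix, delta_d, max_change=None):
    """Torsion corrections from derivatives and distance differences.

    d_matrix[i][k] is the derivative of distance i with respect to torsion k.
    delta_d[i] is the difference between target and actual distance i.
    """
    rows, cols = len(d_matrix), len(d_matrix[0])
    # Square system with one row per distance constraint: D times D-transpose.
    ddt = [
        [sum(d_matrix[i][k] * d_matrix[j][k] for k in range(cols))
         for j in range(rows)]
        for i in range(rows)
    ]
    y = solve(ddt, delta_d)
    delta_theta = [
        sum(d_matrix[i][k] * y[i] for i in range(rows)) for k in range(cols)
    ]
    if max_change is not None:
        # Optional clamp, in the spirit of a maximum change per iteration.
        delta_theta = [max(-max_change, min(max_change, t)) for t in delta_theta]
    return delta_theta
```

With four distance constraints the square system is 4x4 whatever the number of torsion angles, so the cost of each iteration grows only linearly with loop length.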
The loops generated by Tweak are now subjected to a series of tests to check
their suitability for inclusion in the protein model.
If Tailor variable TWEAK DO_BUMP_CHECK is set to YES a van der Waals bump
check is performed on the backbone atoms of the loop (see Tailor variable
GENERAL BUMPS_CONTACT_DISTANCE and BUMPS_NEIGHBOR_DISTANCE to
control the bump checking algorithm). This is an internal check for van der
Waals contacts between atoms within the loop generated by TWEAK and does not
include, for example, van der Waals checks between the loop and the original
model structure. Loops failing the optional bump check are rejected.
If a loop fragment has passed all screening tests it is accepted, fitted to the
original anchor region, and written to a .loop file for subsequent analysis (see
Loop Search Results in a Spreadsheet on page 201).
This entire method is repeated until the number of loop fragments generated
equals that specified by Tailor variable TWEAK NLOOPS. The resulting loop
fragments are passed into BIOPOLYMER LOOP ANALYZE and you are
prompted for commands to analyze and select from the set of tweaked loops.
When the loop search is complete, SYBYL writes a file containing the parameters
of the loop search and the loop fragments. For each loop fragment, the
following information is stored:
• the source of the fragment,
• amino acid sequence,
• RMS fit to anchor regions,
• coordinates of the backbone atoms in the loop, that are transformed to fit
the reference molecule.
Note that the source of the fragment is always TWEAK_XX (where XX is the
loop number), the amino acid sequence for all tweak generated loops is constant,
and the RMS fit to the anchor regions will always be very close to zero. Note
also that only coordinates of backbone atoms are produced. You can later add
sidechains (see Add Sidechains on page 125).
Tailor subject TWEAK affects the behavior of biopolymer tweak loop searches:
• MAX_ITERATIONS—The maximum number of iterations performed on a
set of initially random torsion angles. Loops that have not met the
distance constraints within this number of iterations are rejected and the
letter d is written to the terminal.
• MAX_TORSIONAL_CHANGE—The maximum change (in °) allowed per
torsion angle per iteration. A torsion only changes by ± this value per
iteration.
• TARGET_DISTANCE_TOLERANCE—The difference in length considered
significant between actual and target distance vectors. Actual and target
vectors whose length differ by less than this value are considered equal
in length.
• NLOOPS—The number of loops to generate per run
• DO_BUMP_CHECK—Whether loops with bad internal van der Waals
contacts are accepted in the final set of generated loops. See Tailor
variable GENERAL BUMPS_CONTACT_DISTANCE and
BUMPS_NEIGHBOR_DISTANCE to control the bump checking algorithm.
The Protein Loop Analysis functionality provides you with the ability to choose
protein fragments for inclusion in a particular model structure based on a
number of criteria. In general no single measure of candidate loop suitability
will ever suffice for an automated selection of loops in a model structure. For
this reason you are offered a number of tools for display and analysis of the
candidate loops using SYBYL’s graphics capabilities.
The candidate loops found by any of the methods discussed above are written
into a .loop file. The Protein Loop Analysis functionality reads this file and
enters the candidate loops as rows in a spreadsheet. The column entries which
are automatically created for this table include source of the fragment, sequence
of the fragment, and RMS deviation of the least-squares fit to the anchor region.
You may introduce additional columns which (1) measure distances, angles, or
torsions within each of the loops, (2) check bumps within loops or between the
loop and the surrounding protein, (3) calculate RMS deviations from a reference
loop or (4) score the loops according to a variety of mutation criteria. Interesting
loops can be saved to a SYBYL database for further treatment using the
energy-based techniques discussed above.
Although these model building tools go a long way towards the goal of more
realistic peptide modeling, the need for more quantitative results may require
the use of energy minimization or MD calculations on these systems. Thus,
force field parameters for the newly created residues are needed. This, in
general, represents no special problem if you are using the Tripos force field.
However, if you wish to use the Kollman potential energy function [Ref. 12,
Ref. 13] additional parameter determinations have to be carried out. To assist
with this, we have included a few non-protein amino acids in the macromol
dictionary, as well as some blocking groups whose electrostatic parameters have
been calculated by following the original work as closely as possible [Ref. 12,
Ref. 13, Ref. 46]. A detailed description of these calculations, which includes a
prescription of how such parameters could be derived by using tools available
within or from SYBYL, is included in the Force Field Manual.
Thus, in order to get correct Z-DNA for a (dC-dG) oligomer, one should
enter the sequence as c=g=c=g= etc.; entering g=c=g=c results in reverse
conformations for guanine and cytosine, as observed experimentally for Z-
DNA [Ref. 45]. This feature, although demanding some extra care on your
part in simple situations, permits generation of Z-DNA double helices for
more exotic sequences (the structures can be energy-refined later). It is
possible to study distorted models of this kind of helix (see [Ref. 52] for an
application along these lines).
2. Build an RNA Double Helix on page 152
Generates double-helical models [Ref. 53] of the A and A’ forms (A’ is denoted
AP in SYBYL) of RNA molecules of arbitrary sequence.
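The Z-DNA sequence-entry rule given above (start a (dC-dG) oligomer with
cytosine) can be illustrated with a small helper that builds and checks such a
string. This is only a sketch: the function names are hypothetical and are not
SYBYL commands, and the use of `=` after every residue, including the last, is
copied from the c=g=c=g= example in the text rather than from a formal syntax
definition.

```python
def zdna_sequence(n_repeats):
    """Return a (dC-dG)n sequence string in the c=g=... form, starting with c."""
    if n_repeats < 1:
        raise ValueError("n_repeats must be at least 1")
    return "c=g=" * n_repeats


def starts_with_cytosine(seq):
    """True if `seq` strictly alternates c, g, c, g, ... beginning with c."""
    residues = [r for r in seq.lower().split("=") if r]
    return bool(residues) and all(
        r == "cg"[i % 2] for i, r in enumerate(residues)
    )


# A sequence entered as g=c=g=c fails the check, matching the warning above.
print(zdna_sequence(2), starts_with_cytosine("g=c=g=c"))
```

A check of this kind only covers the simple alternating case. For the more
exotic sequences mentioned above, the choice of starting residue still has to
be made by hand.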
You can build glycoproteins and proteoglycans by creating the protein and the
polysaccharide pieces in separate molecule areas, then adding a bond to connect
the appropriate atoms in the two structures. This allows minimization of the
combined structures with the Tripos force field. Other BIOPOLYMER commands
work only on the section of the molecule corresponding to the open dictionary.
If you are interested in using the Kollman force fields, contact Tripos for
instructions on creating combined dictionaries.
[45] (a) K.R. Shoemaker, P.S. Kim, D.N. Brems, S. Marqusee, E.J. York,
I.M. Chaiken, J.M. Stewart and R.L. Baldwin, Proc. Natl. Acad. Sci.
USA 1985, 82, 2349.
(b) K.R. Shoemaker, P.S. Kim, E.J. York, J.M. Stewart and R.L.
Baldwin, Nature 1987, 326, 563.
(c) M. Rico, J. Santoro, F.J. Bermejo, J. Herranz, J.L. Nieto, E. Gallego
and M.A. Jimenez, Biopolymers 1986, 25, 1031.
(d) M. Vasquez and H.A. Scheraga, Biopolymers 1988, 27, 41.
[46] U. C. Singh and P. A. Kollman, J. Comp. Chem. 1984, 5, 129.
[47] R. E. Dickerson, J. Biomol. Struct. Dyn. 1987, 5, 557.
[48] S. Arnott and D. W. L. Hukins, Biochem. Biophys. Res. Comm. 1972, 47,
1504-1509.
[49] S. Arnott and E. Selsing, J. Mol. Biol. 1975, 98, 265-269.
[50] R. Chandrasekaran, M. Wang, R. G. He, L. C. Puigjaner, M. A. Byler, R.
P. Millane and S. Arnott, J. Biomol. Struct. Dyn. 1989, 6, 1189-1202.
[51] A. H. J. Wang, G. J. Quigley, F. J. Kolpak, G. van der Marel, J. H. van
Boom, and A. Rich, Science 1981, 211, 171-176.
[52] S. Arnott, D. W. L. Hukins and S. D. Dover, Biochem. Biophys. Res.
Comm. 1972, 48, 1392-1399.
[53] B. Hartmann, B. Malfoy and R. Lavery, J. Mol. Biol. 1989, 207, 433-444.
[54] Wang J., Cieplak P., Kollman P. A., “How Well Does a Restrained
Electrostatic Potential (RESP) Model Perform in Calculating
Conformational Energies of Organic and Biological Molecules?”
J. Comp. Chem. 2000, 21, 1049-1074.
[55] Cieplak P., Caldwell J., Kollman P., “Molecular Mechanical Models for
Organic and Biological Systems Going Beyond the Atom Centered Two
Body Additive Approximation: Aqueous Solution Free Energies of
Methanol and N-Methyl Acetamide, Nucleic Acid Base and Amide
Hydrogen Bonding and Chloroform/Water Partition Coefficient of the
Nucleic Acid Bases.” J. Comp. Chem. 2001, 22, 1048-1057.
V
van der Waals
contact column type 205