Biopolymer Manual

SYBYL®-X 2.1
Mid 2013
1699 South Hanley Rd.
St. Louis, MO 63144-2917
Phone: +1.314.647.1099
Fax: +1.314.647.9241
http://www.certara.com
LEGAL NOTICE
SYBYL and related Tripos modules © 1991-2013 Certara, L.P. All Rights Reserved.
Benchware and related Tripos modules © 2005-2013 Certara, L.P. All Rights Reserved.
Almond © 2003-2013 Molecular Discovery Ltd. All Rights Reserved.
AMPAC © 1997-2013 Semichem. All Rights Reserved.
AMM-2001 module in AMPAC version 8.16.5 © 2001 Regents of the University of Minnesota. All Rights Reserved.
Concord, Confort, CombiLibMaker, DiverseSolutions, ProtoPlex and StereoPlex © 1987-2001 University of Texas at
Austin. All Rights Reserved.
FlexX © 1993-2011 BioSolveIT. All Rights Reserved.
FUGUE, JOY, HOMSTRAD, ORCHESTRAR © 2012 Cambridge University Technical Services, Cambridge,
England. All Rights Reserved.
RACHEL © 2002-2012 Drug Design Methodologies.
Surflex, Surflex-Dock, and Surflex-Sim © 1998-2012 BioPharmics LLC. All Rights Reserved.
VolSurf and Almond © 2001-2012 Molecular Discovery Ltd. All Rights Reserved.
Portions copyright 1992-2012 FairCom Corporation. All Rights Reserved.
This material contains confidential and proprietary information of Certara, L.P. and third parties furnished under the
Tripos Software License Agreement. This material may be copied only as necessary for a Licensee’s internal use
consistent with the Agreement. The allowed use includes printing of hardcopy versions hereof as minimally necessary
for Licensee’s internal use. Neither Certara, L.P., nor any person acting on its behalf, makes any warranty or
representation, expressed or implied, with respect to the accuracy, completeness, or usefulness of the material
contained in this manual or in the corresponding electronic documentation, nor in the programs or data described
herein. Certara, L.P. assumes no responsibility nor liability with respect to the use of this manual, any materials
contained herein, or programs described herein, or for any damages resulting from the use of any of the above. Except
for printing of hardcopy versions as stated, no part of this manual may be reproduced in any form or by any means
without permission in writing from Tripos (DE), Inc., 1699 South Hanley Road, Suite 200, St. Louis, Missouri
63144-2917, USA (314-647-1099).
Selected software programs for methodologies contained or documented herein are covered by one or more of the
following patents: AllChem: US 7,860,657; Comparative Molecular Field Analysis (CoMFA): US 5,025,388; US
5,307,287; US 5,751,605; AT E150883; BE 0592421; CH 0592421; DE 691 25 300 T2; FR 0592421; GB 0592421;
IT 0592421; NL 0592421; SE 0592421. HQSAR: US 6,208,942. Embedded NLM: US 6,675,103. Topomers: US
6,185,506; US 6,240,374; US 7,184,893; US 7,212,951. TopCoMFA: US 7,329,222. DBTop: US 7,330,793. OptiSim:
US 6,535,819. Surflex software programs for chemical analysis by morphological similarity: US 6,470,305 B1.
SYBYL, UNITY, CoMFA, CombiFlexX, Concord, DiverseSolutions, GALAHAD, LeapFrog, OptDesign, StereoPlex,
and Alchemy are registered trademarks of Certara, L.P.
AUSPYX, Benchware, CScore, DISCOtech, Distill, GASP, HQSAR, Legion, MOLCAD, Molecular Spreadsheet,
Muse, OptiDock, OptiSim, Pantheon, ProTable, ProtoPlex, Selector, SiteID, Topomer CoMFA, Topomer Search,
Tuplets, and Tripos Bookshelf are trademarks of Certara, L.P.
RACHEL is a trademark of Drug Design Methodologies.
Surflex, Surflex-Dock, and Surflex-Sim are trademarks of BioPharmics LLC.
“FairCom” and “c-tree Plus” are trademarks of FairCom Corporation and are registered in the United States and other
countries.
All other trademarks are the sole property of their respective owners.
Biopolymer Table of Contents
1. Introduction to Biopolymer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.1 What is New with Biopolymer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.2 License Requirements for Biopolymer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2. Biopolymer Tutorials . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.1 Protein Preparation Tutorial . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.2 Peptide Building Tutorial . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
2.3 Protein Loop Search Tutorial . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
2.4 Monomer Definition Tutorial . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
3. Biopolymer Menu Layout . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51

3.1 Main Biopolymer Menu . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
3.2 Biopolymer Functions on Other SYBYL Menus . . . . . . . . . . . . . . . . . . . . . 54
4. Read and Write Biopolymer Files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57

4.1 Read and Write PDB Files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
4.2 Read and Write PIR and FASTA Files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
5. Biopolymer Display . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
5.1 Define and Apply Protein View Settings . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
5.2 Simple Biopolymer Displays . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
5.3 Color Schemes for Biopolymers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
5.4 Biopolymer Ribbons . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
5.5 Label Biopolymer Atoms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
5.6 Ramachandran Graphs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
6. Prepare Biopolymer Structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93

6.1 Protein Preparation Tool . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
6.2 Add Hydrogens . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
6.3 Set the Protonation Type . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
6.4 Load Charges . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
6.5 Edit Termini . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
6.6 Fix End Groups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
6.7 Fix SYBYL Atom Types in Cofactor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
6.8 Fix SYBYL Atom Types in Ligand . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
6.9 Assign AMBER Atom Types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117
6.10 Add Sidechains . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125
6.11 Fix Sidechain Amides . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128
6.12 Fix Prolines . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130
6.13 Chain Termini Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131
6.14 Set Chain Names . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132
6.15 Renumber a Sequence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133

6.16 Convert PDB Atom Names . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134
6.17 Check Biopolymer Geometry . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135
6.18 Convert an External Mol2 File to a Biopolymer . . . . . . . . . . . . . . . . . . . . 137
6.19 Convert a Small Molecule to a Biopolymer . . . . . . . . . . . . . . . . . . . . . . . 138
6.20 Minimize Biopolymer Structures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139
7. Build Biopolymer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .141

7.1 Build Protein, DNA Strand, RNA Strand, Carbohydrate . . . . . . . . . . . . . . 142
7.2 Disulfide Bridges . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 148
7.3 Build C-alpha to Backbone . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 149
7.4 Build a DNA Double Helix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151
7.5 Build an RNA Double Helix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 152
7.6 Add Solvent or Cofactor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153
7.7 Break a Chain . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 156
7.8 Join Chains . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157
7.9 Form a Cyclic Peptide . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 158
7.10 Add Phosphate Caps . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 159
7.11 Build a Random Sequence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 160
8. Biopolymer Composition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .161

8.1 Protein Composition Tool . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 162
8.2 Replace Sequence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 167
8.3 Mutate Monomers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 168
8.4 Insert Monomers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 169
8.5 Excise Monomers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 171
8.6 Delete Monomers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 172
9. Biopolymer Conformation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .173

9.1 Measure Conformation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 174
9.2 Set Backbone Conformation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 175
9.3 Find Secondary Structure Conformation . . . . . . . . . . . . . . . . . . . . . . . . . . . 177
9.4 Assign Secondary Structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 181
9.5 Predict Secondary Structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 183
9.6 Set Sidechain Conformation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 185
9.7 Scan Sidechain Torsions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 191
9.8 Copy Conformation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 192
10. Protein Loop Search . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .193

10.1 Introduction to Loop Searches . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 194
10.2 Search PRODAT Database for Loops . . . . . . . . . . . . . . . . . . . . . . . . . . . . 196
10.3 Tweak Conformational Loop Search . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 198
10.4 Loop Search Results in a Spreadsheet . . . . . . . . . . . . . . . . . . . . . . . . . . . . 201

10.5 BIOPOLYMER LOOP ANALYZE Command . . . . . . . . . . . . . . . . . . . . 203
11. Compare Biopolymer Sequences . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 207

11.1 Align Sequences and Write MSA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 208
11.2 View/Edit Alignments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 212
11.3 List Biopolymer Sequence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 213
12. Compare Biopolymer Structures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 215

12.1 Fit Monomers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 216
12.2 Align Structures by Homology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 218
12.3 RMS Fits of Conformers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 221
12.4 Find and Fit Fixed Regions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 223
13. Search Protein Database . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 227

13.1 Sequence Expression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 229
13.2 Inter-CA Distance Expression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 230
13.3 Class Expression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 232
13.4 Protein Search Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 233
14. Biopolymer Dictionary & Database Administration . . . . . . . . . . . . . . . 235

14.1 Biopolymer Dictionary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 236
14.2 Manage Custom Dictionary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 238
14.3 Create or Modify a Monomer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 241
14.4 Create AMBER SLN Typing Rules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 249
14.5 Define a New Blocking Group . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 252
14.6 Create/Update the PRODAT Database . . . . . . . . . . . . . . . . . . . . . . . . . . . 255
15. The Sequence Viewer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 261

15.1 Description of the Sequence Viewer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 262
15.2 Mouse and Keyboard Interactions in the Sequence Viewer . . . . . . . . . . . 275
16. Biopolymer Files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 277

16.1 Biopolymer Dictionaries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 278
16.2 Protein Homology Matrix Files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 293
16.3 Biopolymer Loop Files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 294
16.4 Secondary Structure Prediction Files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 298
16.5 Common Biopolymer File Formats . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 301
17. Biopolymer Commands . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 303

17.1 Associated Tailor Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 306
18. Biopolymer Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 307

18.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 308

18.2 Protein Modeling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 309
18.3 Nucleic Acid Modeling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 328
18.4 Polysaccharide Modeling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 330
18.5 Biopolymer Recommended Reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 332

1. Introduction to Biopolymer
SYBYL/Biopolymer provides a flexible environment for the display and
manipulation of large and small biomolecules. A ready-to-use capability for
modeling of polypeptides, polynucleotides (RNA and DNA), and polysaccharides
is built in. SYBYL/Biopolymer is integrated with the rest of SYBYL to enable
smooth combination of small molecule and biopolymer modeling. Thus, processes
like substrate or inhibitor binding to an enzyme or hormone-receptor
interactions can be studied.
Biopolymer modeling uses the concept of a residue. In addition to treating
macromolecular structures atom by atom (the approach used in SYBYL/Basic),
SYBYL/Biopolymer builds and modifies structures on a residue-by-residue
basis. Each class of biopolymer has one or more dictionaries associated with
it, which delineate the available residues and define the types of connections
and conformational states possible. The composition and characteristics of each
available residue are maintained in residue files. You can create your own
residues within a biopolymer class or add new classes of biopolymers.
Much of our knowledge of the structure of proteins and nucleic acids comes
from X-ray diffraction studies. The repository of this information has always
been the Protein Data Bank [Ref. 1]. SYBYL/Biopolymer includes the
capability of reading and writing the standard PDB format. Sequence data for
proteins and polynucleotides are maintained by a number of groups.
SYBYL/Biopolymer currently reads and writes the PIR format of the National
Biomedical Research Foundation [Ref. 2]. Examples of both these file formats
may be found in the TA_DEMO directory. Comprehensive reviews of
biopolymer structure can be found in the books by Schulz and Schirmer for
proteins [Ref. 3], Saenger for nucleic acids [Ref. 4], and Aspinall for
polysaccharides [Ref. 5].
Biomolecular systems tend to be complex and large in size. To study and
understand many aspects of these structures, it is helpful to have tools capable
of highlighting selected features of a biopolymer’s three-dimensional
representation. The utility of color computer graphics in this regard has been
widely acknowledged [Ref. 6]. Visual enhancement by the use of ribbon displays
and by the capacity to formulate general and flexible coloring schemes contributes
significantly to the goal of understanding biopolymer structure.
For the modeler trying to probe structure-function relationships in biopolymers,
it becomes imperative to calculate at least semi-quantitative energies for intra-
and intermolecular interactions. In these situations, conformational energy and
molecular mechanics [Ref. 7-Ref. 10] calculations, through the use of geometry
optimization or molecular dynamics (MD) techniques, represent additional
modeling tools.

1.1 What is New with Biopolymer

Interrupt a Minimization
Clicking the icon during an energy minimization terminates the process
after the current iteration and retains the new coordinates.
Protein View Enhanced

Protein View now supports ligands that are in a separate molecule area from the
protein complex.
Adding Sidechains
An issue that was causing SYBYL to become non-responsive after sidechains
were added has been resolved.
1.2 License Requirements for Biopolymer

SYBYL-X Suite Licensing
SYBYL-X introduced a simplified licensing scheme in which the “SYBYL”
license provides access to all biopolymer functionality.
Module-Based Licensing
SYBYL continues to run with a license file issued before the SYBYL-X release.
In that context:
• A “BioPolymer” license provides access to the biopolymer functionality.
This license is also required to assign and label AMBER and Kollman
atom types and to use any of the AMBER and Kollman force fields, and
to perform a staged minimization.
• A “MOLCAD” license is required to make full use of the Protein View
dialog.
Additional features accessible only from the Biopolymer menu require
Advanced Protein Modeling licenses:
• FUGUE: “Fugue”
• ORCHESTRAR: “ORCHESTRAR” and “ORCHESTRAR_Interface”

2. Biopolymer Tutorials
Explore SYBYL’s biopolymer functionality:

• Protein Preparation Tutorial on page 10
• Peptide Building Tutorial on page 20
• Protein Loop Search Tutorial on page 32
• Monomer Definition Tutorial on page 41
• Localized Minimization Tutorial (in the Force Field Manual)
See License Requirements for Biopolymer on page 8.

2.1 Protein Preparation Tutorial

This tutorial introduces the Prepare Structure Tool, which provides a single
location for preparing a protein for further manipulation in SYBYL.
This tool allows the user to rename unrecognized atoms, repair incomplete
sidechains and the backbone, add or modify termini, add hydrogens, assign
KOLLMAN/AMBER types and charges, find the best hydrogen bonding
arrangement for sidechains containing amide groups, and fix sidechain van der
Waals overlap with the option of using rotamer libraries to set a sidechain
conformation.
The structure used in this tutorial is an oxidoreductase, dihydrofolate
reductase, in complex with NADPH and methotrexate. It is taken from a published
crystal structure: Bolin, J. T., Filman, D. J., Matthews, D. A., Hamlin, R. C.,
Kraut, J.: Crystal structures of Escherichia coli and Lactobacillus casei
dihydrofolate reductase refined at 1.7 Å resolution. I. General features and
binding of methotrexate. J. Biol. Chem. 257, p. 13650 (1982).
A Matter of Time: This tutorial requires about 15 minutes of personal time.
2.1.1 Retrieve the Structure

1. It is always a good idea to clear the screen and reset the display before starting.
! > Delete Everything
! Click to reset all rotations and translations.
2. Retrieve the structure of interest, 3dfr, from the RCSB.

! File > Retrieve PDB
The Retrieve PDB dialog opens (dialog description on page 64).
! Set the dialog as follows:

- PDB code: 3dfr
- Retrieve From: RCSB Server
- Load Retrieved PDB File: on
! Click Retrieve to read in the protein.
Note: If you do not have Internet access, you can retrieve the file from
$TA_DEMO. File > Import File, select [$TA_DEMO], then 3dfr.pdb and
press OK.
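
Outside SYBYL, the same retrieval can be pictured with a short script. The following is a minimal Python sketch, not part of SYBYL: it assumes the public RCSB file-download address pattern shown in the code, and the function names rcsb_pdb_url and fetch_pdb are hypothetical helpers introduced only for illustration.

```python
from urllib.request import urlopen

# Assumption: RCSB serves legacy PDB-format files at this address pattern.
# This URL is not taken from the SYBYL manual.
RCSB_DOWNLOAD = "https://files.rcsb.org/download/{code}.pdb"


def rcsb_pdb_url(code: str) -> str:
    """Build the download URL for a four-character PDB code such as 3dfr."""
    code = code.strip()
    if len(code) != 4 or not code.isalnum():
        raise ValueError(f"not a four-character PDB code: {code!r}")
    return RCSB_DOWNLOAD.format(code=code.upper())


def fetch_pdb(code: str, dest: str = "", timeout: float = 30.0) -> str:
    """Download the PDB file and return the local file name (needs Internet access)."""
    dest = dest or f"{code.strip().lower()}.pdb"
    with urlopen(rcsb_pdb_url(code), timeout=timeout) as response:
        data = response.read()
    with open(dest, "wb") as out:
        out.write(data)
    return dest


if __name__ == "__main__":
    print(rcsb_pdb_url("3dfr"))
```

Calling fetch_pdb("3dfr") would write 3dfr.pdb to the current directory, which could then be read with File > Import File as described in the note above.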

3. Review the information provided in the console by the PDB reader.
When you retrieve a PDB file and read it into SYBYL for the first time, it is
worth looking at the messages in the console.
! Scroll back in the console (or increase the window’s size) and look at
the lines starting with “Adding...”
These lines refer to creation of individual substructure sets based on information
in the PDB records. For a full list of sets created by SYBYL’s PDB reader see
Substructure and Atom Sets on page 60.
! Look at the “NOTE” lines. These provide important information.
These lines report the following:

• All water atoms are stored in the {WATER} set.
• The {HETATM} set contains all atoms from HETATM records in the
PDB files. In general these atoms belong to modified amino acids,
ligands, cofactors, and metals. You will highlight these atoms in the
context of 3dfr in the next step.
• The presence of a {LIGDB} set indicates that the file includes a ligand
that matches one in the ligand database. Atoms in this set are also in the
{HETATM} set.
• The mention of “1 modified residue” refers to the fact that some of the
HETATM records match those of backbone atoms in one of the residues
in the dictionary, but the name of the residue in the file does not match
the name of the residue in the dictionary. Indeed, MTX (the ligand)
contains atoms whose names match those of the GLU residue.
• The absence of an {UNK_ATOMS} set indicates that all records were
processed and that all atom types and bond types were assigned
successfully.
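
The way atoms fall into the {WATER} and {HETATM} sets can be pictured with a small Python sketch. This is an illustration only, not SYBYL's PDB reader: it assumes the fixed-column layout of the PDB format (record name in columns 1-6, atom serial in 7-11, residue name in 18-20), treats HOH, WAT, and DOD as water names (an assumption), and uses a hypothetical make_record helper and made-up serial numbers to build demonstration lines.

```python
# Assumption: residue names treated as water; SYBYL's own list may differ.
WATER_NAMES = {"HOH", "WAT", "DOD"}


def make_record(record, serial, atom, resname, chain, resseq):
    """Hypothetical helper: build the first 26 columns of a PDB coordinate record."""
    return f"{record:<6}{serial:>5} {atom:<4} {resname:>3} {chain}{resseq:>4}"


def classify_atoms(pdb_lines):
    """Sort atom serial numbers into ATOM, HETATM, and WATER groups."""
    sets = {"ATOM": [], "HETATM": [], "WATER": []}
    for line in pdb_lines:
        record = line[0:6].strip()
        if record not in ("ATOM", "HETATM"):
            continue  # skip REMARK, HEADER, and other non-coordinate records
        serial = int(line[6:11])
        resname = line[17:20].strip()
        sets[record].append(serial)
        if resname in WATER_NAMES:
            sets["WATER"].append(serial)  # a water can also be a HETATM atom
    return sets


if __name__ == "__main__":
    demo = [
        "REMARK   1",
        make_record("ATOM", 1, "N", "THR", "A", 1),
        make_record("HETATM", 1300, "PA", "NDP", "A", 163),
        make_record("HETATM", 1400, "N1", "MTX", "A", 164),
        make_record("HETATM", 1500, "O", "HOH", "A", 201),
    ]
    print(classify_atoms(demo))
```

In this sketch a water read from a HETATM record appears in both groups, mirroring the statement above that atoms in a more specific set can also belong to {HETATM}.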
4. If you have a personal default view defined, reset the view of the protein so that
all atoms are shown and colored by atom type, and all bonds are displayed as lines.
! View > Protein View > Reset View
5. Label the ligand and cofactor.
! Click in the toolbar.
! In the Atom Expression dialog, expand the Other Substructures list,
then expand the Chain A list.
! Check the check boxes for NDP163 and MTX164 and click OK.

! Click and select Atoms > Substructure.
2.1.2 Analyze and Prepare the Protein

6. Access the Prepare Structure Tool dialog.
! Biopolymer > Prepare Structure > Structure Preparation Tool
The Prepare Protein Structure dialog opens (dialog description on page 94).
7. Analyze the protein’s structure.
! Press Analyze Selected Structure.
The analysis takes a few seconds to complete. When it is done the fields in the
dialog are populated as shown below.
All the residues are found to have the correct atom names and the correct
number of backbone atoms when compared to the residue files in the macromol
dictionary.

8. Repair the sidechain.
The analysis found that one residue is missing sidechain atoms.
! On the Repair Sidechain line press Show to highlight this residue.
The residue is labeled and highlighted: LYS51.
! Press Fix to add the missing atoms.
The sidechain atoms are retrieved from the lysine residue file in the dictionary.
The Set Sidechain Conformation dialog appears (dialog description on
page 185). In the dialog, the current sidechain torsion angles are reported.
! Set the Rotamer Source to use the Lovell rotamer library.
! Click the first set of chi values in the list and press Set Selected.
The sidechain in LYS51 adopts the conformation with the highest probability in
the Lovell rotamer library.
! Note the Delta Energy reported below the list.
The value represents, for the residue selected at the top of the dialog, the
difference between the energy of the applied rotamer and the energy of the
Initial conformer. Energy values are computed using the Tripos force field.
! Click the Set Next buttons to scroll through the Lovell rotamers for
this residue type.
If minor steric clashes occur with other residues or water molecules, they are
indicated by yellow dashed lines. Likewise, major steric clashes are indicated
by red dashed lines.
Note: To reset the LYS51 sidechain to its original conformation, change the
Rotamer Source to Initial.
! Select the highest probability conformer in the list and press Set
Selected.
! Press Close to exit and continue with the protein preparation.
The Prepare Protein Structure dialog reappears with the Repair Sidechain line
greyed out. A partial analysis of the protein is conducted for residues with
missing hydrogens, invalid atom types, and sidechain bumps.

9. Treatment of chain termini.
The analysis found that two chain termini need to be fixed or modified.
! On the Termini Treatment line press Show.
Two residues are highlighted (THR1 and ALA162).
! Press Fix.
The Edit Termini dialog appears (dialog description on page 108) showing the
two terminal residues highlighted.
! Change one of the New Block menus to Charged. Note that the
other changes automatically.
! Press Apply to Selected Protein.
! Press Close to exit and continue with the protein preparation.
The Prepare Protein Structure dialog reappears with the Termini Treatment line
greyed out. A partial analysis of the protein is conducted for residues with
missing hydrogens, invalid atom types, and sidechain bumps.
10. Add the hydrogens.
The analysis reports that 426 residues are without hydrogens. One of these
residues is the cofactor, NADPH, which is missing its hydrogens. Also included
in this number are the co-crystallized waters.
! On the Add Hydrogens line press Add.
The Add Hydrogens dialog appears (dialog description on page 99) with the
molecule already selected in the list.
! Make sure that All hydrogens will be added and that hydrogens will
be added to the water molecules in a Random orientation.
! Press OK to add the hydrogens.
The Prepare Protein Structure dialog reappears with the Add Hydrogens line
greyed out. A partial analysis of the protein is conducted for residues with
invalid atom types and sidechain bumps.
11. Set the protonation type of a residue to favor hydrogen bonds with the ligand.
! On the Set Protonation Type line press Fix.

The Set Protonation Type dialog appears (dialog description on page 101) with
the molecule already selected in the list.
! Below the list of residues, activate List Only Residues Near
Ligands.
The list contains only the residues that are within 6 Å of any ligand or cofactor
atom and that may have more than one protonation state at near neutral pH.
! Filter the list by setting Manipulate to Acids (GLU/ASP).
The list contains four ASP residues, and the first one is selected.
! Activate Auto Center to show in the middle of the screen only the
residues in the region of interest.
! Select ASP26.
The carboxylate oxygens are close enough to the ligand to form hydrogen
bonds.
! By default, GLU and ASP residues are deprotonated. Click
Protonated (GLZ/ASZ).
Flipping the orientation of the COOH group would be more favorable to
hydrogen bonding.
! Click Flip 180 Degrees.
The list reflects your selection: ASP26 is in the ASZ state and the orientation of
the COOH group has been flipped.
! Close the Set Protonation Type dialog.
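
The proximity criterion behind List Only Residues Near Ligands in step 11 (residues within 6 Å of any ligand or cofactor atom) can be sketched in Python as follows. This is only an illustration of a distance cutoff, not SYBYL's implementation: the function name, the input layout, and the coordinates in the demonstration are hypothetical, and the additional filter on residues with more than one protonation state is left out.

```python
import math


def residues_near_ligand(protein_atoms, ligand_coords, cutoff=6.0):
    """Return residue labels having at least one atom within `cutoff` (in Å)
    of any ligand atom.

    protein_atoms: iterable of (residue_label, (x, y, z)) pairs
    ligand_coords: list of (x, y, z) tuples for ligand and cofactor atoms
    """
    near = set()
    for residue, xyz in protein_atoms:
        if residue in near:
            continue  # one close atom is enough to list the residue
        if any(math.dist(xyz, lig) <= cutoff for lig in ligand_coords):
            near.add(residue)
    return sorted(near)


if __name__ == "__main__":
    # Hypothetical coordinates chosen only to exercise the cutoff.
    protein = [
        ("ASP26", (3.0, 0.0, 0.0)),
        ("ASP26", (9.0, 0.0, 0.0)),
        ("LEU27", (6.5, 0.0, 0.0)),
        ("HIS28", (0.0, 6.0, 0.0)),
    ]
    ligand = [(0.0, 0.0, 0.0)]
    print(residues_near_ligand(protein, ligand))
```

The sketch treats the cutoff as inclusive, so an atom exactly 6 Å from the ligand counts as near; whether SYBYL does the same is not stated in this manual.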
12. Assign AMBER7-FF99 atom types to the ligand.
The analysis reports that 53 atoms do not have the proper AMBER/Kollman
atom types.
! On the Type Atoms line press Show.
All the atoms in methotrexate are highlighted. They have the correct SYBYL
atom types, but this structure does not match any of the monomers defined in
the dictionary. Therefore, its AMBER and Kollman atom types are unknown.
The missing atom types can be assigned via an SLN atom typer based on a
fragment library.
! Press Fix next to Type Atoms.

The Assign AMBER Atom Types dialog appears (dialog description on page 117)
with the molecule already selected in the list.
! Set Atom Types to AMBER7 FF99.
! Press Assign Atom Types.
4 lone pairs removed (Incompatibility with AMBER7_FF99)

Loading AMBER7_FF99 atom types onto 3486 atoms in molecule M1:
3433 AMBER7_FF99 atom types assigned from dictionary
27 AMBER7_FF99 atom types assigned by SLN Typer (residues)
49 AMBER7_FF99 atom types assigned by SLN Typer (fragments)
Atom types are assigned first from the dictionary. This was done for the 3433
atoms in the protein and cofactor. The SLN typer reassigned atom types to 27
atoms in the termini and to 49 of the ligand atoms.
Atom type summary for complete molecule M1:
3406 AMBER7_FF99 default dictionary atom types.
76 AMBER7_FF99 user defined/SLN typed atom types.
4 AMBER7_FF99 unknown atom types.
This left 3406 atom types assigned from the dictionary and a total of 76
assigned by the SLN typer. These were automatically marked as user-defined.
The remaining 4 atoms could not be typed by either method.
4 AMBER7_FF99 atom types are unknown and added to set {UNK_AMBER7_FF99}
Converting dictionary atom types into user defined types.
3406 AMBER7_FF99 dictionary atom types converted into user defined
The four atoms that could not be typed were stored in the set called
{UNK_AMBER7_FF99}. All the atom types that had been assigned from the
dictionary were marked as user-defined (for advantages of this operation see
Marking Atom Types as User-Defined on page 123).
Atom type summary for complete molecule M1:
0 AMBER7_FF99 default dictionary atom types.
3482 AMBER7_FF99 user defined/SLN typed atom types.
4 AMBER7_FF99 unknown atom types.
4 AMBER7_FF99 atom types are unknown and added to set {UNK_AMBER7_FF99}
The Assign AMBER Atom Types dialog reports that four atoms could not be
typed.
! Click the icon so you can hide the protein part of the molecule.
! In the General Structure Display dialog, click Residues, press
(Select All), then press Undisplay below the list.
Because the cofactor matches a template in the dictionary it was undisplayed
along with the standard protein residues.
! In the Assign AMBER Atom Types dialog, toggle on the Expert
Options.
! Toggle on Highlight Missing Atom Types.

! Set their Label to Missing With SYBYL Types.
The four atoms are aromatic nitrogens. Their AMBER7 FF99 atom types must
now be assigned manually. Conveniently, an atom set called
UNK_AMBER7_FF99 contains these four atoms.
! Press Manual.
! Select ASSIGN and press OK.
! Select SPECIFIC and press OK.
! In the Atom Expression dialog press Sets.
Note all the “UNK” sets for the AMBER7_FF99, AMBER7 FF02,
AMBER95_ALL, and KOLL_ALL force fields.
! Select UNK_AMBER7_FF99 and press Add.
! Press OK to close the Atom Expression dialog.
! Specify the 4 new atom types to be NC and press OK for each.
! Press End to terminate the manual assignment of atom types.
Redisplay the protein and cofactor.
! Click the icon, click Residues, press (Select All), then Display.
The Assign AMBER Atom Types dialog now reports that all atoms have been
assigned AMBER7 FF99 atom types.
! Close the Assign AMBER Atom Types dialog and continue with the
protein preparation.
The Prepare Protein Structure dialog reappears, and the Type Atoms line still
reports that 53 atoms do not have the proper types. This is because the other sets
of AMBER and Kollman atom types have not yet been loaded on the ligand. If
you want to use force fields other than AMBER7 FF99, you will need to repeat
this operation for the appropriate sets of atom types. See Force Fields for
Biopolymer in the Force Field Manual for lists of atom types.
13. Add atomic charges
! On the Add Charges line press Add.
The Load Charges dialog appears (dialog description on page 104) with the
molecule already selected in the list.

! Set the Biopolymer pull-down to AMBER7 FF99.
! Set the Ligands pull-down to Gasteiger-Huckel.
! Make sure that the Water check box is on to assign AMBER7 FF99
charges to the water molecules.
! Press OK to assign the atom charges.
The molecular description retrieved from RCSB includes temperature factors
(B-factors). SYBYL now warns you that these will be overwritten if you
proceed.
! Press Yes to replace B-factors with charge values.
The Ligand Charges dialog appears, listing only methotrexate. Because the
cofactor matches a template in the dictionary, it was treated along with the
standard protein residues.
! Click A/MTX164 and click OK.
The Prepare Protein Structure dialog reappears.
14. Orient the sidechain amides in all ASN and GLN residues to maximize
hydrogen bonding.
! On the Fix Sidechain Amides line press Fix.
A list of residues that have been reoriented appears in the console.
The Prepare Protein Structure dialog reappears after checking for sidechain
bumps.
15. Check steric interactions involving sidechain atoms.
The analysis still reports that the sidechains are not involved in steric clashes.
However, in your own work you may find that some of the operations in this
dialog result in a few sidechain bumps. These can be taken care of easily via the
Fix button. The Set Sidechain Conformation dialog will then appear (dialog
description on page 185) with the appropriate residues already loaded. You can
then resolve the bad steric interactions quickly with the Scan Selected
Residue option.
16. Minimize the protein-ligand complex in progressive stages, using the AMBER7
FF99 force field.
! On the Staged Minimization line press Perform.

The Staged Minimization dialog is displayed (dialog description in the Force
Field Manual).
! At the bottom of the dialog press Set Minimization Details.
The Minimize dialog is displayed (dialog description in the Force Field Manual).
! Press Modify to change the Energy Setup.
The Energy dialog is displayed (dialog description in the Force Field Manual).
! Set the Force Field pull-down to AMBER7 FF99.
! Set the Charges to Use Current.
! Press OK to close the Energy dialog.
! Press OK to close the Minimize dialog.
! Back in the Staged Minimization dialog, below the list of steps, change
Reset Steps To 10 and press Apply.
! Toggle off step 2. Minimize Waters.
! Press OK to begin the calculation.
Watch the minimization proceed in stages. While the minimization is
proceeding, the atoms are color coded according to the local strain energy (sum
of energy terms in which each atom is involved). When the molecule is once
again colored by atom types, the minimization is complete.
! When the minimization completes press Close to exit the Prepare
Protein Structure dialog.
17. Save the molecule. Use the Mol2 format as this will store the atomic charges.
! File > Export File ( )
! Make sure that the Format is set to Mol2.
! Enter 3dfr as the file name and press Save.
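Mol2 is chosen because each atom record in the file carries a partial charge as its ninth field. The sketch below shows how those charges could be read back outside SYBYL. It is a minimal Python illustration: the function name is invented, and the three-atom water record is made-up sample data, not an excerpt of 3dfr.mol2.

```python
SAMPLE_MOL2 = """\
@<TRIPOS>MOLECULE
example_water
3 2 1
SMALL
USER_CHARGES

@<TRIPOS>ATOM
1 O   0.000 0.000 0.000 O.3 1 HOH1 -0.8340
2 H1  0.957 0.000 0.000 H   1 HOH1  0.4170
3 H2 -0.240 0.927 0.000 H   1 HOH1  0.4170
@<TRIPOS>BOND
1 1 2 1
2 1 3 1
"""

def read_mol2_charges(text):
    """Return {atom_id: charge} from the @<TRIPOS>ATOM section of a Mol2 file."""
    charges = {}
    in_atoms = False
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("@<TRIPOS>"):
            # Only the ATOM section is of interest here.
            in_atoms = (line == "@<TRIPOS>ATOM")
            continue
        if not in_atoms or not line:
            continue
        fields = line.split()
        # Fields: id name x y z type subst_id subst_name charge [status]
        if len(fields) >= 9:
            charges[int(fields[0])] = float(fields[8])
    return charges

charges = read_mol2_charges(SAMPLE_MOL2)
print(charges[1])   # -0.834
```

Summing the returned values gives the net charge of the structure, which is a quick sanity check after loading charges.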
This concludes the tutorial.

2.2 Peptide Building Tutorial

This tutorial illustrates the design of the biopolymer dictionary and some of the
operations available for building, manipulating and editing biopolymers. You
will build a 25 residue segment of the DNA binding domain of the “zinc finger”
peptide (Tyr-Lys-Cys-Gly-Leu-Cys-Glu-Arg-Ser-Phe-Val-Glu-Lys-Ser-Ala-
Leu-Ser-Arg-His-Gln-Arg-Val-His-Lys-Asn), see figure below, using the
macromol biopolymer dictionary.
The zinc finger motif is an independently folded domain with a compact
structure in which the zinc atom is bound by two cysteine and two histidine
residues. The polypeptide backbone fold consists of a well-defined helix packed
against two β-strands that are arranged in a hairpin structure. The many basic
and polar amino acid sidechains on the exposed face of the helix are probably
involved in nucleic acid binding.
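As a quick check of the sequence quoted above, the following Python sketch (illustrative only, with invented helper names) counts the residues and locates the cysteine and histidine residues that can bind zinc.

```python
SEQUENCE = ("Tyr-Lys-Cys-Gly-Leu-Cys-Glu-Arg-Ser-Phe-Val-Glu-Lys-Ser-Ala-"
            "Leu-Ser-Arg-His-Gln-Arg-Val-His-Lys-Asn")

# Upper-case three-letter codes, as used in the Build Protein dialog.
residues = [name.upper() for name in SEQUENCE.split("-")]

def positions(name):
    """Return the 1-based sequence positions of a residue type."""
    return [i for i, res in enumerate(residues, start=1) if res == name]

print(len(residues))      # 25
print(positions("CYS"))   # [3, 6]
print(positions("HIS"))   # [19, 23]
```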
After completing this tutorial, you will be able to:

• Open a protein dictionary and build a peptide sequence.
• Optimize the structure by relieving close contacts among the sidechain
atoms.
• Identify the different conformations within the secondary structure of a
peptide.
• Modify an existing peptide sequence.

2.2.1 Build the Peptide

2. To build the peptide sequence, bring up the Build Biopolymer dialog and use it
to define the sequence.
! Biopolymer > Build > Build Protein
The macromol dictionary is opened automatically and the Build Protein dialog
is displayed (dialog description on page 142).
! Click TYR LYS CYS GLY LEU CYS GLU ARG SER PHE.
! Set the Conformation to beta_sheet.
! Set the N-terminus to add the blocking group ACE.
! Set the C-terminus to None. You will add more residues to the
sequence later.
By default, the dialog is set up to add all the hydrogens to the sequence being
built. For this first biopolymer tutorial you will switch off that option.
! Toggle the Add Hydrogens check box off.
! Press Build.

3. Label the Cα with the residue types and sequence numbers. These are called
substructure labels.
! Use the icon to set Atm Lbl to Substructure.
4. NMR data indicate that there is a hairpin turn at residues 5-6. Modify the
structure to reflect this conformation.
! Biopolymer > Conformation > Set Backbone Conformation
! In the Substructure Expression dialog click on LEU5 and CYS6 then
press OK.
! In the Set Backbone Conformation dialog, activate Conformational
State.
! Select turnI in the adjacent menu.
! Press Set.
5. Add a few more residues to the peptide, connecting the new sequence to the
C-terminus.
! Select M1(A/PHE10.C) as the attachment point and press OK.
! Click VAL GLU LYS SER ALA LEU SER ARG HIS GLN.
! Set the Conformation option to None.
! Set the C-terminus to add the blocking group NME.
! Toggle Add Hydrogens off.
! Press Build.
6. Change part of the sequence’s conformation to an alpha helix.

! Biopolymer > Conformation > Set Backbone Conformation
The Substructure Expression dialog shows the molecular description as a
hierarchy.
! Click and drag from GLU12 through GLN20 in the hierarchy.
The nine residues from GLU12 through GLN20 are selected in the dialog and
highlighted on the screen.
! Press OK.

! Activate Conformational State.
! Select alpha_helix in the adjacent menu.
! Press Set.
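Setting a conformational state amounts to assigning backbone torsion angles (phi, psi) to each selected residue. The Python sketch below mimics steps 2 through 6 with commonly quoted textbook angles. These values are an assumption for illustration; the angles actually stored in the macromol dictionary may differ, and the function name is invented.

```python
# Textbook (phi, psi) values in degrees; NOT read from the SYBYL dictionary.
IDEAL_PHI_PSI = {
    "alpha_helix": [(-57.0, -47.0)],
    "beta_sheet": [(-139.0, 135.0)],             # antiparallel strand
    "turnI": [(-60.0, -30.0), (-90.0, 0.0)],     # the two central turn residues
}

def set_backbone(conformation, first, last, state):
    """Assign the state's angle pattern to residues first..last (inclusive)."""
    pattern = IDEAL_PHI_PSI[state]
    for offset, resnum in enumerate(range(first, last + 1)):
        conformation[resnum] = pattern[offset % len(pattern)]

# Residues 1-10 are built as beta sheet; 11-20 are added with Conformation None.
backbone = {n: None for n in range(1, 21)}
set_backbone(backbone, 1, 10, "beta_sheet")
set_backbone(backbone, 5, 6, "turnI")          # hairpin turn at LEU5-CYS6
set_backbone(backbone, 12, 20, "alpha_helix")  # GLU12 through GLN20

helix_residues = [n for n, v in backbone.items()
                  if v == IDEAL_PHI_PSI["alpha_helix"][0]]
print(len(helix_residues))   # 9
```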
7. Notice that the molecule has been built off center and rotates in an awkward
manner. Center the display of the molecule on the screen.
! Click away from the molecule to clear the selection.
! Click to center the view of the entire molecule.
8. The molecule was given the generic name of builder_protein. Give it a more
specific name.
! Right-click on any atom and select Rename Molecule.
! Type znf for the molecule name and press OK.
2.2.2 Scan the Torsion Angles

9. Note that there are a number of close contacts among the sidechains and
between the sidechain and backbone atoms of the molecule, especially in the
alpha-helical region. To relieve these steric interactions, scan the torsion angles
to find positions that are more energetically favorable. For purposes of
illustration, assume that it is only necessary to scan the sidechain bonds in the
alpha-helical region of the peptide.
! Biopolymer > Conformation > Scan Sidechain Torsions
10. Restrict the scan to only those bonds involved in the alpha-helical regions.
! Press Sets.
! Select alpha_helix in the Conformations list and press Add.
The residues in the alpha-helical region of the molecule are highlighted.
! Activate Show Substructure Expression.
In the field at the bottom of the dialog the expression
M1({FINDCONF(alpha_helix,*)}) makes use of the built-in set
{FINDCONF} that will identify all the residues in which the backbone is in an
alpha-helical conformation.

Even though some of the atoms highlighted are enclosed in a ring (HIS19), the
SCAN operation recognizes them and simply eliminates the corresponding
bonds from the computation.
! Press OK and watch the molecule as the scan is performed.
When the scan is finished, the sidechains in the alpha-helical region are in a
reasonable conformation with no close contacts. The scan is an iterative
process. A message in the console reports that no bad contacts were found at the
end of the torsion scan:
Iteration 1 finished, fixed 28 bonds this iteration, 0 to go.
! Click anywhere away from the molecule to clear the selection.
2.2.3 Analyze the Secondary Structure

11. Analyze the secondary structure of the peptide to find the conformational states
defined in the dictionary. This locates the conformations that were initially set
during the building phase of the tutorial.
! Biopolymer > Conformation > Find Secondary Structure
The Find Secondary Structure dialog is displayed (dialog description on
page 177).
! Set the Method to use the Dictionary.
! Next to the list of States to Find press the icon (Select All).
The information found by this operation can be stored for reuse by creating
substructure sets for the sequences that match defined conformational states.
! Toggle Create Sets from Results on.
! Press Find.
The secondary structures along with their associated sequences are listed in the
console.
Finding conformations for molecule in m1 using Dictionary method...
Creating sets...
ALPHA_HELIX_A_DICT
BETA_SHEET_A_DICT
TURNI_A_DICT
Additional lines of text identify the secondary structure elements in the
sequence. The relevant lines for znf are:
Chain A:
 1        10        20
.YKCGLCERSFVEKSALSRHQ. :sequence
------------HHHHHHHH-- :alpha_helix
-EEEE--EEEE----------- :beta_sheet
-----11--------------- :turnI
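The per-residue tracks in this report can be turned into residue ranges. The Python sketch below is illustrative (the function name is invented) and assumes that the leading "." stands for the ACE blocking group at position 0, so that the string index equals the residue number.

```python
from itertools import groupby

TRACKS = {
    "alpha_helix": "------------HHHHHHHH--",
    "beta_sheet":  "-EEEE--EEEE-----------",
    "turnI":       "-----11---------------",
}

def segments(track):
    """Return (first, last) index pairs for each run of non-'-' characters."""
    found = []
    pos = 0
    for is_assigned, run in groupby(track, key=lambda ch: ch != "-"):
        length = len(list(run))
        if is_assigned:
            found.append((pos, pos + length - 1))
        pos += length
    return found

for state, track in TRACKS.items():
    print(state, segments(track))
# alpha_helix [(12, 19)]
# beta_sheet [(1, 4), (7, 10)]
# turnI [(5, 6)]
```

Under this reading the turn falls on LEU5 and CYS6, matching the residues set to turnI earlier in the tutorial.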
12. Visualize the secondary structure through shaded rendering.
! Toggle Render Conformations on.
! Press Find.
The rendered image reflects the protein’s secondary structure: alpha helical
regions are rendered as magenta ribbons, beta strands as yellow curved arrows,
and the remainder of the protein as a cyan curved tube.
! Close the dialog.
Remove the background images.
! > Backgrounds (Surface, Ribbon, etc.)
! Select ALL and press OK.
13. Another way to visualize the secondary structure of a peptide is to display only
the backbone and color it to highlight the regions of different secondary
structures.
! View > Protein View > Backbone Only
Highlight the secondary structure regions of interest.
! View > Color by Scheme > Secondary Structure
Atoms in the alpha helix are colored red and those in beta sheet conformation
are colored blue. If you had not created sets in the Find Secondary Structure
dialog, the backbone color would be entirely white.

14. Another color scheme of interest is that of acidic, basic and polar residues.
Instead of the menubar you can use the color icon.
! Use to color the molecule By Acid/Base.
The colors highlight the following types of residues:

• Blue—basic
• Red—acidic
• White—uncharged polar
• Green—non-polar
15. The basic and polar residues of the helix are proposed to be involved in nucleic
acid binding. The Atom Expression dialog provides a powerful selection
mechanism that can make use of built-in sets and conformational states to select
exactly those residues.
First, reset the color by atom types for better contrast.
! Set to By Atom Type.
The first part of the selection retrieves all atoms in the Basic and Polar global
sets.
! Click .
! In the Atom Expression dialog press Sets below the hierarchy.
! Select the BASIC and POLAR sets and press Add.
You will now combine the initial selection with residues in alpha-helical
conformation.
! In the Atom Expression dialog press Substructures.
! In the Substructure Expression dialog press Sets.
! Select alpha_helix from the Conformations list and press Add.
! In the Substructure Expression dialog press Intersect.
In the field at the bottom of the Atom Expression dialog the expression reads
M1(({BASIC}+{POLAR})&({FINDCONF(alpha_helix,*)})).
! Press OK.
! Set to Magenta.

! Click anywhere away from the molecule to clear the selection.
The alpha helical residues that are either basic or polar are colored magenta.
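The expression built in this step is ordinary set algebra: a union of two sets followed by an intersection with a third. The Python sketch below reproduces the logic for the 20-residue peptide. The membership of the BASIC and POLAR groups is an assumption made for this example and may not match SYBYL's global sets exactly; the helix range 12 to 19 is taken from the console report shown earlier.

```python
SEQ = "YKCGLCERSFVEKSALSRHQ"     # residues 1-20, one-letter codes

BASIC = set("KRH")               # assumed: Lys, Arg, His
POLAR = set("STNQYC")            # assumed: uncharged polar residues

residues = {i: aa for i, aa in enumerate(SEQ, start=1)}
basic = {i for i, aa in residues.items() if aa in BASIC}
polar = {i for i, aa in residues.items() if aa in POLAR}
helix = set(range(12, 20))       # residues reported as alpha_helix

# ({BASIC}+{POLAR}) & ({FINDCONF(alpha_helix,*)})
selected = (basic | polar) & helix
print(sorted(selected))          # [13, 14, 17, 18, 19]
```

With these assumptions the selection is LYS13, SER14, SER17, ARG18 and HIS19.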
16. Explore other color schemes.
! Try the various coloring options of the icon or View > Color By
Scheme menu.
See Color Schemes for Biopolymers on page 83 for details.
17. Before you go on, reset the color scheme to atom types and redisplay the
sidechains.
! Set to By Atom Type.
! View > Protein View > Whole Molecule (or > All Atoms).
18. Add all the hydrogens.
! Click .
2.2.4 Modify the Peptide Sequence

19. The options within the Biopolymer menu make it easy to insert, delete or
replace residue sequences, as well as adjust their conformation. First, replace a
specific residue, PHE10 with ASN. Note the difference between replacing a
specific residue and replacing all occurrences of a general residue type (see
below). In either case, the new sequence must have the same length as the old
sequence.
! Biopolymer > Composition > Mutate Monomers
! Click any atom in the PHE10 residue.
! Press OK in the Sequence Expression dialog.
! Select ASN as the new residue and press OK.
20. Before you continue, save this sequence as a .mol2 file, to be recalled later.
! File > Export File ( )
! Set the Format option to MOL2.
! Type znf10 in the file field.
! Press Save.

21. Replace all occurrences of SER with CYS.
! Biopolymer > Composition > Mutate Monomers
! Press Types.
! Select SER from the residue types list and press Add.
The three serine residues are highlighted.
! Press OK in the Sequence Expression dialog.
! Click CYS and press OK.
When the modifications are complete, notice that the backbone conformation is
unchanged. In addition, the mutate operation preserves the sidechain
conformations to the extent possible. Each residue is analyzed, and the values of
the conformational angles defined in the dictionary are recorded for the
current residue. These angles are then applied to the new residue as closely as
possible to preserve the conformational similarity between sequences.
22. The Excise Monomer option is used to remove a residue from the peptide
chain and join its neighbors to close the gap. As before, the conformational state
of the removed residue is preserved in the neighbor which replaces it. However,
there are times when it is not desirable to rejoin the residues to close the gap,
for example, when such action would destroy favorable interactions
elsewhere in the peptide. The Delete Monomers option is provided for
this situation. It removes the residue in the same way but does not rejoin the chain.
! Biopolymer > Composition > Excise Monomer
! Click any of the atoms in LYS13 and press OK in the Sequence Expression
dialog.
Note that GLU12 and CYS14 have been joined by a long bond. The geometry
of the rest of the peptide has been preserved.
! Biopolymer > Composition > Delete Monomers
! Click any of the atoms in VAL11 and press OK in the Sequence Expression
dialog.
Note the gap between ASN10 and GLU12. That is the difference between the
Excise and Delete operations.
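The difference between the two operations can be modeled with a chain of residues and a set of inter-residue bonds. This Python sketch is a toy model with invented function names; it ignores coordinates and conformation entirely.

```python
def delete_monomer(chain, bonds, name):
    """Remove a residue and its bonds; the chain is left broken."""
    new_chain = [res for res in chain if res != name]
    new_bonds = {bond for bond in bonds if name not in bond}
    return new_chain, new_bonds

def excise_monomer(chain, bonds, name):
    """Remove a residue, then bond its former neighbors together."""
    i = chain.index(name)
    new_chain, new_bonds = delete_monomer(chain, bonds, name)
    if 0 < i < len(chain) - 1:
        new_bonds.add((chain[i - 1], chain[i + 1]))
    return new_chain, new_bonds

chain = ["ASN10", "VAL11", "GLU12", "LYS13", "CYS14"]
bonds = set(zip(chain, chain[1:]))          # consecutive residues are bonded

chain1, bonds1 = excise_monomer(chain, bonds, "LYS13")
chain2, bonds2 = delete_monomer(chain1, bonds1, "VAL11")

print(("GLU12", "CYS14") in bonds2)   # True: excise joined the neighbors
print(("ASN10", "GLU12") in bonds2)   # False: delete left a gap
```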
23. Recall the .mol2 file that you saved earlier and add the remaining five residues
of the alpha-helical region.
! Click on the SYBYL toolbar.

! Set Files of Type to Molecule.
! Select znf10.mol2 in the Selection list.
! Select M1:znf as the molecule area to overwrite its content and press
OK.
! Use the icon to set Atm Lbl to Substructure.
24. Extend the alpha helix by inserting a sequence of residues.
! Biopolymer > Composition > Insert Monomers
! Type HIS19 to insert after this monomer and press OK. You could,
instead, select any atom in HIS19.
! Click GLN ARG VAL HIS LYS.
! Set the conformation to alpha_helix.
! Activate Adjust Geometry so that GLN20 and the end of the
chain move to make way for the five residues you are inserting.
If you do not check Adjust Geometry, the existing structure of the peptide
will be maintained, resulting in bad local geometry.
! Press Insert.
25. Renumber the entire sequence to give consecutive numbers to the substructures.
! Biopolymer > Prepare Structure > Renumber Sequence
! Press (Select All) then OK.
! Type 1 as the number for the first monomer and press OK.
Note that the ACE blocking group at the beginning of the chain kept the
sequence number 0.
2.2.5 Add Hydrogens and Display Hydrogen Bonds

Information about hydrogens is stored in the dictionary’s residue files. You will
look at various types of hydrogens as well as hydrogen bonds.
26. Access the General Structure Display functionality.
! Click .

27. A separate dialog regroups various options to view hydrogens.
! Press Hydrogens at the bottom of the dialog.
The General Structure Display - Hydrogen dialog appears (dialog description in
the SYBYL Reference Guide).
28. All the hydrogens were included while you built the peptide. You can now
toggle some or all of them off.
! In the Hydrogen Atoms section, press Polar.
Only the polar hydrogens remain visible. These are flagged in the dictionary’s
residue files as the essential hydrogens because they can be involved in
hydrogen bonds.
! Press HBonding.
Only hydrogens currently involved in hydrogen bonds are visible. All are in the
helical section of the peptide. These are some, but not all, of the polar
hydrogens.
29. Display the hydrogen bonds.
! In the Hydrogen Bonds section, press Visibility: Display.
Observe the characteristic hydrogen bonding patterns of the alpha-helical and
turn regions of the peptide. You can display the hydrogen bonds in a different
color and with a thicker line.
! Change the Color to Magenta.
! Change the Linewidth to 3.
30. Display a ribbon that follows the backbone.
! Triple-click on any atom to select the entire molecule.
! Right-click on any selected atom then select Quick Ribbons >
Cartoon.
! Click anywhere to clear the atom selection.
See how the hydrogen bonds stabilize the secondary structure.

31. To change the ribbon’s appearance, right-click it and specify the color and style
of your choice.
32. This concludes the tutorial.

! When you are done, close all dialogs.

2.3 Protein Loop Search Tutorial

The Protein Loop Search tutorial is designed to demonstrate techniques for
replacing a short fragment (loop) in a protein with novel fragments retrieved
from a database.
Knowledge-based modeling of proteins involves using known structure(s) to
design novel structures. Loop Search in particular uses fragments from proteins
stored in a database in a “spare parts” approach to protein modeling. That is,
fragments or loops in a given protein of interest are cut out and replacements
that have desired properties (length, sequence, fit, etc.) are retrieved and melded
in place. Loops that meet the criteria are saved to a loop file which can be
loaded directly into a spreadsheet for further analysis and selection.
Loop searching in SYBYL is performed on the PRODAT database.
2.3.1 Background on Loop Searches

Loop Searches can be used to aid in modeling minor changes to protein
sequences. Any modification which goes beyond simple site-directed
mutagenesis is a candidate for a Loop Search, especially if insertions or
deletions in the sequence are involved. The assumption is that only local
changes to the structure result, and these changes can be modeled accurately by
choosing appropriate loop fragments from known structures.
Each loop search is defined in terms of three important regions: a window
region, an N-terminal anchor and a C-terminal anchor. Anchor regions are
residues on each side of the loop which are a part of the original template
structure. These residues are not removed from the template, but serve to define
distance constraints on the ends of the loops retrieved from the database of
structures. The window region is the loop itself. Thus the loop search looks for
fragments whose length equals the window region plus the length of each of the
anchors. Inter-Cα distances between the anchor residues in the retrieved loop
must match (within the allowed variance) the distances found in the template
structure.
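The anchor test described above can be sketched as a comparison of all pairwise Cα distances among the anchor residues. The Python code below is an illustration under stated assumptions: the function names, the coordinates and the 0.5 Å tolerance are invented, and the actual criterion used by Loop Search is not specified here beyond "within the allowed variance".

```python
import math
from itertools import combinations

def ca_distances(coords):
    """All pairwise distances between anchor C-alpha positions."""
    return [math.dist(a, b) for a, b in combinations(coords, 2)]

def anchors_match(template_ca, candidate_ca, tolerance):
    """True if every inter-anchor distance agrees within the tolerance."""
    t = ca_distances(template_ca)
    c = ca_distances(candidate_ca)
    return len(t) == len(c) and all(abs(x - y) <= tolerance
                                    for x, y in zip(t, c))

# Hypothetical C-alpha coordinates (in angstroms) for three anchor residues.
template  = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (9.0, 2.0, 0.0)]
good_loop = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (9.2, 2.1, 0.0)]
bad_loop  = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (12.0, 2.0, 0.0)]

print(anchors_match(template, good_loop, tolerance=0.5))   # True
print(anchors_match(template, bad_loop, tolerance=0.5))    # False
```

Because only internal distances are compared, the test does not depend on how the candidate fragment is oriented in space.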
The exercise you will perform consists of modeling a portion of the three-
dimensional structure of the insect-directed scorpion neurotoxin of A. australis
based on its sequence homology to the variant 3 neurotoxin of C. sculpturatus.
The sequences of these two proteins are quite similar, and you will use the
sequence alignment published by J.C. Fontecilla-Camps (J. Mol. Evol., 63-67
(1989)) for this exercise.

The sequence of the target or unknown protein is given first; the sequence of the
template or known protein is given second.
       1               5                       10
Model: LYS LYS ASN GLY TYR ALA VAL ASP --- SER SER GLY LYS ALA PRO
sn3:       LYS GLU GLY TYR LEU VAL LYS LYS SER ASP GLY CYS LYS TYR
           1               5                   10

       15                              20                  25
Model: GLU CYS LEU LEU --- --- --- SER ASN TYR CYS ASN ASN GLN CYS
sn3:   GLY CYS LEU LYS LEU GLY GLU ASN GLU GLY CYS ASP THR GLU CYS
       15                  20                  25

                       30                  35                  40
Model: THR LYS VAL --- HIS TYR ALA ASP LYS GLY TYR CYS CYS LEU LEU
sn3:   LYS ALA LYS ASN GLN GLY GLY SER TYR GLY TYR CYS TYR ALA PHE
       30                  35                  40

                       45                  50                  55
Model: SER CYS TYR CYS PHE GLY LEU ASN ASP ASP LYS LYS VAL LEU GLU
sn3:   ALA CYS TRP CYS GLU GLY LEU PRO GLU SER THR PRO THR TYR PRO
       45                  50                  55

                           60                  65                  70
Model: ILE SER --- ASP THR ARG LYS SER TYR CYS ASP THR THR ILE ILE ASN
sn3:   LEU PRO ASN LYS SER CYS
       60                  65
Using the sequence alignment given above, you will carry out one deletion and
a few of the single residue mutations. You will then use the Biopolymer Loop
Search functionality to look for peptide fragments with the desired number of
residues and the required end-to-end geometry to preserve a continuous
polypeptide chain.
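The edits implied by the first twelve alignment columns can be listed mechanically. The Python sketch below is illustrative: the function name is invented, and the template row is assumed to start one column later than the model row, because the target is one residue longer than the template at the N-terminus. Residue numbers in the output follow the model numbering.

```python
MODEL    = "LYS LYS ASN GLY TYR ALA VAL ASP --- SER SER GLY".split()
TEMPLATE = "--- LYS GLU GLY TYR LEU VAL LYS LYS SER ASP GLY".split()

def edits(model, template):
    """List the changes that turn the template into the model."""
    ops = []
    number = 0                      # running model residue number
    for m, t in zip(model, template):
        if m != "---":
            number += 1
        if m == "---":
            ops.append(f"delete {t} after model residue {number}")
        elif t == "---":
            ops.append(f"add {m}{number}")
        elif m != t:
            ops.append(f"mutate {t}{number}:{m}")
    return ops

for op in edits(MODEL, TEMPLATE):
    print(op)
# add LYS1
# mutate GLU3:ASN
# mutate LEU6:ALA
# mutate LYS8:ASP
# delete LYS after model residue 8
# mutate ASP10:SER
```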
2.3.2 Prepare the Template Structure

2. Read the coordinates of Scorpion Toxin Variant-3 (PDB code 2SN3).
! Set Files of Type to Molecule.
! Select [$TA_DEMO] in the Bookmarks list then double-click
2sn3.pdb in the Selection list.

3. If you have a personal default view defined, reset the view of the protein so that
all atoms are shown and colored by atom type, and all bonds are displayed as lines.
! View > Protein View > Reset View
The molecule is colored by atom types. The eight CYS residues are involved in
disulfide bridges.
4. Delete all non-peptide atoms: all the waters and a molecule of 2-methyl-2,4-
pentanediol used in crystallizing the protein. These are stored in the molecular
description as individual substructures.
! Click .
! In the Atom Expression dialog click on the lines for Other
Substructures and Waters.
All atoms in MDP66 and all water oxygens are highlighted. To these add all
hydrogens.
! Press Types, select H and press Add.
! Press OK to terminate the selection.
! Click to delete all selected atoms.
2.3.3 Perform the Deletion and Mutations

You will mutate a few residues of the template protein into those of the target to
prepare for the loop search by excising the appropriate residues from the model
(without any geometry relaxation), adding the extension to the chain’s
N-terminus, performing point mutations, and re-numbering the amino acids to
attain the amino acid sequence and numbering of the unknown model.
1. First perform one of the deletions in the sequence.
! Biopolymer > Composition > Excise Monomers
! In the hierarchy select LYS8 and press OK.
Note that a long bond fills the gap left by the deleted residue. You will replace
this gap with loop candidates later in the exercise.
2. The target protein is longer than the template by one residue on the N-terminus.
! Select M1(A/LYS1.N) as the attachment point and press OK.

! Press LYS and set the Conformation option to beta_sheet.
! Toggle off the Add Hydrogens check box.
! Press Build.
A blocking group is automatically added to the N-terminus.
3. Renumber the protein’s sequence.
! Biopolymer > Prepare Structure > Renumber Sequence
! Press (Select All) and press OK.
! Type 1 for the first monomer’s sequence number and press OK.
! Click away from the molecule to clear the selection.
4. Label all the residues.
! Use to label the Molecule by Substructure.
5. Perform one of the site mutations to convert the sequence of 2sn3 to the model
protein.
! Biopolymer > Composition > Protein Composition Tool
The Edit Protein Composition tool is displayed (dialog description on page 162).
Here is how to perform a single point mutation:
! Set the Action menu to Mutate.
! On Current Sequence line press [...].
! Select residue GLU3 in the dialog and press OK.
! On the New Sequence line press [...].
! Click on residue ASN then press OK.
! Use the defaults in the Sidechain Conformation Details section:

- The Initial Position for each residue is set to use the Lovell
rotamer library.
- An iterative torsional Scan of the new sidechain will be
performed if necessary, using a scan angle of 30° (360°/12)
and 90% of the van der Waals radii.

! At the bottom of the dialog press Apply to Selected Sequence.
The console reports on what happened.

• The rotamer resulting in the fewest bumps was selected from the Lovell
library.
• A torsional scan of the sidechain bonds in ASN3 relieved the remaining
contacts in a single iteration. The scan functionality uses a minimalist
approach and moves sidechains no further than is necessary. Thus, in
most cases, the resulting positions are quite close to the starting
conformations.
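The default scan settings mentioned above (a 30° step and 90% of the van der Waals radii) can be sketched as follows. This Python example is an assumption-laden illustration: it uses Bondi radii rather than SYBYL's own radii, interprets "90%" as scaling the sum of the two radii, and uses invented function names.

```python
import math

# Bondi van der Waals radii in angstroms (not SYBYL's internal values).
VDW_RADII = {"C": 1.70, "N": 1.55, "O": 1.52, "H": 1.20, "S": 1.80}

def scan_angles(steps=12):
    """Torsion values visited by a scan with 360/steps degree increments."""
    return [i * 360.0 / steps for i in range(steps)]

def is_bump(elem_a, xyz_a, elem_b, xyz_b, scale=0.9):
    """True if two atoms are closer than the scaled sum of their radii."""
    limit = scale * (VDW_RADII[elem_a] + VDW_RADII[elem_b])
    return math.dist(xyz_a, xyz_b) < limit

print(scan_angles()[:4])                                 # [0.0, 30.0, 60.0, 90.0]
print(is_bump("C", (0, 0, 0), "O", (2.8, 0, 0)))         # True
print(is_bump("C", (0, 0, 0), "O", (3.0, 0, 0)))         # False
```

A scan would rotate each sidechain bond through these angles and keep a position where no such bump remains.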
6. Make the other mutations in this region of the protein. You can perform all
these within the same dialog.
! The Edit Protein Composition tool is still open. Use it to make the
following mutations (remember to press Apply to Selected
Sequence after each operation):
LEU6 :ALA
LYS8 :ASP
ASP10 :SER
More mutations would need to be performed to complete the full model sequence.
However, the tutorial is limited to finding loop replacements in this region only.
! Close the dialog.
2.3.4 Search for Loop Replacements

7. Speed up the loop search by limiting the number of results.
! Options > Tailor
! In the Subject pull-down of the Tailor dialog, select
PROTEIN_LOOP.
! In the MAX_NLOOPS field, type 20.
! Press Apply at the bottom of the Tailor dialog.
! Press Close.
8. Perform a loop search to find suitable replacements for a region of the protein
where one of the deletions occurred. This search uses the default anchor settings:
two N-terminal anchor residues and one C-terminal anchor residue. From the
working alignment, the deletion being modeled in this step is:
           5   6   7   8       9   10  11
Model: ... TYR ALA VAL ASP --- SER SER GLY ...
2sn3 : ... TYR LEU VAL LYS LYS SER ASP GLY ...
           4   5   6   7   8   9   10  11
! Biopolymer > Protein Loops > Search PRODAT Database
! Type 7 for the residue preceding the window region and press OK.
! Type 10 for the residue following the window region and press OK.
Specify the residue types to be used at positions 8 and 9 in the model.
! Select ASP SER in the Sequence dialog and press OK.
! Type aspser.loop for the run name and press OK.
Ignore the messages in the console about “Tweak did not converge.” These are
indications that the process is still on-going. When the menubar is active you
are ready to proceed.
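The residue numbers entered above determine the window and anchor regions. The Python sketch below shows one reading of how they relate: the flanking residues 7 and 10 are treated as the innermost anchor residues, and the default anchor counts of 2 and 1 are counted outward from them. The function name is invented and this interpretation is an assumption, not a description of the program's internals.

```python
def loop_regions(preceding, following, n_anchor_n=2, n_anchor_c=1):
    """Return (N-anchor, window, C-anchor) residue numbers for a loop search."""
    window = list(range(preceding + 1, following))
    n_anchor = list(range(preceding - n_anchor_n + 1, preceding + 1))
    c_anchor = list(range(following, following + n_anchor_c))
    return n_anchor, window, c_anchor

n_anchor, window, c_anchor = loop_regions(7, 10)
print(n_anchor, window, c_anchor)                      # [6, 7] [8, 9] [10]
print(len(n_anchor) + len(window) + len(c_anchor))     # 5
```

Under this reading each retrieved fragment is five residues long, with ASP and SER occupying window positions 8 and 9.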
2.3.5 Analyze the Loop Results

9. When the SYBYL menubar is active again retrieve the results.
! Biopolymer > Protein Loops > Analyze Search Results
! Select aspser.loop from the file list and press OK.
The results are reported in a spreadsheet named ASPSER. The loop fragment
that gave the best geometric fit was automatically melded into the model.
10. Use the spreadsheet’s Biopolymer menu to color the melded loop fragment in
the protein.
! MDE: Biopolymer > Color Loop > Green (or your favorite color)
11. Examine a few of the other loops.
! Select the first 10 rows in the spreadsheet.
! MDE: Biopolymer > Examine Selected Loops
! In the Examine Selected Rows dialog press Next to meld the next
loop into the protein.
! Examine the other loops.

12. Return to loop number 1 on the basis of good fit and high homology. Notice that
some of the other loops may be just as good and would have to be considered in
a serious modeling effort as possible alternatives.
! In the dialog press Previous to return to the first loop or Jump to
Row 1.
! Close the dialog.
2.3.6 Explore the Spreadsheet of Loop Results

13. The spreadsheet is the central tool to analyze the suitability of the candidate
loops. The information about the loop fragments consists of:
• Name—The row names report the source of the retrieved loops (PDB
code and starting residue).
• ID—The ranking of the loops found by the search. Loops are ranked
from best (lowest) RMS fit to highest.
• Sequence—The actual sequence in each loop retrieved from PRODAT.
The length of this sequence is affected by Tailor variables that determine
the number of residues preceding and following the loop window. (By
default, NANCHOR_N and NANCHOR_C are set to 2 and 1, respectively).
• Homology—The score comparing the actual sequence to the target or
model sequence.
• Fit_RMS—A measure of the RMS fit of the retrieved loop to the anchor
residues.
The best candidate loops are those with (1) the smallest number of van der
Waals contacts or bumps, (2) the lowest value of RMS fit, and (3) the highest
value of homology score. Two of these pieces of information are automatically
built into the spreadsheet, and you will add the other.
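The three criteria can be combined into a single ranking. The Python sketch below sorts hypothetical loop records by van der Waals contacts first, then RMS fit, then homology. The records are made-up values, and giving the criteria strict priority in this order is a design choice for the example; in practice you may prefer to weigh them differently.

```python
# Hypothetical spreadsheet rows; not results from an actual search.
loops = [
    {"id": 1, "fit_rms": 0.21, "homology": 62.0, "vdw_contacts": 3},
    {"id": 2, "fit_rms": 0.25, "homology": 70.0, "vdw_contacts": 0},
    {"id": 3, "fit_rms": 0.30, "homology": 55.0, "vdw_contacts": 0},
    {"id": 4, "fit_rms": 0.25, "homology": 80.0, "vdw_contacts": 0},
]

# Fewest contacts, then lowest RMS fit, then highest homology.
ranked = sorted(
    loops,
    key=lambda row: (row["vdw_contacts"], row["fit_rms"], -row["homology"]),
)

print([row["id"] for row in ranked])   # [4, 2, 3, 1]
```

Note that the loop with the best RMS fit (ID 1) drops to last place here because of its contacts, which is why the contact column is worth adding to the spreadsheet.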
14. Add a column containing the number of van der Waals contacts.
! Right-click on the header of the empty column and select Add a
Computed Column.
! Double-click VDW_CONTACTS for the column type.
15. Graphs offer another visual presentation of the results. Display a scatter plot of
Fit RMS deviation of the anchor regions vs. the loop ID and color the points by
the number of vdW contacts.
! Click on the MDE toolbar.
! In the Create Scatter Plot dialog:

- X Axis: Id.
- Y Axis: Fit_RMS.
- Color Axis: VDW_CONTACTS.
! Press OK.
16. Graph the mutation-rate-based homology score vs. the loop ID and color the
points by the number of vdW contacts.
! Click on the MDE toolbar.

- X Axis: Id.
- Y Axis: Homology.
- Color Axis: VDW_CONTACTS.
! Press OK.
17. Selections in the scatter plots and the spreadsheet are synchronized.
! Pick a point in either graph and note that the corresponding point is
highlighted in the other graph and that the row corresponding to that
point is selected in the spreadsheet.
! Pick other points.
Usage Note: To meld the loop corresponding to a selected point or row, use the
method used earlier in this tutorial: MDE: Biopolymer > Examine
Selected Loops. If only one row is selected, it is automatically melded into
the protein. If multiple rows are selected, the Examine Selected Rows dialog lets
you examine them one by one.
18. Close the molecular data explorer.
! MDE: File > Close
! Select Yes if you want to save the spreadsheet and scatter plots to a
table file. The default file name is taken from the spreadsheet name.
19. Add sidechains to residues in the loop (including the anchor regions) and scan
torsional values to eliminate bad interatomic contacts. The residues in the loop
are stored in a set called LOOP. You can make use of this to select them.
You are now ready to repair the geometry of the protein in the regions where
residues were excised. This involves loop searches for reasonable protein
fragments to paste into these regions. The fragments retrieved from the protein
database only contain the backbone atoms. After each search add sidechains and
fix their local geometry as above.

! Biopolymer > Prepare Structure > Add Sidechains
The Add Sidechains dialog is displayed (dialog description on page 125).
! Press Select Residues.
! At the bottom of the dialog press Sets.
! Scroll down to the bottom of the Sets list and select LOOP then
press Add.
! Press OK to terminate the selection.
! In the Add Sidechains dialog activate Scan Sidechains then press
Add and Close.
! Clear the atom selection by clicking away from the molecule.
20. A preliminary model has been created. You may want to save it to a file before
exiting SYBYL.
! Right-click on any atom in the preliminary model and select Rename
Molecule.
! Type model for the new name.
! Click (or File > Export File).
! Type model in the file name field and press Save.
2.3.7 Compare the Model and Template Structures

21. Load the original 2sn3 molecule and compare it to the model.
! Select [$TA_DEMO] in the Bookmarks list then double-click
2sn3.pdb in the Selection list.
22. Show only the backbone for both molecules for easier comparison.
! Clear all selection.
! Use to show only the Backbone for both proteins.
The colored loop in the model indicates where the changes were made.
23. This concludes the Protein Loop Search tutorial.

2.4 Monomer Definition Tutorial

In this tutorial, you will define a new monomer, N-methyl leucine, by
modifying L-leucine (LEU), which exists in the macromol dictionary and is
defined in the file $TA_DICT/leu.res. N-methyl leucine is used in the
construction of the peptide-mimetic drug, cyclosporin. This tutorial will thus
serve as an example of the construction of residues needed for constructing
peptide-mimetic drugs, as well as for the possible engineering of proteins using
non-standard amino acids.
At the end of this tutorial, you will be able to:

• Define a new monomer and add it to a dictionary.
• Specify the monomer name, three-letter and 1-letter acronyms, as well as
the name of the residue file (.res).
• Identify the atoms that specify its backbone, root atom, and capping
atoms.
• Automatically assign charges and atom types to be consistent with the
AMBER force fields, and edit these charges and/or types as necessary.
• Use your new monomer to build a peptide.
2.4.1 Know Your Source

It is easier to define a new monomer based on an existing monomer already
stored in a dictionary. Therefore, you must first ascertain that the next few
operations will find this dictionary.
In this tutorial, you will use information in the macromol dictionary as
distributed by Tripos to create your new monomer, then store it in a private
dictionary.
1. Determine the location of your SYBYL installation.
! Type: echo $TA_ROOT
2. Access the dictionary management functionality.
! Biopolymer > Dictionary & Database Admin > Manage Custom
Dictionary
The Custom Dictionary Management dialog appears.

3. Check the location of the default dictionary directory.
The Current Dictionary Directory at the top of the dialog shows the location
of the dictionary directory that your SYBYL session is currently pointing to. If
the information is the same as what echo $TA_ROOT reported above followed
by /biopolymer/tables/dictionary, you are using the dictionaries distributed
with SYBYL (see footnote 1).
4. Specify the current dictionary to be macromol.
! Verify that the Current Dictionary is macromol.
! If it is not, press Change Dictionary, select macromol and press
OK.
Your selection is echoed in the dialog: Current Dictionary is macromol.
2.4.2 Create and Use a Private Directory of Dictionaries

We strongly recommend that you make a private copy of the Biopolymer
dictionaries before creating new monomers. Doing so has the following advantages:
• The distributed dictionaries are protected in their original format for
comparison purposes.
• Often SYBYL is installed in such a way that you can read but not write
to the distributed dictionaries.
• When you upgrade to a new version of SYBYL, and the newly
distributed dictionaries replace the old, you will not lose your
modifications.
Here are the simple steps to follow.
5. Create a private copy of the distributed dictionaries.
! In the Custom Dictionary Management dialog press Create Custom
Dictionary Directory.
In a separate dialog, a default directory location is suggested within your home
directory.
! In the Custom Dictionary Location dialog press OK to accept the
creation of a directory named dictionary within your home directory.
! Press OK in the Success dialog reporting that all source files were
copied and that your private dictionary directory is now used.
In the Custom Dictionary Management dialog the Current Dictionary Directory
points to your own copy.
Footnote 1. If you already have a personalized dictionary directory, you know a lot about dictionaries
already. Before you proceed with this tutorial be sure to open a dictionary that contains a copy
of the leu.res file as distributed by Tripos.
6. Close the dialog so you can proceed and build the structure that will become
your new monomer.
! Close the Custom Dictionary Management dialog.
2.4.3 Build the Structure of the New Monomer

A template residue (leucine) will be used for building the residue.
7. Build the template for construction (based on leucine). Note that new monomers
must be built in the neutral, unblocked form.
! Click Leu.
! Leave the Add Hydrogens check box on.
! Set the N-terminus and C-terminus menus to None.
! Set Charge Model to use None.
! Press Build.
A Leucine residue is displayed on the screen, color coded by atom types.
8. Modify the existing template to create N-methyl leucine.
! Use to label the Atoms by Atom Name.
! Note visually the location of the hydrogen labeled H on the terminal
nitrogen because the SYBYL sketcher will replace these labels with
the atomic elements.
You will now replace the H-labeled hydrogen by a carbon, not the other
hydrogen, which is the cap atom.
! Click on any atom to indicate that you want the sketcher to operate
on this molecule.
! Click (or File > New > Small Molecule).

The three toolbars associated with the sketcher provide access to the sketching
tools, the essential atom types, and functional groups.
! Make sure that the [C] icon is already highlighted.
! Click (Modify Atom).
! Click the hydrogen on the terminal nitrogen whose location you noted
earlier.
The H label disappears, and the atom changes color. It is now a carbon.
9. Add the hydrogens to the methyl carbon and exit the Sketcher.
! Click (Add Hydrogens).
! Click EXIT.
This completes the creation of N-methyl leucine. Since leucine was used to
construct this new residue, it will be referred to below as the template.
10. Compute atomic charges.
When preparing a new monomer for addition to a dictionary you will need to
use a rigorous calculation method to compute the atomic charges. Consult the
information about Biopolymer Force Fields and Charges for Biopolymers (in
the Force Field Manual), Ref. 54 and Ref. 55. This, however, is not the focus of
this tutorial. For expediency, you will use the Del Re method because the values
it computes for amino acids are very close to those computed by AMBER.
! Compute > Charges > Del-Re
The Del Re method offers an opportunity to place a formal charge on the
molecule.
! Press No.
The charge values are displayed on the molecule as atom labels. When you are
done viewing the charges, change the labels to atom names.
! Use to label the Atoms by Atom Name.

2.4.4 Define the Monomer Features

11. Access the dialog for monomer definition.
! Biopolymer > Dictionary & Database Admin > Manage Custom
Dictionary
! In the Custom Dictionary Management dialog press Create New
Monomer.
12. Select the molecule area containing the structure of the new monomer.
! Select M1:builder_protein as the molecule area and press OK.
13. Specify the monomer type.
! Select PROTEIN in the list and press OK.
14. To facilitate the monomer definition process, indicate which monomer already
in the dictionary most closely resembles the one you are adding. The properties
of the existing monomer will then be used as default for the new one.
! Type or select LEU and press OK.
The Create Monomer dialog appears prompting for information necessary for
creating the residue file (dialog description on page 241).
15. Specify the filename and residue name.
! In the Basename for Monomer File field, type mle. This will produce
a file called mle.res in the current directory.
! In the Complete Monomer Name field, type N-methyl_leucine
! In the 3-Letter Mnemonic field, enter MLE. This will be the acronym
for the new monomer.
! In the 1-Letter Code field, enter a period (.). This indicates that the
one-letter code is ignored for this residue.
16. Label the atoms by name. This will help you verify some of the assignments
that were made automatically based on the template.
! With Label set to Atom Names, look at your new monomer.

Note that those atoms in MLE that are identical to those in the LEU template
have been named according to LEU. However, the additional atoms (those in
the N-methyl group) have not. SYBYL assigned unique names to these atoms:
C11, H1, H2 and H3. You can modify this if you want by clicking the Atom
Names button. Know, however, that all atoms must have different names.
Several definitions were made automatically based on the leucine template.
They are listed in the console:
Root atom: CA
Capping atoms: OXT HNCAP HOCAP
Backbone atoms: CA N C O C11 HA
Essential hydrogens: HNCAP HOCAP
Buttons in the dialog allow you to modify these assignments.
17. Check the root atom, that is, the atom that will bear the substructure label. The
alpha carbon is the logical candidate.
! Press Root Atom.
The atom labeled CA is highlighted.
! Click OK in the dialog to accept the default.
Your selection is echoed in the console (CA).
18. Check the capping atoms, that is, the atoms that will be removed when the
residue is connected to adjacent residues.
! Press Capping Atoms.
Three cap atoms are highlighted on the screen:

• The hydrogen connected to the nitrogen: HNCAP
• The hydroxyl group’s oxygen: OXT
• The hydroxyl group’s hydrogen: HOCAP
! Press OK in the Atom Expression dialog to accept these atoms as cap
atoms for the new residue.
19. Designate the backbone atoms.
! Press Backbone Atoms.
The atoms currently corresponding to the backbone atoms in the template are
highlighted on the screen: N, CA, C, O, HA (hydrogen bonded to the Cα), and
C11 (the carbon of the N-methyl group).

! Hold the Ctrl key (Command on the Mac) and add to the atoms
already selected by clicking the three hydrogens in the N-methyl
group (H1, H2, H3).
! Press OK in the Atom Expression dialog.
All atoms other than those designated as backbone or capping atoms are
considered to be sidechain atoms.
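The sidechain rule stated above is a set difference: all atoms, minus the backbone atoms, minus the capping atoms. The Python sketch below illustrates it with the atom names given in this tutorial. The sidechain heavy-atom names (CB, CG, CD1, CD2) are assumed for illustration and sidechain hydrogens are omitted, so this is not a complete atom list for the monomer.

```python
# Backbone and capping names are taken from the tutorial steps above.
backbone = {"N", "CA", "C", "O", "HA", "C11", "H1", "H2", "H3"}
capping = {"OXT", "HNCAP", "HOCAP"}

# Assumed sidechain heavy-atom names; hydrogens are left out for brevity.
assumed_sidechain_heavy_atoms = {"CB", "CG", "CD1", "CD2"}
all_atoms = backbone | capping | assumed_sidechain_heavy_atoms


def sidechain_atoms(atoms, backbone, capping):
    """Everything that is neither backbone nor capping counts as sidechain."""
    return atoms - backbone - capping


sidechain = sidechain_atoms(all_atoms, backbone, capping)
print(sorted(sidechain))
```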
20. Use the labels to check the known information about your new monomer.
! Set Label to AMBER7 FF99 Atom Types.
The absence of highlights indicates that all atoms have been typed with
AMBER7 FF99 atom types, first using information from the dictionary
followed by an SLN typer that uses rules located in $TA_DICT/AMB_PARMS.
Examine the AMBER7 FF99 atom types closely, especially those on the newly
added group. One hydrogen was typed incorrectly as H1 (H attached to aliphatic
C with one electron-withdrawing substituent) and is better typed as HC (H
attached to aliphatic C with no electron-withdrawing substituents).
To change the AMBER7 FF99 atom type for the atom labeled H1:
! Press Atom Types.
! Select AMBER7_FF99 and press OK.
! Click on the atom labeled H1 on the screen then press OK in the
dialog.
! Type or select HC as the new type and press OK.
The label for this atom confirms that its AMBER7 FF99 atom type was
changed.
21. Use the other labels to inspect your monomer.

! Explore the other labels for atom types and charges.
Note that the same hydrogen was typed incorrectly for the other AMBER force
fields. For Kollman force fields, highlights provide visual clues that some atom
types and atomic charges could not be assigned.
22. You may modify atom names, force field atom types and atomic charges by
using a spreadsheet.
! Press Edit Types and Charges in Table.

A spreadsheet by the name NEWMONOMER appears. Its rows contain the
atom IDs of the monomer, and its columns contain:
• NAME—atom name
• A02_T—AMBER7 FF02 atom type
• A99_T—AMBER7 FF99 atom type
• A95_T—AMBER4.1 FF95 atom type
• KU_T—Kollman United atom type
• KA_T—Kollman All atom type
• A02_C—AMBER7 FF02 atomic charge
• A95_C—AMBER4.1 FF95 atomic charge
• KU_C—Kollman United atomic charge
• KUN_C—Kollman United atomic charge if the residue is the first in a
chain (N-terminal)
• KUC_C—Kollman United atomic charge if the residue is the last in a
chain (C-terminal)
• KA_C—Kollman All atomic charge
The spreadsheet shows the currently assigned values of these parameters.
UNK designates unknown atom types.
An informational dialog titled Finished Editing describes how to proceed.

! Scroll down the spreadsheet to examine the names of the atoms in
the NAME column.
You will now edit cell values in the spreadsheet.
23. The various AMBER and Kollman atom types were taken directly from the
template in the dictionary followed by assignment using the SLN typer. If an
atom type cannot be assigned by either of these methods it is entered as “UNK”
in the spreadsheet.
! For the atom named C11, type C3 in the KU_T and KA_T columns.
! For the atom named H1 (row 23), type HC in the A02_T, A95_T, and
KU_T columns.
For a list of valid atom types see Force Fields for Biopolymers (in the Force
Field Manual).

24. The charges were taken directly from the SYBYL screen. If charges are not
assigned to the molecule before entering the Create Monomer dialog, Pullman
charges are assigned by default.
25. Close the spreadsheet.
! Press OK in the Finished Editing dialog.
26. Review the keywords associated with the residue.
! Look at the Class Keywords at the bottom of the dialog.
Keywords define the monomer class. The choices were taken automatically
from the template and consist of:
• amino_acid—The monomer is a protein residue
• standard—The monomer is one of the 20 standard amino acids
• default—This information is no longer used.
! In the field delete ,standard,default.
27. The monomer definition is complete. Create the residue file.
! Press OK to close the Create Monomer dialog.
The Monomer Information dialog informs you that the residue has been
successfully created and stored in the file mle.res within your $HOME/dictionary
directory.
Although the monomer is now available for immediate use because it has been
automatically added to the dictionary that is currently open (in memory), it has
not yet been saved to the dictionary file. For that reason, the information dialog
also asks you whether to add the newly created monomer permanently to your
macromol dictionary. It is safe to do so, because it is very easy to modify a
monomer definition once it is in the dictionary.
! Read the message then press OK.
Your copy of the file macromol.dic was updated to include the MLE residue.
! Close the Custom Dictionary Management dialog.
Warning: Some properties of a new monomer are not accessible via the Create
Monomer dialog. In your own work you will need to inspect and edit your new
.res file to update properties such as molecular weight and improper torsions for
the AMBER and Kollman force fields. See Complete Verification of the
Residue File on page 245 for a list of recommendations.

2.4.5 Use the Newly Defined Monomer

28. The new residue, MLE, is available for immediate use. Check this by building
the peptide ALA-MLE-ALA.
! In the Build Protein dialog, click on ALA, then MLE, and then ALA.
! Select alpha_helix as the conformation and press Build.
The peptide is displayed with N-methyl leucine in the middle of the chain.
! Use to label the Atoms by Substructure.
29. This concludes this tutorial.
2.4.6 Permanent Setup and Maintenance of a Private Dictionary

To make SYBYL use your private dictionary in future sessions simply code the
necessary operations in your $HOME/sybyl.ini file (sample sybyl.ini file in the
Toolkit Utilities Manual). This text file contains SYBYL commands that are
executed when SYBYL starts (if the file is found in your home directory). Here
are prototype commands that set the Tailor variables needed to use the files
created in the Monomer Definition tutorial:
# Open my dictionary
setvar TAILOR!BIOPOLYMER!DIRECTORY $HOME/dictionary
setvar TAILOR!BIOPOLYMER!DEFAULT_DICT macromol
The steps above need only be done once. From then on, when you start SYBYL
you will be using your private dictionary.
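If you maintain sybyl.ini with a script, the two setvar lines can be appended only when they are missing. The Python sketch below is a convenience outside SYBYL: the function name is hypothetical, and only the two setvar lines are quoted from this section.

```python
from pathlib import Path

# The two lines below are quoted from this section of the manual.
INI_LINES = [
    "setvar TAILOR!BIOPOLYMER!DIRECTORY $HOME/dictionary",
    "setvar TAILOR!BIOPOLYMER!DEFAULT_DICT macromol",
]


def ensure_ini_lines(ini_path, lines=INI_LINES):
    """Add any missing lines to the file and return the lines that were added."""
    path = Path(ini_path)
    existing = path.read_text().splitlines() if path.exists() else []
    missing = [line for line in lines if line not in existing]
    if missing:
        # Rewrite the file with the original lines followed by the new ones.
        path.write_text("\n".join(existing + missing) + "\n")
    return missing


# Example call (not executed here): ensure_ini_lines(Path.home() / "sybyl.ini")
```

Running the function a second time on the same file adds nothing, so it is safe to call from a login or setup script.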
You can modify these instructions to create a shared dictionary (as opposed to a
private one) by using an appropriate new directory location (outside of the
SYBYL tree but in some shared disk space) and setting the protections as
needed on the files and directories.
When a new version of SYBYL is released, changes and additions may have
been made to the dictionaries. We strongly encourage you to compare what you
are using with what is in any new release. This comparison would include the
*.res and *.dic files and a comparison of the energy values obtained while
using the different dictionaries.
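The file-level part of this comparison can be automated outside SYBYL. The Python sketch below lists *.res and *.dic files that exist in only one directory or whose contents differ. The function name and result keys are hypothetical, and it compares file contents only; it does not compare energy values.

```python
import filecmp
from pathlib import Path


def compare_dictionaries(private_dir, distributed_dir, patterns=("*.res", "*.dic")):
    """Report dictionary files that are unique to one directory or differ in content."""
    private, distributed = Path(private_dir), Path(distributed_dir)

    def names(root):
        return {p.name for pattern in patterns for p in root.glob(pattern)}

    mine, theirs = names(private), names(distributed)
    changed = sorted(
        name for name in mine & theirs
        # shallow=False forces a byte-by-byte comparison.
        if not filecmp.cmp(private / name, distributed / name, shallow=False)
    )
    return {
        "only_private": sorted(mine - theirs),
        "only_distributed": sorted(theirs - mine),
        "changed": changed,
    }
```

For the setup in this tutorial, the first argument would be your $HOME/dictionary directory and the second the dictionary directory of the new SYBYL installation.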

3. Biopolymer Menu Layout
3.1 Main Biopolymer Menu

Menu Item                               Command
Prepare Structure
  Structure Preparation Tool
  Add Hydrogens...                      BIOPOLYMER ADDH
  Set Protonation Type...               BIOPOLYMER PROTONATE
  Load Charges...                       BIOPOLYMER LOAD CHARGES
  Edit Termini...                       BIOPOLYMER POLY_BLOCK
  Fix End Groups...                     BIOPOLYMER FIX_END_GROUPS
  Fix SYBYL Atom Types in Cofactor...   BIOPOLYMER TYPE_COFACTOR
  Fix SYBYL Atom Types in Ligand...
  Assign AMBER Atom Types...            BIOPOLYMER LOAD OTHER_ATOM_TYPES
  Add Sidechains...                     BIOPOLYMER ADD_SIDECHAINS
  Fix Sidechain Amides...               BIOPOLYMER FIX_ASN_GLN
  Fix Prolines...                       BIOPOLYMER FIX_PROLINE
  Chain Termini Sets...                 BIOPOLYMER SET TERMINI
  Set Chain Names...                    BIOPOLYMER SET CHAINNAME
  Renumber Sequence...                  BIOPOLYMER RENUMBER
  Convert PDB Atom Names                BIOPOLYMER CONVERT
Build
  Build Protein...                      BIOPOLYMER BUILD
  Create Disulfide                      BIOPOLYMER DISULFIDE
  Build C-alpha to Backbone...          BIOPOLYMER CONSTRUCT_BACKBONE
  Build DNA Double Helix...             BIOPOLYMER DNAHELIX
  Build DNA Strand...                   BIOPOLYMER BUILD
  Build RNA Double Helix...             BIOPOLYMER RNAHELIX
  Build RNA Strand...                   BIOPOLYMER BUILD
  Build Carbohydrate                    BIOPOLYMER BUILD
  Add Solvent/Cofactor                  BIOPOLYMER BUILD
  Break Chain...                        BIOPOLYMER BREAK
  Join Chains...                        BIOPOLYMER JOIN
  Create Cycle...                       BIOPOLYMER CYCLE
  Phosphorylate...                      BIOPOLYMER PHOSPHORYLATE
Composition
  Protein Composition Tool...
  Replace Sequence...                   BIOPOLYMER REPLACE
  Mutate Monomers...                    BIOPOLYMER CHANGE
  Insert Monomers...                    BIOPOLYMER INSERT
  Excise Monomers...                    BIOPOLYMER EXCISE
  Delete Monomers                       BIOPOLYMER REMOVE
Conformation
  Measure Conformation...               BIOPOLYMER MEASURE
  Set Backbone Conformation...          BIOPOLYMER SET CONFORMATION
  Find Secondary Structure...           BIOPOLYMER FIND SEC_STR
  Predict Secondary Structure...        BIOPOLYMER PREDICT_SECONDARY
  Set Sidechain Conformation...         BIOPOLYMER SET CONFORMATION
  Scan Sidechain Torsions...            BIOPOLYMER FIX_SIDECHAINS
  Copy Conformation...                  BIOPOLYMER COPY_CONFORMATION
Protein Loops                           BIOPOLYMER LOOP
  Search PRODAT Database...             BIOPOLYMER LOOPS SETUP
  Tweak Conformational Search...        BIOPOLYMER TWEAK
  Analyze Search Results...             BIOPOLYMER LOOPS ANALYZE
Compare Sequences
  Align and Write MSA...                BIOPOLYMER ALIGN_SEQUENCES
                                        BIOPOLYMER MULT_ALIGN_SEQ
  View/Edit Alignments...
  List Sequence...                      BIOPOLYMER SEQUENCE
  FUGUE                                 (license requirement)
Compare Structures
  Fit Monomers...                       BIOPOLYMER FIT
  Align Structures By Homology...       BIOPOLYMER ALIGN_STRUCTURES
  Local RMS Fits of Conformers...
  Find and Fit Fixed Regions...         BIOPOLYMER RESIDUE_FIT
Model Proteins
  FUGUE                                 (license requirement)
  ORCHESTRAR                            (license requirement)
  Create Quick Model...                 (license requirement)
Analyze Protein
  Create ProTable...
  SiteID Find Pockets...
  SiteID Create Table...
  Search Database...
Dictionary & Database Admin
  Open Dictionary...                    BIOPOLYMER DICTIONARY OPEN
  Close Dictionary...                   BIOPOLYMER DICTIONARY CLOSE
  Manage Custom Dictionary...           BIOPOLYMER DICTIONARY CREATE MONOMER
                                        BIOPOLYMER DICTIONARY CREATE DICTIONARY
                                        BIOPOLYMER DICTIONARY ADD MONOMER
  List Dictionary...                    BIOPOLYMER DICTIONARY LIST
  Create/Update PRODAT Database         mkprodat utility
Sequence Viewer...

3.2 Biopolymer Functions on Other SYBYL Menus

3.2.1 Biopolymer: Read and Save (on the File Menu)
File > Import File ( )
  File Type: PDB                        PDB IN
  File Type: Sequence                   BIOPOLYMER PIR IN
File > Export File ( )
  Format: PDB                           PDB OUT
  Format: AMBER PDB                     SETVAR TAILOR!PDB!OUTPUT_FORMAT AMBER
                                        then PDB OUT
  Format: FASTA                         BIOPOLYMER PIR OUT
  Format: PIR                           BIOPOLYMER PIR OUT
File > Retrieve PDB                     PDB FTP
3.2.2 Biopolymer: Display (on the View Menu)

Only the menu items most relevant to biopolymers are listed here. For the full
list of items on the View menu refer to the SYBYL Reference Guide.
View > Protein View

Apply Default View PROTEINVIEW
Define View ( )
Reset View
Whole Molecule BIOPOLYMER DISPLAY
C Alpha Trace BIOPOLYMER DISPLAY
Sidechain Trace BIOPOLYMER DISPLAY
Backbone Only BIOPOLYMER DISPLAY
View > Surfaces and Ribbons MOLCAD RIBBON
> Quick Ribbons ( ) RENDER PROTEIN
View > Color by Scheme
Secondary Structure BIOPOLYMER COLOR BY_SECONDARY_STRUCT
Chain BIOPOLYMER COLOR BY_CHAIN
Property BIOPOLYMER COLOR BY_PROPERTY
Acid/Base BIOPOLYMER COLOR BY_ACID_BASE
Hydrophobicity BIOPOLYMER COLOR BY_HYDROPHOBICITY
B-Factors BIOPOLYMER BFACTORS

Nucleotide BIOPOLYMER COLOR BY_NUCLEOTIDE

View > Label ( )
Molecules
Atom Type MODIFY ATOM OTHER_TYPES
Substructure LABEL
Chain/Substructure
View > Show More ( )

View > Show Only ( )
View > Hide ( )
Within a radius
Biopolymer components
Hydrogens
3.2.3 Biopolymer: Selection (on the Selection Menu)

Only the menu items most relevant to biopolymers are listed here. For the full
list of items on the Selection menu refer to the SYBYL Reference Guide.
Items on the Selection menu do not map to any commands because there is no
command equivalent to a selection without any applied action.
Selection > Expand ( )

To Anything Connected
To Substructure
To Chain
To Biopolymer
To Structure
Within a radius
Selection >
Biopolymer components
3.2.4 Biopolymer: Minimize (on the Compute Menu)

The following features are described in the Force Field Manual.
• Local minimization of some residues in a protein:
  • Compute > Minimize > Subset
  • ANNEAL
• Minimization of an entire protein:
  • Compute > Minimize > Staged Minimization

4. Read and Write Biopolymer Files
• Read and Write PDB Files on page 58
• Read PDB Files
• Write PDB Files
• Retrieve PDB Coordinates from Other Sources
• Protein Data Bank: Useful Links
• Read and Write PIR and FASTA Files on page 67

4.1 Read and Write PDB Files

4.1.1 Read PDB Files
SYBYL recognizes the following file types as Protein Data Bank (PDB) files:
*.pdb, *.pdb.z, *.pdb.gz, *.ent, *.ent.z, *.ent.gz, pdb*.gz, *.atm, *.brk
Menubar: File > Import File ( )

• Set the Files of Type to PDB.
• Select the desired file.
• Specify the molecule area to receive the molecule
If a local copy of the Protein Data Bank is maintained at
your site, edit the TA_PDB line in the $TA_ROOT/lib/
environment file to include the full path. You will then
be able to open the directory defined by $TA_PDB and
retrieve the desired file in one of its subdirectories.
Command: PDB IN mol_area filename [type] center
[model_number]
Or, if a local copy of the Protein Data Bank is maintained
at your site, you can retrieve a file via:
PDB IN mol_area %system("pdbfname code")
[type] [center] [model_number]
• mol_area—Area to receive the molecule being read in
(current contents are overwritten).
• filename—Name of the PDB file to be read in.
• pdbfname—Script, written by Tripos, to locate a PDB
file at your site and return its complete pathname (see
pdbfname on page 260).
• code—Four character code (*) of the PDB file.
• type—Type of biopolymer in the file: PROTEIN, DNA,
RNA. Skipped if you have a consolidated SYBYL
license or a module-based Biopolymer license.
• center—Specify where to position the molecule
• model_number—ID number of model to load from file
containing multiple conformations separated by
MODEL records.
See pdbfname on page 260 for details of configuration and
use.
Warning: Some PDB files include multiple conformations of one or more
residues (flagged in the occupancy columns of the atom records). However,
SYBYL reads only the first instance of multiple conformation records.

Additional Information:
Retrieve PDB Coordinates from Other Sources on page 64
Activities Upon Reading a PDB File
Protein View
If you have a default protein view defined it will be automatically applied to
any structure containing a protein component when it is read in from a file in
PDB format. See Define and Apply Protein View Settings on page 72.
Molecule Name
By default, the molecule is given the name of the PDB file. This behavior is
controlled by Tailor variable PDB MOLNAMERULE.
Substructure Names
A substructure’s name is derived from the chain name, the residue name, and
the residue number in the PDB file. For example: A/GLU4.
If a residue name ends with a number (such as ZN2) or if the residue number is a
negative number (such as -1), an underscore is inserted in the substructure
name. If the residue name ends with a number and the residue number is
negative, two underscores are inserted in the corresponding substructure name.
Examples:
Residue Name and Number   Substructure Name
ALA 1                     ALA1
ALA -1                    ALA_1
ZN 2                      ZN2
ZN 21                     ZN21
ZN2 1                     ZN2_1
ZN2 -1                    ZN2__1
Z2A 1                     Z2A1
Z2A -1                    Z2A_1
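The examples are consistent with a rule in which an underscore separates a residue name ending in a digit from the residue number, and the minus sign of a negative residue number is written as an underscore. The Python sketch below encodes that reading. It is inferred from the table, not taken from SYBYL's code, and the function name is hypothetical.

```python
def substructure_name(residue_name, residue_number):
    """Combine a PDB residue name and number following the examples in the table."""
    # A name that ends in a digit gets a separating underscore.
    separator = "_" if residue_name[-1].isdigit() else ""
    # A negative number is written with an underscore in place of the minus sign.
    if residue_number < 0:
        number = f"_{abs(residue_number)}"
    else:
        number = str(residue_number)
    return f"{residue_name}{separator}{number}"


for name, number in [("ALA", 1), ("ALA", -1), ("ZN2", 1), ("ZN2", -1), ("Z2A", -1)]:
    print(name, number, substructure_name(name, number))
```

The chain prefix shown earlier (as in A/GLU4) is not handled here.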
By default, residue information found in the PDB file is stored. This
information is used to convert between PDB, SYBYL, and FlexX substructure
naming conventions. This behavior is controlled by Tailor variable PDB
RETAIN_PDB_SUBSTINFO.

HETATM Residues
HETATM records for a modified amino acid with identifiable backbone atoms
are reported (in the console) as “modified residue” if the residue name in the
PDB file does not match any of the residues in the dictionary. These modified
residues are stored in the {HETATM} sets and appear as “X” in the single-letter
sequence. Modified residues are treated as regular residues in the biopolymer
chain and behave as all known residues in biopolymer operations such as ribbon
display and residue mutations.
Cofactors
Cofactors are treated the same way as regular monomers if the residue name in
the PDB file’s HETATM record matches the name of a cofactor residue file in
the open dictionary. For a list, see Cofactors in the macromol Dictionary on
page 154. Cofactors recognized by SYBYL are stored in the {HETATM} set,
not in the {UNK_ATOMS} set.
Ligands
The macromol dictionary includes a ligand database (ligand_db.def) that is
based on information retrieved from the Ligand Depot site, a service associated
with RCSB. The ligand database and an additional database of chemical groups
greatly improve the SYBYL PDB reader's ability to assign correct atom and
bond types to most ligands. Atoms typed by the ligand database are stored in the
{LIGDB} set, which is included in the {HETATM} set. These atoms are not
stored in the {UNK_ATOMS} set.
Substructure and Atom Sets

Upon reading a PDB file, SYBYL creates several local sets. Note that these sets
are static, which means that their compositions are not updated after initial
creation unless one or all of their components are deleted from the molecule.
Read more about sets in the SYBYL Basics Manual.
LIGDB         HETATM records that were typed using the ligand database are
              stored in an atom set.
WATER         HETATM records for water oxygens are stored in a single
              substructure set.
HETATM        HETATM records other than waters are stored in a single
              substructure set. These substructures include modified
              residues, ligands, cofactors, and metals.
HELIX_x_PDB   HELIX records are stored in substructure sets, one set per
              named helix.
SHEET_x_PDB   SHEET records are stored in substructure sets, one set per
              named sheet.
TURN_x_PDB    TURN records are stored in substructure sets, one set per
              named turn.
SITE_x        SITE records are stored in substructure sets, one set per
              named site.
UNK_ATOMS     HETATM records for atoms that could not be typed via the
              dictionary’s cofactor, ligand, and group databases are
              stored in an atom set.
CHAIN_HEAD    All N-terminal residues belonging to uniquely named chains
              are stored in a single substructure set.
CHAIN_TAIL    All C-terminal residues belonging to uniquely named chains
              are stored in a single substructure set.
In the set names above, x is taken from the information in columns 12-14 of the
PDB file’s HELIX, SHEET, TURN, and SITE records.
For example, the following line in 1crn.pdb:

HELIX 1 H1 ILE A 7 PRO A 19 13/10 CONFORMATION RES 17,19 13
is interpreted by SYBYL’s PDB reader in the following manner:

Adding HELIX H1 from ILE7 to PRO19
and stored with the molecular description in a local substructure set named
HELIX_H1_PDB.
Similarly, the following lines in 4ins.pdb produce the single set name
SHEET_B_PDB:
SHEET 1 B 2 PHE B 24 TYR B 26 0
SHEET 2 B 2 PHE D 24 TYR D 26 -1 N PHE B 24 O TYR D 26
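The naming rule can be sketched in Python as follows. The function name is hypothetical and the code is not SYBYL's implementation. Because the example records printed above have lost their fixed-column spacing, the sketch rebuilds the leading fields at their fixed positions before reading columns 12-14.

```python
def structure_set_name(record):
    """Derive the local set name from a HELIX, SHEET, TURN, or SITE record."""
    kind = record[0:6].strip()
    if kind not in {"HELIX", "SHEET", "TURN", "SITE"}:
        raise ValueError(f"unsupported record type: {kind!r}")
    identifier = record[11:14].strip()   # columns 12-14, 1-based
    # SITE sets carry no _PDB suffix in the list of set names above.
    suffix = "" if kind == "SITE" else "_PDB"
    return f"{kind}_{identifier}{suffix}"


# Record name in columns 1-6, serial number in 8-10, identifier in 12-14.
helix = f"{'HELIX':<6} {1:>3} {'H1':>3} ILE A    7  PRO A   19"
sheet = f"{'SHEET':<6} {1:>3} {'B':>3} 2 PHE B  24  TYR B  26  0"
print(structure_set_name(helix))
print(structure_set_name(sheet))
```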
Water Molecules
Upon reading a PDB file, all atoms named HOH, H2O, WAT, and WTR are
treated as waters, given substructure names that begin with the string HOH, and
stored in the local set {WATER}.
To reorient existing waters use either of the following procedures:

• Delete the water hydrogens then add hydrogens to the waters.
• Load an SPL script then use the command it defines:
UIMS LOAD $TA_ROOT/tables/menubars/sybyl/biopolymer/add_h.core
BIOPOLYMERGUI!BIO_ORIENT_HYDROGENS mol_area
Atom Names
All spaces in atom names encountered while reading in a PDB file are automatically
converted to “_”.

Bonds
In a few PDB files, the interatomic distances in the backbone differ
substantially from standard values, causing SYBYL’s PDB reading functionality to
miss some connectivities and break the molecule into multiple chains. Use Tailor
variable PDB INTER_TOLERANCE to allow some deviation from standard
backbone bond lengths when assigning the connectivity.
After processing the content of the PDB file through the dictionary rules and
ligand database, the PDB reader adds bonds between atoms whose interatomic
distance is within a range of values (by default 1.0 to 1.8 Å). This operation
applies to all atoms in the PDB file and may result in extraneous bonds in
regions of poor geometry. The behavior is controlled by Tailor variable PDB
ADD_BONDS_BASED_ON_DISTANCE. To suppress this operation add the
following line in your $HOME/sybyl.ini file (sample sybyl.ini file in the
Toolkit Utilities Manual):
setvar TAILOR!PDB!ADD_BONDS_BASED_ON_DISTANCE NO
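The distance criterion can be illustrated with a short Python sketch. It is not SYBYL's implementation: the function name, the inclusive treatment of the range limits, and the sample coordinates are assumptions. Only the default range of 1.0 to 1.8 Å comes from the text above.

```python
import math
from itertools import combinations


def bonds_by_distance(coords, low=1.0, high=1.8):
    """Return index pairs of atoms whose separation lies within [low, high] angstroms."""
    bonds = []
    for (i, a), (j, b) in combinations(enumerate(coords), 2):
        if low <= math.dist(a, b) <= high:
            bonds.append((i, j))
    return bonds


# Hypothetical coordinates: atoms 0 and 1 are 1.5 angstroms apart, atom 2 is far away.
atoms = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (4.0, 0.0, 0.0)]
print(bonds_by_distance(atoms))
```

Because the test looks only at distances, two non-bonded atoms that happen to sit within the range in a poorly refined region would also be connected, which is the source of the extraneous bonds mentioned above.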
Some PDB files consist of unconnected alpha carbons. Bonds between
sequential residues can be added automatically with either of the following
procedures:
• Use the command BIOPOLYMER CONNECT_CA after reading such a PDB
file.
• Use Tailor variable PDB CONNECT_SEQ YES before reading such a PDB
file. The default value, NO, displays the molecule as a set of unconnected
dots, the alpha carbons.
Atomic Charges
PDB files only store formal charges on individual atoms. ATOM records in
PDB files include a field for temperature factors which SYBYL reuses to store
atomic charges when writing out PDB files. However, these values are
interpreted as temperature factors when a file is read into SYBYL. To convert these
numbers into atomic charges, use the command CHARGE mol_area VALIDATE
YES. This must be done every time the file is read. A permanent alternative is to
save the molecule in the Mol2 file format.
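To see why the stored values are ambiguous, it helps to look at the field itself. In the standard PDB format the temperature factor occupies columns 61-66 of ATOM and HETATM records. The Python sketch below builds one hypothetical ATOM record and reads that field back; the record contents and the helper name `temp_factor_field` are illustrative assumptions, and nothing in the number itself says whether it is a temperature factor or a charge.

```python
def temp_factor_field(line):
    """Return the number stored in columns 61-66 of an ATOM/HETATM record."""
    return float(line[60:66])

# Hypothetical record assembled field by field so that the columns line up:
# serial, atom name, altLoc, residue name, chain, residue number, iCode,
# x, y, z, occupancy, temperature-factor field, element.
record = (
    "ATOM  {:5d} {:<4s}{:1s}{:>3s} {:1s}{:4d}{:1s}   "
    "{:8.3f}{:8.3f}{:8.3f}{:6.2f}{:6.2f}          {:>2s}"
).format(1, " N", "", "ALA", "A", 2, "", 11.104, 6.134, -6.504, 1.00, -0.35, "N")

# A negative value cannot be a temperature factor, which hints that this
# hypothetical file was written with charges in the field.
print(temp_factor_field(record))
```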
Large Molecules
The standard PDB format supports up to 99,999 atoms: atom serial numbers
(which must be unique) are stored in columns 7-11. SYBYL’s PDB reader can
interpret PDB files with 100,000 or more atoms under the following conditions:
• The atom serial numbers have been shifted left (columns 6-11) or right
(columns 7-12).
• The atom name and subsequent fields in the ATOM and HETATM
records are in their standard positions (columns 13-80).

• CONECT records involving atoms above 99,999 have been deleted from
the input file.
4.1.2 Write PDB Files
Menubar: File > Export File
• Set the Format to PDB.
• Enter the name of the output file (.pdb is appended
automatically).
• Select the desired molecule.
Command: PDB OUT mol_area filename
• mol_area—Area containing molecule being written out.
• filename—Name of the output file (.pdb is appended
automatically).
Activities Upon Writing a PDB File
Handling of special characters while writing PDB files:

• “ ' ” in hetero atom names (HETATM and ATOM records) are converted
into “ * ”.
• You can control how “_” in atom names is handled via Tailor variable
PDB OUTPUT_UNDERSCORE.
To conform with the PDB format, SYBYL includes the element type in columns
77-78 of the ATOM and HETATM records.
Use Tailor variable PDB to alter the parameters used when writing PDB files.
One of the variables allows you to write files in various PDB flavors, such as
PDB v.2.3, PDB v.3.1, or AMBER. Another variable determines whether to
include the chain names (by default, SYBYL does not write a chain designator
when a molecule has only one chain).
Warning: If the molecule was built with small molecule tools and includes a
PHENYL group, saving it to a PDB file shortens the phenyl’s substructure name
to PHE. When the file is read back in, the dictionary misinterprets this phenyl
group as a phenylalanine because of the substructure name. To avoid this
problem, edit the PDB file and change the substructure name.
Secondary Structure Information in Substructure Sets

Upon writing a PDB file, SYBYL converts the *HELIX_*, *SHEET_*, and
*TURN*_* sets into the HELIX, SHEET, and TURN records of the output PDB
file. This includes the secondary structure information from the original PDB
file as well as new information created by the various Biopolymer
Conformation dialogs and commands.

Large Molecules
The writing of PDB files with more than 99,999 atoms is not standard and is not
supported in SYBYL. RCSB’s recommendation is to split such molecules into
multiple PDB files. Alternatively, the .mol2 format can be used to store a large
molecule as a single unit.
4.1.3 Retrieve PDB Coordinates from Other Sources

Retrieve PDB structures from a variety of sources.
File > Retrieve PDB
PDB Code(s): Enter one or more 4-character PDB codes separated by spaces or
commas.
Retrieve From:
• RCSB Server—This option requires an internet connection and uses the
FTP protocol.
• Local Database (TA_PDB)—A local version of the RCSB database
identified in the $TA_ROOT/lib/environment file by the environment
variable TA_PDB (edit the file to include the full path to your local
database on the TA_PDB line).
• PRODAT Database—Tripos-supplied binary protein database, containing
high-resolution protein structures derived from the Protein Databank.
RCSB Server Details: Access the RCSB Server Details dialog where you can
specify:
• RCSB Server Address—ftp.wwpdb.org (PDB format 3.1; default) or
ftp.rcsb.org (PDB format 2.3).
• Path to Files at RCSB—Both formats use the same path:
pub/pdb/data/structures/divided/pdb/
• FTP Idle Time—Idle connection time in seconds (default = 30)
See Tailor variable PDB FTP for details.
Load Retrieved PDB File: Whether to load the retrieved PDB file(s) into SYBYL
(irrelevant when retrieving from PRODAT).
Protein View
If you have a default protein view defined it will be automatically applied to the
retrieved structure if it contains a protein component. See Define and Apply
Protein View Settings on page 72.
Retrieve PDB from RCSB at the Command Line
PDB FTP code

• code—the 4-character code of a single PDB file
The requested file is retrieved and placed in the current working directory. To
load the structure into SYBYL use the PDB IN command.
• Activities Upon Reading a PDB File on page 59
• Tailor variable PDB FTP (in the Tailor Manual)
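As an illustration of how a 4-character code maps onto the server path given above, the Python sketch below splits a user-entered list of codes and builds a remote file path for each. It assumes the "divided" archive layout, in which each entry is stored under a subdirectory named after the two middle characters of its code as a compressed pdbXXXX.ent.gz file. The function names are hypothetical, and the sketch performs no network access.

```python
import re

BASE_PATH = "pub/pdb/data/structures/divided/pdb/"

def parse_codes(text):
    """Split a string of PDB codes separated by spaces or commas."""
    return [code.lower() for code in re.split(r"[,\s]+", text.strip()) if code]

def divided_pdb_path(code, base=BASE_PATH):
    """Build the remote path of one entry (assumed 'divided' archive layout)."""
    code = code.strip().lower()
    if len(code) != 4 or not code.isalnum():
        raise ValueError("expected a 4-character PDB code")
    return f"{base}{code[1:3]}/pdb{code}.ent.gz"

for code in parse_codes("4pep, 1CRN"):
    print(divided_pdb_path(code))
```

After decompression the file carries the .ent extension mentioned in the connection test below.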
How to Test the Connection to the RCSB FTP Site
1. Retrieve the coordinates for the entry code 4pep.
• File > Retrieve PDB
• Type 4pep in the PDB Code(s) field.
• Set the rest of the dialog as follows (these are the defaults):
- Retrieve From: RCSB Server
- Load Retrieved PDB File: on
• Click Retrieve.
2. If the connection is successful, the file (.ent) corresponding to the specified
code is downloaded to your current working directory. The coordinates are also
loaded into the first available molecule area.
3. If the connection fails to retrieve the requested molecule, a message is displayed
in the console, and a dialog prompts you to retry. Initial failure may be due to
network traffic or configuration. The second or third attempt is usually
successful.
4. If you encounter persistent failure, explore the RCSB Server Details option in
the dialog or contact your system administrator.
4.1.4 Protein Data Bank: Useful Links

Home of the PDB: Research Collaboratory for Structural Bioinformatics
http://www.rcsb.org/pdb/
If you have any question about the PDB format, check the official PDB manuals
at:
http://www.wwpdb.org/docs.html
To validate the format of a PDB file:

http://deposit.pdb.org/validate/docs/tutorial.html
Literature reference:
H.M. Berman, J. Westbrook, Z. Feng, G. Gilliland, T.N. Bhat, H. Weissig,
I.N. Shindyalov, P.E. Bourne, The Protein Data Bank. Nucleic Acids
Research, 28, pp. 235-242 (2000).

4.2 Read and Write PIR and FASTA Files

Read or write protein sequence files in the following formats.
• PIR format of the National Biomedical Research Foundation
http://www.bioinformatics.nl/tools/crab_pir.html
• FASTA
http://www.compbio.ox.ac.uk/bioinformatics_faq/format_examples.shtml#fasta
PIR files are used in:

• Biopolymer to align sequences
• FUGUE and ORCHESTRAR
4.2.1 Read PIR and FASTA Files

Read protein sequence files in PIR and FASTA formats.
Menubar: File > Import File

• Set the Files of Type to Sequence.
• Select the desired file.
• Specify the molecule area to receive the molecule.
Command: BIOPOLYMER PIR IN mol_area filename
BIOPOLYMER FASTA IN mol_area filename
• mol_area—Area to receive molecule being read
(current contents are overwritten).
• filename—Name and/or directory path for the input
PIR sequence file (the default extension is .pir) or the
FASTA sequence file (the default extension is .fasta).
The sequence found in the file is used to construct a protein. The secondary
structure is assigned using the MAXFIELD_SCHERAGA (Bayes Statistics) method
(see Secondary Structure Prediction on page 319).
The dictionary must be of type PROTEIN (protein or bigpro dictionaries) or
BIOPOLYMER (macromol dictionary) when reading a .pir file into SYBYL. If
another type of dictionary is already open, it will be closed and the macromol
dictionary opened.
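For readers unfamiliar with the FASTA layout, the following Python sketch shows the kind of information a sequence file carries: a header line starting with ">" followed by one or more lines of one-letter codes. It is a generic illustration, not SYBYL's reader; the function name `read_fasta` and the sample text are hypothetical.

```python
def read_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the record that was being collected.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records

sample = """>demo_chain_A
AACDHEKLLLISSTRT
LINEQWLL
>demo_chain_B
VMPICLPSKDY
"""
for name, seq in read_fasta(sample):
    print(name, len(seq))
```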

4.2.2 Write PIR and FASTA Files

Write protein sequence files in PIR or FASTA formats.
Menubar: File > Export File

• Set the Format to PIR or FASTA.
• Enter the name of the output file (the appropriate
extension is appended automatically: .pir or .fasta)
• Select the desired molecule.
Command: BIOPOLYMER PIR OUT source input_info filename
BIOPOLYMER FASTA OUT source input_info filename
• mol_area—Area containing the molecule being written
out.
• Source of the input sequence used to produce the
sequence file and information depending on the source:
KEYBOARD—Then enter a string of single letter amino
acid codes.
MOLECULE_AREA—Then enter the molecule area from
which the sequence will be derived.
PDB_FILE—Then enter the name of a file in any of the
PDB formats.
SEQUENCE_BUILDER—Then enter a sequence
expression of 3 letter codes separated by equal signs.
• filename—Name of the output PIR sequence file (the
default extension is .pir) or FASTA sequence file (the
default extension is .fasta)
Blocking groups are not included in PIR or FASTA files written by SYBYL.
If multiple chains are present, a separate file is created for each chain, the chain
names are appended to filename and the appropriate extension is added (.pir or
.fasta). For compatibility with the NBRF database, the filename may be the
unique retrieval key assigned to each entry in the PIR database.
The BIOPOLYMER PIR OUT command assumes that all structures or sequences
generate protein fragment files.
The file produced consists of three or more lines: a HEADER line, a TITLE
line, and one or more SEQUENCE lines. An example follows:
>F1;”file_base”
SEQUENCE NAME - ORIGIN NAME
AACDHEKLLLISSTRTLINEQWLLTTAKNLFL
VMPICLPSKDY*

The SEQUENCE NAME of the protein is assumed to be the name of the molecule
used as input, if any. Otherwise, this field and the ORIGIN NAME are either
NONE or UNKNOWN. For completeness, you may want to edit the files later to
modify these entries.
The sequence itself is a string composed of the one letter symbols for amino
acids. The sequence may span several lines, and is terminated with a single
asterisk.
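The file layout described above (a header line, a title line, and sequence lines ending in an asterisk) can be sketched in Python as follows. This is an illustration under stated assumptions, not the code behind BIOPOLYMER PIR OUT: the function name `format_pir`, the default line width of 60 characters, and the omission of quotation marks around the file base are all choices made for this example.

```python
def format_pir(file_base, sequence, name="UNKNOWN", origin="UNKNOWN", width=60):
    """Build PIR-style text: header line, title line, wrapped sequence lines."""
    body = sequence.upper() + "*"          # a single asterisk ends the sequence
    lines = [f">F1;{file_base}", f"{name} - {origin}"]
    lines += [body[i:i + width] for i in range(0, len(body), width)]
    return "\n".join(lines) + "\n"

# Hypothetical input; a short width is used to show the line wrapping.
print(format_pir("demo", "AACDHEKLLL", width=6), end="")
```

With these inputs the output has four lines: the header, the title, and two sequence lines, the last of which carries the terminating asterisk.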

5. Biopolymer Display
• Define and Apply Protein View Settings
• Typical Workflow
• Components of a Protein Complex
• Protein View Settings and Named Views
• Protein View Dialog
• PROTEINVIEW Command
• Simple Biopolymer Displays
• Display the C Alpha Trace
• Display the Backbone
• Display the Sidechains and C Alpha Trace
• Display the Whole Biopolymer
• Color Schemes for Biopolymers
• Biopolymer Ribbons
• Create Ribbons Quickly With a MOLCAD License
• Create Ribbons Quickly Without a MOLCAD License
• Create Ribbons via the RENDER Command
• Label Biopolymer Atoms
• Label Biopolymer Atoms via the Graphical Interface
• Label Biopolymer Atoms via the Command Line
• Ramachandran Graphs

5.1 Define and Apply Protein View Settings

Define and save view settings that include MOLCAD ribbons and surfaces as
well as properties. Saved definitions can then be applied when viewing other
proteins.
License Requirement:
See License Requirements for Biopolymer on page 8.
5.1.1 Typical Workflow

1. Load a molecule that has at least one protein element, such as a single
protein, or a protein-ligand complex, or a protein-DNA complex.
2. Prepare the molecule, including essential hydrogens if you intend to display
hydrogen bonds:
Biopolymer > Prepare Structure > Structure Preparation Tool
3. Access the Protein View Dialog:
View > Protein View > Define View or click the corresponding icon on the
Biopolymer toolbar.
If multiple molecule areas contain biopolymers you will be prompted for the
molecule area containing the protein complex of interest.
4. Use the features in the dialog to define a view, then click the save button
to name and save the view. You may define several views.
5. If you have a favorite view, make it your default: select it and click the
button that sets the default view.
The default view is automatically applied to all structures read in from files
in PDB format.
6. To apply the default view at any time to another molecule: View > Protein
View > Apply Default View.
5.1.2 Components of a Protein Complex

The various components of the protein complex are identified automatically for
use in the Protein View Dialog as follows:
• The ligand consists of the substructures that are not automatically
identified as protein, cofactor, metal or water and are also not
carbohydrates ({BIOPOLYMER(LIGAND)}-{METAL}).
• The cofactor is identified by the built-in set
{BIOPOLYMER(COFACTOR)}.
• The metal atoms are identified by the built-in set {METAL}.
• The waters are identified by the built-in set {BIOPOLYMER(WATER)}.

5.1.3 Protein View Settings and Named Views

SYBYL does not include a defined default view. This means that when you read
in a PDB file all atoms are visible, displayed as lines and colored by atom type.
You may define your own settings and save them as named views, then select
one of those as your personal default view.
Default Settings in the Protein View Dialog
The default settings in the Protein View dialog are as follows:

• Protein: fully visible and in blue lines.
• Ligand: capped sticks colored by atom type
• Cofactor: cyan capped sticks
• Metals: green capped sticks
• Water: red capped sticks; only those within 5 Å of the ligand are visible
(if the structure does not include a ligand all waters are visible).
If you like these settings and would like to apply them to all PDB files when
you read them, save them in a named view and make it your default view.
In the absence of a user-defined default view you can apply these settings to any
protein complex: View > Protein View > Apply Default View.
How to Create and Save Personal Views

You need one protein complex. Choose a structure that is representative of
those in your area of interest. In particular, it should include components such
as ligand and waters if these elements will be part of a defined view that can be
applied to other protein structures.
• Load a protein or protein complex in a molecule area.
• Click the corresponding icon on the Biopolymer toolbar to access the
Protein View dialog.
• Use the features in the dialog to define the characteristics of a view.
• Click the save button and type a name for the view. Use only alphanumeric
characters and underscores. Spaces will be automatically converted to
underscores.
All named views are saved in your $HOME/.sybyl/ProteinView directory.

Location of $HOME on Windows platforms:
• Windows XP: your Documents and Settings/user folder
• Windows 7: your Users/username folder

How to Make a View your Default View

A default view is a personal and named view explicitly identified as the default
view. Only one view can be identified as the default. This view is then
automatically applied to all protein complexes loaded into SYBYL from files in
PDB format. This view is never applied automatically to molecules read in from
Mol2 files.
To make any named view the default view:

• Load a protein or protein complex in a molecule area.
• Click the corresponding icon on the Biopolymer toolbar to access the
Protein View dialog.
• Select a named view in the list and click the button that sets the default
view.
• This view is automatically moved to the top of the list and an asterisk
after its name identifies it as the default.
• The file $HOME/.sybyl/ProteinView/.defaultview contains the
name of your default view.
To remove the default property of a named view:

• Select the named view in the list and click the corresponding button.
• The named view returns to its position in the alphabetically sorted list.
To apply the default view at any time to any biopolymer structure:

• View > Protein View > Apply Default View
• In the absence of a named view identified as the default view, the default
settings in the dialog are applied.
How to Reset the Display of a Biopolymer

To reset the view of the biopolymer containing a protein component so that all
atoms are shown and colored by atom type, and all bonds displayed as lines:
• View > Protein View > Reset View
5.1.4 Protein View Dialog

Define, apply, and save view settings that include MOLCAD ribbons and
surfaces as well as properties.
View > Protein View > Define View
or click the corresponding icon on the Biopolymer toolbar
• Required to enable the menu item and icon: a molecule that has at least
one protein element, such as a single protein, or a protein-ligand
complex, or a protein-DNA complex.
• If multiple molecule areas contain biopolymers you will be prompted for
the molecule area containing the protein complex of interest.
Usage Notes
When accessing the Protein View dialog:
• If a named view had been applied to the molecule, the settings in the
dialog are those of that view.
• Under all other circumstances the dialog’s default settings are applied to
the molecule. In addition, any surfaces and/or ribbons that were present
on the molecule before invoking the Protein View dialog are temporarily
hidden then restored when you close the dialog.
Protein Ligand Complex
Source: Access a browser to select the molecule area of interest. The
selection is echoed in the dialog.
Ligand: Access a browser to specify a separate molecule area for the ligand if
the source molecule area does not contain the ligand.
Components: Components of the molecule are identified automatically. These
are:
• Protein—All residues in protein, RNA, and DNA chains, including
modified amino acids.
• Ligand—All substructures that are not automatically identified by the
Protein, Cofactor, Metal, and Water categories and are not carbohydrates.
This component must be present to enable Protein Color by Distance in
the upper right of the dialog.
• Cofactor—All substructures that match cofactor residues defined in the
dictionary.
• Metal—All atoms with a metallic ordinal number (according to the
periodic table), including metal atoms in cofactors.
• Water—All water substructures.
To modify the definition of any component, click its [...] button and identify
the desired substructures.
Rendering Style:
• Anti-Aliased Lines
• Capped Sticks
• Ball & Stick
• Spacefill
• Hide
Color Scheme:
• Any of the 24 SYBYL colors
• By Atom Type—Note that this color scheme is used by default when a
view whose settings depend on a ligand is applied to a protein that does
not include a ligand.
• By Distance—Color the “Protein” component by the distances to the
ligand as defined by the sliders in the upper right corner of the dialog.
Available only if the structure includes a ligand.
View Sets
Auto Load Selected View: Whether to apply the view selected in the list upon
selecting it. This feature is on by default and is used only within the dialog.
Toggling it off makes it easier to select a view in the list when the purpose
of the selection is to delete the view.
Defined Views: List of all the view settings saved in your
$HOME/.sybyl/ProteinView directory. Select one to apply it to the current
protein complex. The selected view will remain highlighted in the list until
you change any of the settings in the dialog.
Icon buttons:
• Define the selected view settings as the default. The default view is
marked with an asterisk and moved to the top of the list. Its name is stored
in the $HOME/.sybyl/ProteinView/.defaultview file. Deleting the default
view will also delete this file. To remove the “default” property of a view,
select it in the list then click the corresponding button.
• Save a definition of the current view in the user’s
$HOME/.sybyl/ProteinView directory. The name of the file must consist of
alphanumeric characters (underscores are allowed, but spaces are not).
• Rename the selected view.
• Delete the selected view from the saved list. Its associated file will be
deleted.
Apply: This button is enabled only when Auto Load Selected View above the
list is off.
Reset: Applies the dialog’s default settings.
Protein Color by Distance
Protein Color By Distance: Toggle this check box to color the Protein
component by distance ranges to the ligand as specified by the region sliders
and color options below. This option is disabled if the structure does not
include a defined ligand.
Region Sliders: All sliders range between 1 and 20 Å. They are active only
when the Protein is colored by distances.
• Near—All residues that have at least one atom within the specified
distance of any “Ligand” atom.
• Mid—All residues beyond those in the Near region that have at least one
atom within the specified distance of any “Ligand” atom.
• Far—All residues beyond those in the Mid region that have at least one
atom within the specified distance of any “Ligand” atom.
• Hide—Undisplay all residues that have at least one atom beyond the
specified distance of any “Ligand” atom.
Color: Color the “Protein” atoms within the region by any of the 24 SYBYL
colors or by atom type.
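The Near, Mid, and Far definitions above amount to classifying each residue by its closest approach to the ligand. The Python sketch below illustrates that reading. It is not SYBYL code: the function name, the slider values of 4, 8, and 12 Å, and the coordinates are hypothetical, and the Hide slider is left out.

```python
from math import dist

def classify_by_distance(residue_atoms, ligand_atoms, near=4.0, mid=8.0, far=12.0):
    """Assign a residue to a region from its closest atom-to-ligand distance."""
    closest = min(dist(a, b) for a in residue_atoms for b in ligand_atoms)
    if closest <= near:
        return "Near"          # at least one atom within the Near distance
    if closest <= mid:
        return "Mid"           # not Near, but at least one atom within Mid
    if closest <= far:
        return "Far"           # not Mid, but at least one atom within Far
    return "Beyond Far"

ligand = [(0.0, 0.0, 0.0)]
# One atom at 3 A is enough to make the whole residue Near.
print(classify_by_distance([(3.0, 0.0, 0.0), (9.0, 0.0, 0.0)], ligand))
print(classify_by_distance([(6.0, 0.0, 0.0)], ligand))
print(classify_by_distance([(10.0, 0.0, 0.0)], ligand))
```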

Surfaces
Component: Select the component for which a MOLCAD surface will be
generated: Active Site, Ligand, Cofactor, Metal, Water or None.
The extent of the Active Site surface around the ligand is determined by the
position of the Mid slider when the surface is generated.
Display Style:
• Opaque
• Transparent
• Lines
• Dot
• Hide—Undisplay the surface
• None—Do not create a surface or delete it if one has been created. If you
change the position of the Mid slider after displaying a Protein surface,
you must set the surface’s style to None then recreate it.
Color Scheme: Color the surface:
• By any of the 24 SYBYL colors
• By Atom Type
• By Atom Color—reflects the color scheme set in the dialog for the
selected component.
• By any one of the many MOLCAD properties (read about properties and
color ranges in the MOLCAD Manual). If you select the electrostatic
potential property the atomic charges will be computed and the selected
charge method will be saved with the view settings.
Ribbon
A cartoon-style MOLCAD ribbon is created for the “Protein” component. The
shape of the ribbon represents the secondary structure elements: helices for
helical regions; curved, directional arrows for beta strands; and tubes for the
remaining residues.
Display Style:
• None—Do not create a ribbon.
• Opaque
• Transparent
• Line (mesh)
• Dot
• Hide—Undisplay the ribbon.
Color Scheme: Color the ribbon by:
• By Secondary Structure (the default)
• Any of the 24 SYBYL colors
• Any one of the following MOLCAD properties: atom flexibility, residue
flexibility, lipophilic potential, or packing density (read about properties
and color ranges in the MOLCAD Manual).
Hydrogen Display Options
Hydrogens must be present on the molecule for these features to have an effect.
Hydrogens cannot be added to the structure from within this dialog.
H-Display: Which hydrogens to display:
• All—Display all hydrogens connected to visible heavy atoms. These
hydrogens are identified by {H_CONN_VIS_HEV(*,ALL)}
• Off—Hide all hydrogens.
• Polar—Display only the hydrogens that can participate in hydrogen
bonds and that are connected to visible heavy atoms. These hydrogens are
identified by {POSSIBLE_HBOND(*,HYDROGEN)}&{H_CONN_VIS_HEV(*,ALL)}
• Non-Polar—Display only the hydrogens that cannot participate in
hydrogen bonds and that are connected to visible heavy atoms. These
hydrogens are identified by
{H_CONN_VIS_HEV(*,ALL)}-{POSSIBLE_HBOND(*,HYDROGEN)}
Display H-Bonds: Whether to display hydrogen bonds (as yellow dashed lines)
between the visible “Protein” residues and the other components of the
molecule.
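The H-Display options combine two atom sets with the intersection (&) and difference (-) operators. The Python sketch below mirrors that logic with ordinary sets; the atom names are hypothetical placeholders, and the two variables merely stand in for the SYBYL expressions {H_CONN_VIS_HEV(*,ALL)} and {POSSIBLE_HBOND(*,HYDROGEN)}.

```python
# Hypothetical hydrogens attached to visible heavy atoms.
h_conn_vis_hev = {"H1", "H2", "H3", "H4"}
# Hypothetical hydrogens able to take part in hydrogen bonds; H9 is attached
# to a hidden heavy atom and therefore never shown.
possible_hbond = {"H2", "H4", "H9"}

displayed = {
    "All": h_conn_vis_hev,
    "Off": set(),
    "Polar": possible_hbond & h_conn_vis_hev,      # intersection
    "Non-Polar": h_conn_vis_hev - possible_hbond,  # difference
}

for option, atoms in displayed.items():
    print(option, sorted(atoms))
```

Polar and Non-Polar together always make up the All set, since each visible hydrogen is either in the hydrogen-bonding set or not.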
General Display Options
Label Substructures: Whether to label the residues in the Near and Mid
regions.
Bond Line Width: Thickness of the lines representing bonds in the “Protein.”
The value ranges from 1 to 5, with a default of 1.

5.1.5 PROTEINVIEW Command

Apply the default protein view to the contents of the specified molecule area.
PROTEINVIEW mol_area
The menubar equivalent is: View > Protein View > Apply Default View
User-defined views are stored in the $HOME/.sybyl/ProteinView directory. The
name of the default view is stored in the
$HOME/.sybyl/ProteinView/.defaultview file.
In the absence of a named view identified as the default view, the default
settings in the Protein View dialog are applied.
View settings are defined in the Protein View Dialog on page 74.

5.2 Simple Biopolymer Displays

5.2.1 Display the C Alpha Trace
Display a biopolymer of any type by drawing a single line between connected
residues. The visible atoms are connected by dummy bonds. For a protein, this
is known as the alpha carbon trace.
Menubar: View > Protein View > C Alpha Trace
Any ligands and cofactors are also displayed. Any metals and waters are hidden.
Backdrop right-click or Atom right-click: Molecule Display > C Alpha Trace
Any ligands and cofactors are also displayed. Any metals and waters are hidden.
Command: BIOPOLYMER DISPLAY mol_expr CA_ONLY
Warning: Chain trace display is intended for viewing purposes only. Many
SYBYL commands will not work correctly in this display mode.
An alternative at the command line is:

BIOPOLYMER TRACE mol_area color display_file
The visual effect is similar. However, the molecular display is unaffected, and
the trace is represented as a background image stored in a display file
(trace.dsp by default). Read about the handling of background images in the
Graphics Manual.
Known Limitation:
If the molecule has rotatable bonds defined, you may see artifacts in the C-alpha
display. If you see multiple C-alpha traces, type the following commands in the
console:
TWIST FREEZE mol_area
TWIST STATUS INACTIVE mol_area
You should now see the single C-alpha trace. After returning to the whole
molecule display, reactivate the rotatable bonds:
TWIST STATUS ACTIVE mol_area
• Biopolymer Ribbons on page 84
• Line Ribbon Display of Biopolymers on page 88

5.2.2 Display the Backbone

Display a biopolymer’s backbone atoms.
Menubar: View > Protein View > Backbone Only
Backdrop right-click: Molecule Display > Backbone Only
Command: BIOPOLYMER DISPLAY mol_expr BACKBONE
5.2.3 Display the Sidechains and C Alpha Trace

Display a biopolymer’s sidechain atoms and alpha carbon trace.
Menubar: View > Protein View > Sidechain Trace
Backdrop right-click: Molecule Display > Sidechain Trace
Command: BIOPOLYMER DISPLAY mol_expr SIDECHAIN_CA
Warning: Chain trace display is intended for viewing purposes only. Many
SYBYL commands will not work correctly in this display mode.
5.2.4 Display the Whole Biopolymer

Display the entire molecule: all biopolymer atoms, ligands, cofactors, metals,
and waters.
Menubar: View > Protein View > Whole Molecule
Backdrop right-click or Atom right-click: Molecule Display > Whole Molecule
All ligands, cofactors, metals, and waters are also displayed.
Command: BIOPOLYMER DISPLAY mol_expr WHOLE_MOLECULE
or, more simply, but for a single molecule:
DISPLAY mol_area *

5.3 Color Schemes for Biopolymers

Any color and variety of color schemes can be applied to atoms. Read about all
color schemes and how to apply them in the Graphics Manual.
• Color by atom type
• Color by spectrum
• Color atoms by named color
Several color schemes are available to color biopolymers. Applicability of each
color scheme depends on the type of biopolymer: all types, proteins, or DNA/
RNA.
Access: View > Color by Scheme or use the corresponding toolbar icon.

• Schemes applicable to any biopolymer:
• Chain—Apply a different color to every chain. Ligands, cofactors, and
metal atoms are colored by their chain affiliation.
• B-Factors—Color atoms on a 6 color gradient to reflect the
magnitude of the temperature factors.
• Schemes applicable only to proteins:
• Secondary Structure—Color a protein by defined secondary
structure elements.
• Property—Color a protein by structurally important features.
• Acid/Base—Color a protein by acidic/basic property (at neutral pH)
and polar/non-polar property.
• Hydrophobicity—Color a protein by hydrophobic property.
• Scheme applicable only to DNA and RNA:
• Nucleotide—Color a biopolymer of type DNA or RNA by type of
nucleotide.
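As an illustration of what a 6-color gradient over temperature factors involves, the Python sketch below assigns each value to one of six bins. The text does not specify how SYBYL maps values to colors, so the linear scaling between the smallest and largest value, the function name, and the sample values are assumptions made for this example only.

```python
def bfactor_bins(values, n_colors=6):
    """Map each temperature factor to a bin index from 0 to n_colors - 1."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        return [0] * len(values)   # all atoms share one color
    # Linear scaling (an assumption); the maximum value goes in the last bin.
    return [min(int((v - lo) * n_colors / span), n_colors - 1) for v in values]

# Hypothetical temperature factors for six atoms.
print(bfactor_bins([10, 20, 30, 40, 50, 70]))
```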

5.4 Biopolymer Ribbons

Quickly display a ribbon on a biopolymer backbone.
• Create Ribbons Quickly With a MOLCAD License
• Create Ribbons Quickly Without a MOLCAD License
• Create Ribbons via the RENDER Command
5.4.1 Create Ribbons Quickly With a MOLCAD License

• Selection > Backbone or select only those atoms of interest.
• Choose one of the following methods:
- Use the toolbar icon.
- Right-click one of the selected atoms > Quick Ribbons
- View > Surfaces and Ribbons > Quick Ribbons
• Choose the type of ribbon to display.
• Clear the atom selection.
• To change the ribbon’s appearance, right-click it and specify the color
and style of your choice.
Choices of ribbons are:

• Ribbon—A ribbon-like surface whose shape remains constant as it
follows the curve of the biopolymer backbone.
• Tube—A tube-like surface whose shape remains constant as it follows
the curve of the biopolymer backbone.
• Snake—A snake-like surface similar to the ribbon, but with smoother
edges.
• Cartoon—The shape of the curve varies to reflect the secondary
structure: a tube connects the alpha helical regions represented as helices
and the beta strands shown as curved arrows.
• Secondary Structure—A ribbon-like surface representing only the
helices and beta strands.
For all styles of ribbons DNA base pairs are symbolized by polygons.
Working with Ribbons in the MOLCAD Manual

5.4.2 Create Ribbons Quickly Without a MOLCAD License

• Selection > All Atoms or select only those atoms of interest.
• Choose one of the following methods:
- Use the toolbar icon.
- Right-click one of the selected atoms > Quick Ribbons
- View > Surfaces and Ribbons > Quick Ribbons
• Choose the type of ribbon to display.
• Clear the atom selection.
Choices of ribbons are:

• Ribbon/Tube—The shape of the curve is varied to include the
secondary structure: a tube connects the alpha helical regions repre-
sented as helices and the beta strands shown as curved arrows. See the
equivalent command below.
• Shaded Ribbon—A ribbon-like surface whose shape remains constant
as it follows the curve of the biopolymer backbone. See the equivalent
command below.
• Tube—A tube-like surface whose shape remains constant as it follows
the curve of the biopolymer backbone. See the equivalent command
below.
• Line Ribbon—A set of five parallel lines following the curve of the
biopolymer backbone. See the equivalent command below.
5.4.3 Create Ribbons via the RENDER Command

The method for tracing the curve of the biopolymer backbone is based on
published work:
[1] Carson, M. and Bugg, C.E. “Algorithm for ribbon models of proteins”,
J. Mol. Graph., 4, 121-122 (1986)
[2] Carson, M. “Ribbon models of macromolecules”, J. Mol. Graph., 5,
103-106 (1987).
The protein backbone can be rendered in various ways:

• Render as a Combination of Ribbons and Tubes
• Render as Ribbons Only
• Render as Tubes Only
• Line Ribbon Display of Biopolymers

Refer to Tailor subject RENDER for customization of the ribbons produced by
the commands below.
Render as a Combination of Ribbons and Tubes
The shape of the rendered image reflects the protein’s secondary structure:
alpha helical regions are rendered as ribbons or cylinders, beta strands as curved
arrows, and the remainder of the biopolymer as a curved tube.
RENDER PROTEIN COMBINATION sequence_expr color
sequence_expr: The sequence to render.
Note: The command line cannot interpret a sequence represented by a static set
of substructures. However, this operation works in the Sequence dialog.
color:
• One of the 24 SYBYL colors. (See the Color Editor in the SYBYL
Reference Guide.)
• BY_ATOM_COLOR—Color the surface of each residue the same color as
that residue’s alpha-carbon atom.
• BY_SECONDARY_STRUCTURE—3 colors distinguish different types of
secondary structure within the protein: alpha helices, beta sheets and
everything else. The alpha helices can be displayed as cylinders or
ribbons. See Tailor subject RENDER.
Regions of secondary structure in the protein are determined by the presence of
substructure sets whose names begin with HELIX or SHEET. These sets are:
• automatically defined when reading in a PDB file that contains
secondary structure information (see Substructure and Atom Sets on
page 60).
• or created within SYBYL: see Find Secondary Structure Conformation
on page 177 and Assign Secondary Structure on page 181.
If multiple substructure sets have been defined for secondary structure
elements, you may specify which ones should be used to render the protein.
At the command line: SETVAR TAILOR!RENDER SEC_STR_SRC value
Usage Note: If no secondary structure sets are present, the menubar approach
will create them automatically by running the command BIOPOLYMER FIND
SEC_STR and using the Kabsch-Sander method (see Find Secondary Structure
via the Command Line on page 179).

Render as Ribbons Only
Render a biopolymer as a ribbon-like surface whose local shape remains
uniform as it follows the curve of the backbone.
RENDER PROTEIN RIBBON sequence_expr color

The sequence_expr and color arguments take the same values as described for
RENDER PROTEIN COMBINATION above. See Tailor subject RENDER.
Render as Tubes Only
Render a biopolymer as a tube-like surface whose local shape remains uniform
as it follows the curve of the backbone.
RENDER PROTEIN TUBE sequence_expr color

The sequence_expr and color arguments take the same values as described for
RENDER PROTEIN COMBINATION above. See Tailor subject RENDER.

Line Ribbon Display of Biopolymers
Draw a multi-line ribbon through all or part of a biopolymer backbone.
BIOPOLYMER RIBBON sequence nstrands color
sequence Residue sequence(s) in the molecule to draw the ribbon
through; can consist of one or more sub-sequences
nstrands Number of strands for the ribbon
color Color to draw the ribbon

Label Biopolymer Atoms
5.5 Label Biopolymer Atoms

5.5.1 Label Biopolymer Atoms via the Graphical Interface
Access the label functionality:
• View > Label
• Right-click on an atom or the SYBYL backdrop
Labels may be applied to individual atoms or to all relevant atoms in the whole
molecule.
[Figures: Labels on Atoms look-aside menu; Labels on Molecules look-aside menu]
To remove the labels:
• View > Unlabel
• Right-click on an atom or the SYBYL backdrop

5.5.2 Label Biopolymer Atoms via the Command Line

BIOPOLYMER LABEL_ATOMS {atom_expr} remove?
atom_expr One or more atoms to be labelled. The command iterates
over this argument until you type the end loop
character (|) or press Cancel in the Atom Expression
dialog.
remove? Whether to remove the labels displayed during the cur-
rent operation (labels previously displayed are not
removed).
The detailed atom name is displayed as a label on the atom, and further infor-
mation relating to that atom is listed in the console.
For example:
• the label A/ALA2.N means that the atom is a nitrogen (from the amide
backbone) from the second residue, an alanine, in the peptide chain
labelled A.
• The message appearing in the console provides additional information:
the molecule is in molecule area M1, has the SYBYL atom type N.am,
and the SYBYL atom ID 9:
Mol: M1 A/ALA2.N : N.am Atom id: 9
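If labels or console messages are processed outside SYBYL, the naming pattern described above can be split mechanically. The following Python sketch is not part of SYBYL. The function names and both regular expressions are assumptions inferred only from the single example shown above (chain, residue type, residue number, atom name), so other label layouts may need a different pattern.

```python
import re
from typing import NamedTuple


class AtomLabel(NamedTuple):
    chain: str
    residue_type: str
    residue_number: int
    atom_name: str


# Assumed layout, inferred from the example "A/ALA2.N":
# <chain>/<residue type><residue number>.<atom name>
LABEL_RE = re.compile(r"^([^/\s]+)/([A-Za-z]+)(-?\d+)\.(\S+)$")

# Assumed layout of the console line, inferred from
# "Mol: M1 A/ALA2.N : N.am Atom id: 9"
CONSOLE_RE = re.compile(r"^Mol:\s*(\S+)\s+(\S+)\s*:\s*(\S+)\s+Atom id:\s*(\d+)$")


def parse_label(label: str) -> AtomLabel:
    """Split a detailed atom name such as 'A/ALA2.N' into its parts."""
    match = LABEL_RE.match(label.strip())
    if match is None:
        raise ValueError(f"unrecognized atom label: {label!r}")
    chain, residue_type, number, atom_name = match.groups()
    return AtomLabel(chain, residue_type, int(number), atom_name)


def parse_console_line(line: str) -> dict:
    """Split the console message into molecule area, label, atom type and ID."""
    match = CONSOLE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognized console line: {line!r}")
    mol_area, label, atom_type, atom_id = match.groups()
    return {
        "mol_area": mol_area,
        "label": parse_label(label),
        "atom_type": atom_type,
        "atom_id": int(atom_id),
    }
```

For the example above, parse_label("A/ALA2.N") yields chain A, residue type ALA, residue number 2 and atom name N.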
The labels created by this command are displayed as background images. To
delete labels not removed by this command, use Edit > Delete >
Backgrounds: (Surface, Ribbon, etc.) (or BACKGROUND DELETE) and
select entries with the type “No Attribute.”

Ramachandran Graphs
5.6 Ramachandran Graphs

Display a Ramachandran graph, a plot of phi versus psi angles for a protein, or
any two conformational angles for any class of biopolymer.
Note: You may find it more convenient to display Ramachandran graphs from a
ProTable spreadsheet.
BIOPOLYMER OLD_RAMACHANDRAN mol_area disp_area {command}
mol_area Molecule area containing the molecule to plot.

disp_area Where to display the graph.
command Use any of the following options in any order. Exit the
command by typing |.
• GRAPH—Creates and displays a Ramachandran graph
for a selected molecule. In command mode, only
display of the graph is allowed. In menu and object
picking modes, picking a point on the graph causes
the corresponding residue in the molecule to be
highlighted, and the angle values to be displayed.
Press End Select in the small Pick dialog to
terminate picking.
• LIST_DEFAULT_COLORS—View colors for the
points in the Ramachandran graph. Only those
colors that have at least one residue assigned to
them are displayed. The Tripos default colors are
green for proline, red for glycine, and white for all
others.
• SET_DEFAULT_COLORS—Set colors for the points
in the Ramachandran graph. You will be prompted
for pairs of monomer types and colors. End the loop
by typing |.
Tailor subject RAMACHANDRAN
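For readers who want to see what is plotted, the sketch below computes phi and psi from backbone coordinates in plain Python. It is an illustration only and says nothing about how SYBYL computes the angles internally. The function names and the input layout (one dictionary of N, CA and C coordinates per residue, in chain order) are assumptions made for this example. Phi is the torsion C(i-1)-N(i)-CA(i)-C(i) and psi is the torsion N(i)-CA(i)-C(i)-N(i+1).

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dihedral(p0, p1, p2, p3):
    """Signed torsion angle p0-p1-p2-p3 in degrees, between -180 and 180."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / norm for x in b1)
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    return math.degrees(math.atan2(_dot(_cross(b1, v), w), _dot(v, w)))


def phi_psi(residues):
    """Return one (phi, psi) pair per residue of a single continuous chain.

    phi is None for the first residue and psi is None for the last one,
    because the neighbouring residue needed for the torsion is missing.
    """
    angles = []
    for i, res in enumerate(residues):
        phi = psi = None
        if i > 0:
            phi = dihedral(residues[i - 1]["C"], res["N"], res["CA"], res["C"])
        if i < len(residues) - 1:
            psi = dihedral(res["N"], res["CA"], res["C"], residues[i + 1]["N"])
        angles.append((phi, psi))
    return angles
```

Each (phi, psi) pair corresponds to one point on the graph. The sketch assumes that consecutive list entries are bonded neighbours, so chain breaks must be handled by the caller.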

6. Prepare Biopolymer Structure
• Protein Preparation Tool on page 94
• Add Hydrogens on page 99
• Set the Protonation Type on page 101
• Load Charges on page 104
• Edit Termini on page 108
• Fix End Groups on page 112
• Fix SYBYL Atom Types in Cofactor on page 113
• Fix SYBYL Atom Types in Ligand on page 115
• Assign AMBER Atom Types on page 117
• Add Sidechains on page 125
• Fix Sidechain Amides on page 128
• Fix Prolines on page 130
• Chain Termini Sets on page 131
• Set Chain Names on page 132
• Renumber a Sequence on page 133
• Convert PDB Atom Names on page 134
• Check Biopolymer Geometry on page 135
• Convert an External Mol2 File to a Biopolymer on page 137
• Convert a Small Molecule to a Biopolymer on page 138
• Minimize Biopolymer Structures on page 139

Protein Preparation Tool
6.1 Protein Preparation Tool

Clean up the protein component of a molecule read in from a PDB file and
prepare it for force field calculations. Although designed for proteins, the
functions in the dialog can treat the DNA or RNA components of a protein
complex with the exception of replacing backbone atoms or relieving sidechain
bumps.
Caveat: If the structure does not contain at least one standard amino acid, the
analysis cannot proceed.
Workflow: If the PDB file contains a ligand and/or cofactor, we strongly
recommend that you assign proper SYBYL atom types to the small molecules
before proceeding with the protein preparation. For details see:
• Fix SYBYL Atom Types in Cofactor on page 113
• Fix SYBYL Atom Types in Ligand on page 115
Biopolymer > Prepare Structure > Structure Preparation Tool

Analysis
Molecule Select the molecule area containing the molecule of
interest.
Extract Ligand Substructures Access the Extract Ligands dialog where you
can select among all the substructures that do not belong to the
{SEQUENCE(*)} or {WATER} sets. The selected substructure(s) will be
removed from the protein, placed in another molecule area, and
automatically assigned the name “ligand_from_” followed by the name of
the source protein.
This option is disabled if the dialog is accessed from
the ORCHESTRAR - Analyze Model dialog.
Remove Substructures Access a dialog where you can select waters and any
other substructures, such as cofactors and metals, to be deleted. This
feature is particularly useful when preparing a protein for docking with
Surflex-Dock.
This option is disabled if the dialog is accessed from
the ORCHESTRAR - Analyze Model dialog.
Analyze Selected Structure Perform an analysis of the entire structure,
populate the fields in the dialog, and store identified problems in
atom and substructure sets.
Detailed results of the analysis are printed to the file
bio_analysis.txt in the current directory. This file is
overwritten each time an analysis is run. To keep the results,
rename the file or copy it to another location.
Tailor variable BIOPOLYMER ANALYSIS_OUTPUT
determines the type of information printed in the console.
Repairs and Additions

Use the Show buttons to highlight and label the atoms or residues reported by a
single category in the analysis.
Rename Atoms Reports the number of atoms whose names do not
match those in the dictionary’s residue files. These
atoms are stored in the atom set {X_ATOM_NAMES}.
Fix—Cycles through the atoms in the set and prompts
for the new atom name. The modified atoms are moved
to the atom set {FIXED_NAMES}.
See the note below about hydrogen treatment.
Follow-up analyses: atom types, backbone, sidechains

Repair Backbone Reports the number of residues missing one or more
backbone atoms. These residues are stored in the substructure
set {X_BACKBONE}. Note that residues
with more backbone atoms than the number found in
the dictionary’s residue file are not flagged by this analysis.
Fix—Replaces residues that have missing backbone
atoms by the matching type taken from the dictionary.
Disulfide bridges are preserved during this operation.
The modified residues are moved to the substructure set
{FIXED_BACKBONE}.
Follow-up analyses: hydrogens, atom types, sidechain
bumps.
Repair Sidechain Reports the number of residues missing one or more
sidechain atoms. These residues are stored in the substructure
set {X_SIDECHAIN_ATOMS}. Note that
residues with more sidechain atoms than the number
found in the dictionary’s residue file are not flagged by
this analysis.
Fix—The molecule is copied into the next available
work area. In that copy, all incomplete sidechains are
replaced by the ones found in the dictionary’s residue
files (see Mutate Monomers on page 168). If proline
residues were included, their geometry is fixed (see Fix
Prolines on page 130). The Set Sidechain Conformation
dialog is then posted to help you orient the sidechain in
each of the fixed residues. The modified residues are
moved to the substructure set
{FIXED_SIDECHAIN_ATOMS}.
Tailor variable BIOPOLYMER BUILD_HYDROGENS
determines whether hydrogens are added to the
repaired sidechains and, if so, which ones.
Follow-up analyses include sidechain bumps.
Termini Treatment Reports the number of terminal residues with incomplete
treatment. These residues are stored in the substructure
set {X_XTERMINI}. Note that there are two
terminal residues per named chain in the molecule.
Fix—Displays the Edit Termini dialog where you can
add the desired capping atoms or blocking groups. The
modified residues and capping/blocking groups are
stored in the substructure set {FIXED_TERMINI}.
Follow-up analyses include sidechain bumps.

Add Hydrogens Reports the number of residues missing at least 1
hydrogen. These residues are stored in the substructure
set {X_HYDROGENS}.
Add—Displays the Add Hydrogens dialog where you
can select the hydrogens to add: All or Essential. If you
choose Essential, the non-essential hydrogens will con-
tinue to be reported as missing. The residues with a full
complement of hydrogens are moved to the substruc-
ture set {FIXED_HYDROGENS}.
See the limitation note below.
Follow-up analyses: atom types, sidechain bumps.
Set Protonation Type No analysis is performed.
Fix—Displays the Set Protonation Type dialog where
you can specify the protonation state of any ASP, GLU,
HIS, GLN or ASN residue.
Type Atoms Reports the number of atoms for which the AMBER
and Kollman atom types could not be found in the open
dictionary. Typically, these atoms belong to ligands,
cofactors, and metals. These atoms are stored in the sets
{UNK_AMBER7_FF99}, {UNK_AMBER7_FF02},
{UNK_AMBER95_ALL}, {UNK_KOLL_ALL}. The
union of these sets, {X_ATOM_TYPES}, includes all
atoms that lack at least one of the AMBER/Kollman
atom types. This is the number reported in the dialog.
Lone pairs are excluded from the analysis and are han-
dled automatically by the force field calculations.
Fix—Displays the Assign AMBER Atom Types dialog
where you can use an SLN atom typer to assign the
desired atom types and select the Expert Options to
assign any missing types manually.
Follow-up analysis: none
Add Charges Add—Displays the Load Charges dialog where you
can specify the charge sets to be used for the biopoly-
mer and for the ligand, waters and metals (if any).
Atoms for which the specified charges could not be
assigned are stored in the atom set {ZERO_CHARGES}.
Note that assigning charges to the molecule overwrites
any existing temperature factors (B-factors). A dialog
will warn you and request confirmation before proceed-
ing with the charge calculation. The molecular descrip-
tion can store only one set of charges. Computing a
new set of charges discards the previous values.

Fix Sidechain Amides Fix—Orient the sidechain amides of ASN and GLN
residues in the direction of maximal potential hydrogen
bonding. See Fix Sidechain Amides on page 128.
Follow-up analysis: sidechain bumps.
Fix Sidechain Bumps Reports the number of residues whose sidechains are
involved in van der Waals bumps (an overlap of 1.2 Å
is tolerated). The procedure loops over all sidechain
atoms and checks for bumps with substructures that do
not contain the current atom. Bumps to water molecules
are ignored if the sidechain does not also bump into other
residues. The residues containing bumps are stored in
the substructure set {X_SIDECHAIN_BUMPS}.
Fix—Displays the Set Sidechain Conformation dialog
with the selection loaded.
Staged Minimization Perform—Minimize the strain energy of a protein in
successive stages, from the outermost atoms (hydrogens)
inward. See Staged Minimization in the Force
Field Manual.
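The Fix Sidechain Bumps analysis described above can be pictured with the following Python sketch. It is an illustration, not SYBYL's implementation. Several points are assumptions: a pair of atoms is treated as a bump when their distance is smaller than the sum of their van der Waals radii minus the 1.2 Å tolerance, the radii are Bondi values that may differ from those used by SYBYL, and the Atom class and function name are hypothetical. Covalent links between sidechains, such as disulfide bridges, are not treated specially here.

```python
from dataclasses import dataclass
from math import dist

# Bondi van der Waals radii in angstroms (illustrative; SYBYL's may differ).
VDW_RADII = {"H": 1.20, "C": 1.70, "N": 1.55, "O": 1.52, "S": 1.80}


@dataclass(frozen=True)
class Atom:
    element: str
    substructure: str          # e.g. "LEU10" or "HOH1"
    xyz: tuple
    is_sidechain: bool = False
    is_water: bool = False


def bumped_residues(atoms, tolerance=1.2):
    """Return the substructures whose sidechains bump into something.

    Each sidechain atom is checked against every atom of every other
    substructure. A residue whose only bump partners are waters is
    not reported.
    """
    flagged = set()
    sidechain_atoms = [a for a in atoms if a.is_sidechain]
    for residue in {a.substructure for a in sidechain_atoms}:
        partner_is_water = []
        for a in sidechain_atoms:
            if a.substructure != residue:
                continue
            for b in atoms:
                if b.substructure == residue:
                    continue
                limit = VDW_RADII[a.element] + VDW_RADII[b.element] - tolerance
                if dist(a.xyz, b.xyz) < limit:
                    partner_is_water.append(b.is_water)
        if partner_is_water and not all(partner_is_water):
            flagged.add(residue)
    return flagged
```

The returned set plays the role of {X_SIDECHAIN_BUMPS} in this sketch. Elements missing from the radius table raise a KeyError, which keeps unknown atoms from being silently skipped.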
Limitation of Hydrogen treatment in the Prepare Protein Structure dialog:

If a PDB file already has hydrogens on charged end groups (from NMR or high
resolution structures), but is not using the SYBYL blocking group residue
names (AMN and CXL), the analysis will consider these hydrogens as
misnamed. In this case you must close the dialog, remove the three hydrogens
on the nitrogen of the N-terminal residue, then reopen the dialog and rerun the
analysis. The termini will be flagged as needing to be treated and can be fixed
by adding the charged end groups via the Edit Termini dialog.
Tip on Saving a Prepared Protein to the Mol2 Format:

When a protein is prepared, all new atoms, such as missing sidechain atoms and
added hydrogens, are appended to the end of the SYBYL atom list. To guarantee
that all atoms (backbone, sidechain, hydrogens) are grouped by residue when
saving the prepared protein to a Mol2 file, proceed as follows:
1. Save the protein in PDB format.
2. Read the saved PDB file.
3. Save the newly read molecule in Mol2 format.
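The intended result of the three steps above can be pictured as a stable regrouping of the atom list by residue. The Python sketch below illustrates only that end result. It does not describe how SYBYL's PDB writer or reader works, and the tuple layout (residue position in the sequence, atom name) and the function name are assumptions made for this example.

```python
def group_by_residue(atoms):
    """Regroup atoms by residue while keeping their relative order.

    `atoms` is a list of (residue_position, atom_name) tuples in the
    order of the atom list. Python's sort is stable, so atoms that were
    appended later end up after the original atoms of the same residue.
    """
    return sorted(atoms, key=lambda atom: atom[0])


# Atom list after preparation: added atoms sit at the end of the list.
prepared = [
    (1, "N"), (1, "CA"), (2, "N"), (2, "CA"),   # original atoms
    (1, "H"), (2, "CB"), (2, "H"),              # appended during preparation
]
regrouped = group_by_residue(prepared)
```

After regrouping, every residue's atoms are contiguous, which is the layout the tip aims for in the saved Mol2 file.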

Add Hydrogens
6.2 Add Hydrogens

Add hydrogen and sulfur lone pair atoms.
• Hydrogens added to the biopolymer are assigned names and charges as
defined in the residue files corresponding to the open dictionary.
Note: The positions of the hydrogens added to the biopolymer are deter-
mined by data in SYBYL’s internal tables for bond lengths and angles,
not by the positions within the residue files. Because, in general, a
residue’s heavy atoms have a different geometry once they are incorpo-
rated into a biopolymer, the hydrogen positions do not match those in
the residue file and must be recomputed by other means.
• Hydrogens are added to the ligand atoms excluding the following atom
types: O.co2, Li, Na, K, Ca, Mg, Al, Zn, Fe, Cu, Sn, Mo, Co.oh, Cr.oh,
Mn, Se, Cr.th
Adding hydrogens is a prerequisite to displaying hydrogen bonds.
6.2.1 Add Hydrogens via the Menubar

Biopolymer > Prepare Structure > Add Hydrogens
Molecule The molecule to which the hydrogens will be added.
This is predetermined if you access this dialog
via the Structure Preparation Tool.
Hydrogens to Add All—Adds all hydrogens and lone pairs as defined for
each residue in the biopolymer dictionary.
Essential—Adds only the hydrogens and lone pairs
defined in the dictionary as essential (typically those
atoms that have charges).
Orientation in Waters Random—Add hydrogens to water oxygens in a
random orientation.
H-bonding—Attempt to orient the water molecules to
favor hydrogen bonds.

6.2.2 Add Hydrogens via the Command Line

BIOPOLYMER ADDH sequence which
sequence Residue sequence(s) to receive hydrogens

which ALL—Adds all hydrogens and lone pairs as defined for
each residue in the biopolymer dictionary.
ESSENTIAL—Adds only those hydrogens and lone pairs
defined in the dictionary as essential (typically those
atoms that have charges).
If water molecules are included in the selection, their hydrogens are placed in a
random orientation.
6.2.3 Hydrogens on Water Molecules

If you want to reorient existing waters, load an SPL script then use the
command it defines:
UIMS LOAD $TA_ROOT/tables/menubars/sybyl/biopolymer/add_h.core
BIOPOLYMERGUI!BIO_ORIENT_HYDROGENS mol_area
6.2.4 Treatment of Hydrogens and Lone Pairs

A review of hydrogens and lone pairs is performed by default whenever a
force field calculation is invoked that involves AMBER or Kollman atom types
and charges. See Tailor variable FORCE_FIELD REVIEW_HS_AND_LPS.
Lone pairs on sulfur atoms are added or removed as required by the chosen
AMBER/Kollman force field.

Set the Protonation Type
6.3 Set the Protonation Type

Specify the protonation state of any ASP, GLU, HIS, GLN or ASN residue. The
protonation states set in this dialog are retained when the molecule is saved,
even if all hydrogens are removed.
6.3.1 Specify the Protonation State via the Menubar

Biopolymer > Prepare Structure > Set Protonation Type
or
In the Prepare Protein Structure dialog, click Fix on the Set Protonation
Type line.
Molecule Select the molecule area containing the molecule of
interest.
Manipulate Select the type of residues to appear in the list:
• All—All GLU, ASP, HIS, GLN, ASN, and TYR
residues.
• Acids (GLU/ASP)
• Histidines (HIS)
• Amides (GLN/ASN)
• Tyrosines (TYR)

Select Residue
Residue List The list is filtered by the Manipulate selection above
and the state of the List Only Residues Near
Ligands check box below.
For each residue in the list the following is reported:
the residue’s substructure name, its protonation state
and whether the sidechain bond affecting potential
hydrogen bonds has been flipped.
Buttons to navigate through the list.
Pick Click Pick to identify a residue by picking any of its
atoms on the screen or by entering the residue name in
the small dialog. You may also type the substructure ID
number preceded by the # sign (for example, ASP26
or #26).
List Only Residues Near Ligands Toggle this on to limit the list to the
residues of the type specified by Manipulate above that are
within 6 Å of any atom in the {HETATM} set. This set
includes ligands, cofactors, and metals identified when
the PDB file was read in.
Center the display of the molecule on the selected resi-
due and display only the substructures that are within
6 Å of the selected residue. The entire molecule will be
redisplayed when you Close the dialog.
Auto Center Toggle this on to automatically center the display of the
molecule on the selected residue and display only the
substructures that are within 6 Å of the selected resi-
due. This is particularly useful when using this dialog
to work on several residues. Toggle the option off to
redisplay the entire molecule.
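The List Only Residues Near Ligands filter described above amounts to a distance test against the {HETATM} atoms. The Python sketch below shows one way to express such a test. It is not SYBYL code. The function name and input layout are hypothetical, and two details are assumptions: a residue counts as near when any one of its atoms lies within the cutoff, and the cutoff is inclusive.

```python
from math import dist


def residues_near_ligands(residue_atoms, hetatm_coords, cutoff=6.0):
    """Return the names of residues with an atom within `cutoff` angstroms
    of any HETATM atom.

    residue_atoms: dict mapping a substructure name (e.g. "ASP26") to a
                   list of (x, y, z) atom coordinates.
    hetatm_coords: list of (x, y, z) coordinates for the HETATM atoms.
    """
    near = []
    for name, coords in residue_atoms.items():
        if any(dist(atom, het) <= cutoff for atom in coords for het in hetatm_coords):
            near.append(name)
    return near
```

The same test, applied around a selected residue instead of the ligand atoms, describes which substructures stay displayed when the view is centered.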
Select Protonation Type
For Acids • Deprotonated (GLU/ASP)—Both oxygens have
the SYBYL atom type O.co2.
• Protonated (GLZ/ASZ)—Atom and bond types
are modified, and one of the oxygens is protonated.
To protonate the other oxygen click Flip 180
Degrees.

For Histidines • Delta (HID)—Protonate only the nitrogen in
position delta (ND1).
• Epsilon (HIE)—Protonate only the nitrogen in
position epsilon (NE2).
• Protonated (HIP)—Protonate both ring nitrogens
(default).
Orientation
Flip 180 Degrees Flip one sidechain bond as follows:

• ASP: CB-CG bond
• GLU: CG-CD bond
• HIS: CB-CG bond
• ASN: CB-CG bond
• GLN: CG-CD bond
• TYR: CZ-OH bond
6.3.2 Specify the Protonation Type via the Command Line

BIOPOLYMER PROTONATE residue_sel state
residue_sel A single residue of type GLU, ASP or HIS. Identify
the residue by substructure name (e.g. ASP26) or by
substructure ID number preceded by the # sign (e.g.
#26).
state • For ASP residues—ASP (deprotonated) or ASZ
(protonated).
• For GLU residues—GLU (deprotonated) or GLZ
(protonated).
• For HIS residues—HID (protonate only ND1), HIE
(protonate only NE2) or HIP (protonate both ring
nitrogens).
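When command lines are generated by a script, it can help to check the residue type against the permitted state before issuing BIOPOLYMER PROTONATE. The Python sketch below builds the command text from the syntax given above. The helper name and its validation rules are assumptions for illustration. In particular, it assumes the residue is currently named ASP, GLU or HIS, and it cannot check the residue type when the selection is given as an ID such as #26.

```python
import re

# States accepted for each residue type, as listed for the state argument.
ALLOWED_STATES = {
    "ASP": ("ASP", "ASZ"),
    "GLU": ("GLU", "GLZ"),
    "HIS": ("HID", "HIE", "HIP"),
}


def build_protonate_command(residue_sel: str, state: str) -> str:
    """Return the command text 'BIOPOLYMER PROTONATE residue_sel state'.

    residue_sel is a substructure name (ASP26) or an ID preceded by '#' (#26).
    Raises ValueError when the state does not fit the residue type.
    """
    state = state.upper()
    by_name = re.fullmatch(r"([A-Za-z]+)(\d+)", residue_sel)
    if by_name:
        residue_type = by_name.group(1).upper()
        allowed = ALLOWED_STATES.get(residue_type)
        if allowed is None:
            raise ValueError(f"{residue_type} is not GLU, ASP or HIS")
        if state not in allowed:
            raise ValueError(f"{state} is not valid for {residue_type}: {allowed}")
    elif re.fullmatch(r"#\d+", residue_sel):
        # The residue type is unknown here, so only check that the state exists.
        if not any(state in states for states in ALLOWED_STATES.values()):
            raise ValueError(f"unknown protonation state: {state}")
    else:
        raise ValueError(f"unrecognized residue selection: {residue_sel!r}")
    return f"BIOPOLYMER PROTONATE {residue_sel} {state}"
```

For instance, the call build_protonate_command("ASP26", "ASZ") returns the text BIOPOLYMER PROTONATE ASP26 ASZ, while requesting HIP for ASP26 raises an error before anything is sent to SYBYL.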

Load Charges
6.4 Load Charges

Load atomic charges onto the selected atoms.
6.4.1 Load Charges via the Menubar

Procedure
• Any request to compute charges starts by deleting existing charges on
the entire molecule and setting new values to 0.0.
• Lone pairs on sulfur atoms are added or removed as required by the
chosen AMBER/Kollman force field.
• Charges for the biopolymer component of the molecule can be assigned
by a combination of the dictionary and the SLN typer or computed by
the SLN typer alone.
• All atoms for which charges cannot be assigned or for which the charge
value is 0.0 are stored in the atom set {ZERO_CHARGE}. This exami-
nation is always done for the entire molecule.
Biopolymer > Prepare Structure > Load Charges
Molecule Area Select the molecule area containing the molecule of
interest.
Biopolymer Specify the type of charges to be loaded onto the pro-
tein atoms in the selected molecule area: AMBER7
FF99, AMBER7 FF02, Amber4.1 FF95, Kollman
All, Kollman Uni, Gasteiger-Marsili, Gasteiger-
Huckel, Delre, Pullman, MMFF94, or None.

Ligands Specify the type of charges to be loaded onto the ligand
and cofactor atoms in the selected molecule area:
Gasteiger-Marsili, Gasteiger-Huckel, Delre, Pull-
man, MMFF94, or None. You will be presented with
a dialog to select the substructure(s) making up the
ligand and/or cofactors.
Water Whether charges should be loaded for all molecules in
the atom set called {WATER}. The calculation method
is the same as that chosen for the biopolymer atoms.
Metals Whether charges should be loaded onto the metal
atoms. If yes, you will be prompted for the formal
charge value for each metal atom.
Use SLN Typer on All Atoms If off (default): Charges that can be found in
the dictionary are used. The remaining charges are assigned by
the SLN typer using the definitions in $TA_DICT/AMB_PARMS
(this may fail if hydrogens are missing). The corresponding
command is BIOPOLYMER LOAD CHARGES.
If on: The SLN typer loads the charges on all selected
atoms using only the SLN definitions in $TA_DICT/AMB_PARMS.
The corresponding command is
BIOPOLYMER LOAD SLN_AUTO_CHARGES.
6.4.2 Load Charges via the Command Line

Four commands are available:
• BIOPOLYMER LOAD CHARGES
• BIOPOLYMER LOAD SLN_AUTO_CHARGES
• BIOPOLYMER LOAD DICT_CHARGES
• BIOPOLYMER LOAD DEFINE_ZEROCHARGESET
Procedure
• Any request to compute charges starts by deleting existing charges on
the selected atoms and setting new values to 0.0.
• Charges for the biopolymer component of the molecule can be taken
from the dictionary or computed by the SLN typer or a combination of
both.
• All atoms for which charges cannot be assigned are stored in the atom
set {ZERO_CHARGE}. This examination is always done for the entire
molecule.

Dictionary Method Followed by SLN Typer
BIOPOLYMER LOAD CHARGES atom_expr charge_set

charge_set = AMBER7_FF99, AMBER7_FF02, AMBER95_ALL, KOLL_ALL,
KOLL_UNI, KOLL_UNIC, or KOLL_UNIN.
Loads the selected charge set according to the procedure described above by:
• First performing a dictionary look up and reporting the atoms for which
charges could not be found.
• Then invoking the SLN typer to assign charges to the atoms (in the
selection) which still have zero charges. If terminal residues and
blocking groups are among the selected atoms, the SLN method is used
to assign charges to these atoms as well.
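The two-stage procedure above (dictionary lookup first, then a typer for atoms that still have zero charge) can be sketched in Python as follows. This is a conceptual illustration only. The function name, the atom tuples and the fallback_typer callable are hypothetical stand-ins: the real SLN typer matches SLN patterns from $TA_DICT/AMB_PARMS, which is not modeled here, and the charge values in the usage example are placeholders rather than force field parameters.

```python
def load_charges(atoms, dictionary_charges, fallback_typer):
    """Assign charges by dictionary lookup, then by a fallback typer.

    atoms: list of (atom_id, residue_name, atom_name) tuples.
    dictionary_charges: dict mapping (residue_name, atom_name) to a charge.
    fallback_typer: callable(residue_name, atom_name) returning a charge,
                    or None when it cannot type the atom.
    Returns (charges, not_in_dictionary, zero_charge_set).
    """
    # Step 1: existing charges are discarded and reset to 0.0.
    charges = {atom_id: 0.0 for atom_id, _, _ in atoms}

    # Step 2: dictionary lookup, reporting atoms that could not be found.
    not_in_dictionary = []
    for atom_id, residue, name in atoms:
        value = dictionary_charges.get((residue, name))
        if value is None:
            not_in_dictionary.append(atom_id)
        else:
            charges[atom_id] = value

    # Step 3: the fallback typer handles atoms that still have zero charge.
    for atom_id, residue, name in atoms:
        if charges[atom_id] == 0.0:
            value = fallback_typer(residue, name)
            if value is not None:
                charges[atom_id] = value

    # Step 4: collect what remains at zero (the role of {ZERO_CHARGE}).
    zero_charge = {atom_id for atom_id, q in charges.items() if q == 0.0}
    return charges, not_in_dictionary, zero_charge
```

A quick check with placeholder values: if the dictionary knows only the ALA atoms and the fallback cannot type a ligand atom, that atom is reported as missing from the dictionary and ends up in the zero-charge set.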
SLN Typer Method Only
BIOPOLYMER LOAD SLN_AUTO_CHARGES atom_expr charge_set

charge_set = AMBER7_FF99, AMBER7_FF02, AMBER95_ALL, or KOLL_ALL
Loads the selected charge set according to the procedure described above, but
uses only the definitions in $TA_DICT/AMB_PARMS. The atom type set
KOLL_UNI and the matching charge sets are not available for this option.
The drawback of this method is that the SLN typer cannot assign charges to a
substructure that is incomplete (even missing a hydrogen) or bonded to a ligand.
Because the charges of all selected atoms are set to zero before the charge
lookup, if SLN typing fails for a substructure the charges remain zero for all its
atoms.
Dictionary Method Only
BIOPOLYMER LOAD DICT_CHARGES atom_expr charge_set

Loads the selected charge set according to the procedure described above, but
uses only information in the open dictionary’s residue files. The drawback of
this method is that charges may not be assigned properly to atoms belonging to
terminal residues.

Populate the Atom Set for Undefined Charges
BIOPOLYMER LOAD DEFINE_ZEROCHARGESET mol_area

Looks over the complete molecule and (re)defines the set of atoms with zero
charges {ZERO_CHARGE}. If all atoms have a charge other than 0.0 the set is
deleted. This command is automatically called by all the charge commands
described above and by the Load Charges dialog.
• Charge Derivation for Biopolymers in the Force Field Manual
• How to Change a Single Atom’s Charge in the SYBYL Basics Manual

Edit Termini
6.5 Edit Termini

There are two terminal residues per uniquely named chain in the molecule.
Remarks:
• A block can only be added to the terminal residues in a biopolymer
chain.
• Adding a blocking group causes the deletion of some atoms in the
residue to which it attaches.
• Deleting a blocking group from a molecule reconstructs any missing
pieces of the adjoining residue.
• Blocking groups have a residue number. The blocking group at the
beginning of a chain is numbered 0 (or 1 less than the number of the first
residue in the chain).
• The modified terminal residues and capping/blocking groups are stored
in the substructure set {FIXED_TERMINI}.
• Define a New Blocking Group on page 252
• Fix End Groups on page 112 to modify terminal residues with correct
atom geometry
• See Partial Charges for Blocking Residues in the Biopolymer Dictionary
in the Force Field Manual for a discussion of derivation of charges for
blocking groups

6.5.1 Edit Terminal Residues via the Menubar

Add caps and blocking groups to all or selected termini in biopolymer chains.
Biopolymer > Prepare Structure > Edit Termini
Available Blocking Groups
For Proteins:
• The N terminus is capped by AMN (Charged; -NH3+), AMI (Neutral;
-NH2) or one of the following blocking groups (see the Force Field
Manual for partial charges on N-terminal groups).
• ACE: N-acetyl
• PYR: N-pyroglutamyl
• FOR: N-formyl
• NMT: N-methyl
• BOC: N-t-butyloxycarbonyl
• The C terminus is capped by CXL (Charged; -COO-), CXC (Neutral;
-COOH) or one of the following blocking groups (see the Force Field
Manual for partial charges on C-terminal groups).
• NME: N-methyl amide
• AMD: amide
• NMM: N,N-dimethyl amide
• CME: methyl
• MES: methyl ester
• EES: ethyl ester
For DNA and RNA:

• The O5’ terminus is capped by HB.
• The O3’ terminus is capped by HE.
Blocking groups and caps at the beginning of a chain are given the sequence
number 0.
6.5.2 Edit Terminal Residues via the Command Line

Add, remove or modify caps and blocking groups on all or selected termini in
biopolymer chains.
BIOPOLYMER POLY_BLOCK monomer block_name
monomer The terminal residue(s) to which the blocking group
will be added
block_name • CHARGED—Add the charged blocking groups (AMN
and CXL) to the N- and C-termini respectively.
• NEUTRAL—Add the neutral blocking groups (AMI
and CXC) to the N- and C-termini respectively.
• NONE—Remove existing blocking groups.
• Any named blocking group.
Refer to the list of available blocking groups above.
Tailor variable BIOPOLYMER BUILD_HYDROGENS determines whether hydrogens
are added and, if so, which ones.
BIOPOLYMER POLY_BLOCK combines the capabilities of the BIOPOLYMER
BLOCK and BIOPOLYMER CAP operations and can also:
• block and deblock proline residues
• remove an existing blocking group
• change an existing blocking group to another type
• preserve the current N-CA-C-O torsion angle
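The block_name options for proteins can be summarized as a small lookup. The Python sketch below shows which group each option implies at the N or C terminus, using only the group names listed under Available Blocking Groups. The helper itself is hypothetical and is not a SYBYL function. It covers proteins only, not the HB and HE caps used for DNA and RNA.

```python
# Named blocking groups for proteins, as listed under Available Blocking Groups.
N_TERMINAL_BLOCKS = {"ACE", "PYR", "FOR", "NMT", "BOC"}
C_TERMINAL_BLOCKS = {"NME", "AMD", "NMM", "CME", "MES", "EES"}

# Groups added by the CHARGED and NEUTRAL options at each terminus.
PAIRED_OPTIONS = {
    "CHARGED": {"N": "AMN", "C": "CXL"},
    "NEUTRAL": {"N": "AMI", "C": "CXC"},
}


def resolve_block(block_name: str, terminus: str):
    """Return the group implied by block_name at the 'N' or 'C' terminus.

    Returns None for NONE (existing blocking groups are removed).
    Raises ValueError for a group that does not belong to that terminus.
    """
    if terminus not in ("N", "C"):
        raise ValueError("terminus must be 'N' or 'C'")
    name = block_name.upper()
    if name == "NONE":
        return None
    if name in PAIRED_OPTIONS:
        return PAIRED_OPTIONS[name][terminus]
    if terminus == "N":
        valid = N_TERMINAL_BLOCKS | {"AMN", "AMI"}
    else:
        valid = C_TERMINAL_BLOCKS | {"CXL", "CXC"}
    if name not in valid:
        raise ValueError(f"{name} is not a {terminus}-terminal group")
    return name
```

A lookup of this kind makes it easy to catch a mismatch, such as requesting ACE at a C terminus, before a command is issued.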

6.5.3 Add a Blocking Group

Add a blocking group to the end of a biopolymer chain.
BIOPOLYMER BLOCK monomer block_name
monomer The residue to which the blocking group will be added

block_name The blocking group to add. See Available Blocking
Groups on page 109.

• BIOPOLYMER POLY_BLOCK, a more powerful command to edit chain
termini
• Fix End Groups on page 112 to modify terminal residues with correct
atom geometry
• See Partial Charges for Blocking Residues in the Biopolymer Dictionary
in the Force Field Manual for a discussion of derivation of charges for
blocking groups
6.5.4 Cap a Terminal Residue

Add the appropriate residue cap fragments to the specified atoms.
BIOPOLYMER CAP atom_expr

Caps will be added only to legitimate connection atoms that have unfilled
valences. The cap fragments are defined in the open dictionary. Biopolymers
built using one of the dictionaries already have caps. Typically, caps need to be
added when the biopolymer is read in from an outside source.
BIOPOLYMER POLY_BLOCK, a more powerful command to edit chain termini

Fix End Groups
6.6 Fix End Groups

Add charged or neutral blocking groups to the terminal residues of a protein or
peptide.
Menubar: Biopolymer > Prepare Structure > Fix End Groups
Command: BIOPOLYMER FIX_END_GROUPS mol_area type
Arguments: • mol_area—Molecule area containing the molecule
of interest
• type—Type of end groups:
CHARGED—The blocking group CXL (-COO-) is
added to all C termini and the blocking group AMN
(-NH3+) is added to all N termini.
NEUTRAL—The blocking group CXC (-COOH) is
added to all C termini and the blocking group AMI
(-NH2) is added to all N termini.
There are two terminal residues per uniquely named chain in a molecule. These
terminal residues are stored in the substructure sets {CHAIN_HEAD} and
{CHAIN_TAIL}. Fixing end groups updates the chain termini sets.
Hydrogens are added to the end groups as determined by Tailor variable
BIOPOLYMER BUILD_HYDROGENS.
• Edit Terminal Residues via the Command Line on page 110
• Biopolymer End Group Modeling on page 317 for information about end
group modeling
• Biopolymer Charges in the Force Field Manual for a discussion of
derivation of charges for blocking groups
• The Kollman Force Field in the Force Field Manual for information
about the implementation of the Kollman force field in SYBYL

Fix SYBYL Atom Types in Cofactor
6.7 Fix SYBYL Atom Types in Cofactor

Assign SYBYL atom types to various standard cofactors. This operation also
corrects bond types.
Biopolymer > Prepare Structure > Fix SYBYL Atom Types in Cofactor
BIOPOLYMER TYPE_COFACTOR atom_expr cofactor [OVERRIDE_DICT]
Molecule Select from this list the molecule to which you want to
apply the atom expression.
Cofactor Select the type of cofactor present in the protein:
cAMP, cGMP, ADP, GDP, ATP, GTP, Porphin, Proto-
porphyrinIX, HEME-FeII, NAP, NADP, FAD. If you
select the entire list (click the icon), SYBYL will
find the largest matching cofactor.
Override Dictionary Types Use this option to ignore any cofactor already
identified based on definitions in the dictionary and to assign
SYBYL atom and bond types based on the specified
template(s) instead.
The SLN definitions for the cofactor templates are in
$TA_DICT/cofactors.def.
The HEME-FeII template contains a bound iron atom and sets the types of the
bonds to the iron to NC (non-chemical). NC bonds are invisible on the display
and are ignored during minimizations. Note that, to match this template, the iron
atom must have four bonds to the porphyrin ring in the PDB file.

A side effect of modifying the atom types is that SYBYL may prompt you for
the appropriate type for one or more bonds. If this happens, press OK after
entering the bond type. Once all the atom types are correct, the bond types will
be adjusted automatically.
Assigning atom types to a PDB file is, in most cases, handled completely by
SYBYL’s PDB Reader. If you need to assign atom types manually we
recommend that you proceed in the following sequence:
1. Assign SYBYL atom types to the cofactor (if any)
2. Assign SYBYL atom types to the ligand.
3. Assign AMBER atom types to all selected atoms. This operation relies
on information in the dictionary and SLN-based files about standard
residues and general functional groups. For that reason it is better to
assign correct SYBYL types to ligands and cofactors before proceeding
with the AMBER type assignment.

Fix SYBYL Atom Types in Ligand
6.8 Fix SYBYL Atom Types in Ligand

Assign SYBYL atom types to atoms in the set {UNK_ATOMS} which is
created only if SYBYL’s PDB reader fails to process HETATM records by
means of information stored in the dictionary.
Upon reading a PDB file into SYBYL check the console for the presence of the
following line:
NOTE: Check atom and bond types for atoms in local set UNK_ATOMS.
Biopolymer > Prepare Structure > Fix SYBYL Atom Types in Ligand
• Specify the molecule area containing the molecule of interest.
• SYBYL attempts to set the ligand atom types automatically by using the
SLN definitions for ligand templates (in $TA_DICT/ligand_db.def)
and chemical groups (in $TA_DICT/group_db.def). Press OK to start
this operation.
• If the ligand contains unconnected atoms SYBYL attempts to add the
bonds with the command QUICKBOND and to determine the bond types
with the command MODIFY BOND AUTO_TYPE. Press OK to start this
operation.
• The ligand is displayed in the center of the graphics screen with its
updated SYBYL atom types.
• A dialog offers an opportunity to make manual adjustments if necessary.
• Press Cancel if the automatic typer produces the correct results.
• Press OK to proceed with manual corrections. You can then select
specific atoms and assign them the proper SYBYL atom types. Note
that if you need to change an atom to a different chemical element
you will also need to change the atom’s name to reflect that.
The macromol dictionary includes a ligand database that is based on
information retrieved from the Ligand Depot site, a service associated with RCSB.
This greatly improves the SYBYL PDB reader's ability to assign correct atom
and bond types to most ligands. The SLN definitions for the ligand templates are
in $TA_DICT/ligand_db.def.


6.9 Assign AMBER Atom Types

Assign AMBER, Kollman or MMFF94 atom types for use in force field
minimizations and other SYBYL functionality.
6.9.1 Assign AMBER types via the Menubar

Procedure
• Atom types for the biopolymer component of the molecule can be
assigned by a combination of the dictionary and the SLN typer or by the
SLN typer alone.
• All atoms for which the requested type cannot be assigned are stored in
an atom set. Set names match the selected force fields:
{UNK_AMBER95_ALL}, {UNK_KOLL_ALL}, {UNK_KOLL_UNI}
or {UNK_MMFF94}. This examination is always done for the entire
molecule.
• Progress is reported in the console, including the name of the atom set in
which atoms with unassigned types (if any) are stored.
Biopolymer > Prepare Structure > Assign AMBER Atom Types

Selection
Molecule Area Select the molecule area containing the molecule of interest.
Atoms All atoms are selected by default. Use the Atom Expression
dialog to select specific atoms.
Atom Types Select a set of non-SYBYL atom types:
• AMBER7 FF99
• AMBER7 FF02
• AMBER4.1 FF95
• Kollman All
• Kollman United
• MMFF94

Number of Missing Atom Types Reports the number of atoms for which the
specified atom types could not be found in the open dictionary
(typically atoms belonging to ligands, cofactors, and metals).
Note that the number reported does not include atoms
that may have been typed incorrectly by the dictionary
(typically for terminal residues). The missing atoms are
stored in atom sets named according to the selected
type set: {UNK_AMBER7_FF99},
{UNK_AMBER7_FF02}, {UNK_AMBER95_ALL},
{UNK_KOLL_ALL}, or {UNK_MMFF94}.
The corresponding command is BIOPOLYMER LOAD
DEFINE_UNKSET.
Assign Atom Types Loads the specified set of atom types onto all the selected
atoms and marks them as user-defined. If this operation
fails for a few atoms (such as lone pairs or unrecognized
functional groups), the Number of Missing
Atom Types indicates unassigned atoms. Use the
expert options below to assign them. Lone pairs on
sulfur atoms are added or removed as required by the
The corresponding commands are BIOPOLYMER LOAD
OTHER_ATOM_TYPES followed by BIOPOLYMER LOAD
DICT_TO_USER.
Expert Options
Expert Options Activate the expert options.

Highlight Missing Atom Types Highlight all atoms that are missing the specified
atom type.
Label Label atoms
• Nothing—No labels
• All Known Atom Types—Label only the atoms
that have the proper type for the selected AMBER/
Kollman type set.
• Missing with SYBYL Types—Label with
SYBYL atom types the atoms that do not have the
proper type for the selected AMBER/Kollman type
set.
• Missing with Atom Names—Label with atom
names the atoms that do not have the proper type for
the selected AMBER/Kollman type set.

Assign Atom Types Using Use any of the following methods.
• Dictionary—Uses the AMBER/Kollman type
definitions in the dictionary .res files, but does not
modify atom types marked as user-defined
(typically those assigned manually or by the SLN
typer or read in from a .mol2 file). The corresponding
command is BIOPOLYMER LOAD
DICT_TYPES.
• SLN Typer—Uses the SLN-based rules for typing
residues and general functional groups. The corresponding
command is BIOPOLYMER LOAD
SLN_AUTO_TYPES.
The treatment of existing atom types by the SLN
typer is controlled by Tailor variable BIOPOLYMER
SLN_TYPER_MODE (described below).
• Manual—Use MODIFY ATOM OTHER_TYPES to
resolve the remaining unassigned types or assign
types to specific atoms. Types assigned manually
are marked as “user-defined” and protected (by
default) from further atom type assignments.
Fix Termini Using SLN Typer Use the SLN atom typer on the terminal residues only.
This is useful when the structure already has the correct
AMBER atom types, but the blocking groups were
changed. The treatment of existing atom types by the
SLN typer is controlled by Tailor variable BIOPOLYMER
SLN Typer Mode • Keep User Defined—Prevents changes to
AMBER/Kollman atom types already assigned (via
MODIFY ATOM OTHER_TYPES or through entries in
the .mol2 file).
• Assign Unknown—Process only atoms that do
not have AMBER/Kollman types definitions in the
dictionary .res files and no AMBER/Kollman
definitions marked as user-defined.
• Assign All—Retype all atoms using the SLN atom
typer.
Correspond to Tailor variable BIOPOLYMER

6.9.2 Assign AMBER Types via the Command Line

Procedure
• Atom types for the biopolymer component of the molecule can be taken
from the dictionary or computed by the SLN typer or a combination of
both.
• All atoms for which the requested type cannot be assigned are stored in
an atom set. Set names match the selected force fields:
{UNK_AMBER95_ALL}, {UNK_KOLL_ALL}, or
{UNK_KOLL_UNI}. This examination is always done for the entire
molecule.
Six commands are available:

• BIOPOLYMER LOAD OTHER_ATOM_TYPES
• BIOPOLYMER LOAD SLN_AUTO_TYPES
• BIOPOLYMER LOAD DICT_TYPES
• BIOPOLYMER LOAD DICT_TO_USER
• BIOPOLYMER LOAD MINIMAL_USER_SET
• BIOPOLYMER LOAD DEFINE_UNKSET
Note: These commands apply only to biopolymers. To assign AMBER or
Kollman atom types to other molecules via the command line, you must use
MODIFY ATOM OTHER_TYPES to assign atom types manually or via the SLN typer.
Dictionary Method Followed by the SLN Typer
BIOPOLYMER LOAD OTHER_ATOM_TYPES atom_expr type_set

type_set = AMBER7_FF99, AMBER7_FF02, AMBER95_ALL, KOLL_ALL, or
KOLL_UNI.
Loads the specified atom type set according to the procedure described above
by:
• First performing a dictionary look up and attempting to assign types to
atoms not marked as user-defined for the same type set.
• Then invoking the SLN typer to assign atom types to the remaining
atoms (typically terminal residues, blocking groups, and unknown/
unparametrized atoms in ligands and cofactors). The treatment of
existing atom types by the SLN typer is controlled by Tailor variable
BIOPOLYMER SLN_TYPER_MODE (described below).
• Finally, marking as user-defined only the types assigned by the SLN
typer. To mark the dictionary-assigned atom types as user-defined you
must invoke the command BIOPOLYMER LOAD DICT_TO_USER.
SLN Typer Method Only
BIOPOLYMER LOAD SLN_AUTO_TYPES atom_expr type_set

KOLL_UNI
Assigns the specified atom types to the selected atoms according to the
procedure described above, but uses only the definitions in
$TA_DICT/AMB_PARMS. Assigned atom types are marked as user-defined.
The drawbacks of the SLN-only method are:
• The SLN typer cannot assign atom types to any atom in a substructure
that is incomplete (even missing a hydrogen) or bonded to a ligand.
• The atom type set KOLL_UNI and the matching charge sets are not
available for this option.
Tailor variable
The Tailor variable BIOPOLYMER SLN_TYPER_MODE controls whether existing
atom types for the specified AMBER/Kollman type set are overwritten by the
SLN typer:
• KEEP_USERDEF (default)—Prevents changes to atom types already
marked as user-defined. See Marking Atom Types as User-Defined
below for the command that marks atom types as “user-defined” for
safekeeping.
• ASSIGN_UNKNOWN—Processes only atoms that do not have atom type
definitions in the dictionary .res files and no definitions marked as user-
defined.
• ASSIGN_ALL—Retypes all atoms using the SLN atom typer, overwriting
existing atom types.
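The three modes can be read as a filter on which atoms the SLN typer is allowed to process. The Python sketch below illustrates that reading only; it is not SYBYL code, and the Atom record and the atoms_to_retype function are hypothetical names introduced for this example.

```python
from dataclasses import dataclass


@dataclass
class Atom:
    """Hypothetical record; not a SYBYL data structure."""
    name: str
    in_dictionary: bool  # a type exists in the dictionary .res files
    user_defined: bool   # the existing type is marked as user-defined


def atoms_to_retype(atoms, mode="KEEP_USERDEF"):
    """Return the names of the atoms the typer would process in each mode."""
    if mode == "ASSIGN_ALL":
        # Every atom is retyped, overwriting existing types.
        return [a.name for a in atoms]
    if mode == "KEEP_USERDEF":
        # User-defined types are left untouched.
        return [a.name for a in atoms if not a.user_defined]
    if mode == "ASSIGN_UNKNOWN":
        # Only atoms with neither a dictionary type nor a user-defined type.
        return [a.name for a in atoms
                if not a.in_dictionary and not a.user_defined]
    raise ValueError(f"unknown mode: {mode}")
```

Under these assumptions, a backbone nitrogen known to the dictionary is skipped by ASSIGN_UNKNOWN but processed by KEEP_USERDEF, while a manually typed ligand atom is touched only by ASSIGN_ALL.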

Dictionary Method Only
BIOPOLYMER LOAD DICT_TYPES atom_expr type_set

KOLL_UNI
Assigns the specified atom types to the selected atoms according to the
procedure described above, but uses only information in the open dictionary’s
residue files. This operation does not overwrite atom types already marked as
user-defined.
The drawbacks of this method are that:

• Atom types may not be assigned properly to atoms belonging to terminal
residues.
• The atom types taken from the dictionary are not stored in the molecular
description and are, therefore, not written out when the molecule is
saved to a .mol2 file. See Marking Atom Types as User-Defined below
for the command that marks those atom types for safekeeping.
Marking Atom Types as User-Defined
BIOPOLYMER LOAD DICT_TO_USER atom_expr type_set

KOLL_UNI.
Marks all atom types assigned from the dictionary as user-defined types. This
has the following advantages:
• User-defined atom types are not overwritten by the SLN typer when
using the default set by Tailor variable BIOPOLYMER SLN_TYPER_MODE
KEEP_USERDEF (see Tailor variable above).
• User-defined atom types are stored in the molecular description and are,
therefore, written out when the molecule is saved to a .mol2 file.
• User-defined atom types are protected from overwriting by the automatic
and mandatory checking of terminal residues.

Minimal Set of User-Defined Types
BIOPOLYMER LOAD MINIMAL_USER_SET atom_expr type_set

KOLL_UNI.
In many instances the atom types stored in the dictionary are identical to those
produced by the SLN typer. This is the case, for example, for standard amino
acids that are not terminal residues. This command compares, for the selected
atoms and the specified type set, the atom types marked as user-defined and
those in the open dictionary. Those found to be identical are no longer marked
as user-defined types and the corresponding atoms are removed from the
appropriate atom set.
This option is useful to identify only those atoms that were typed by any
method other than the dictionary. Such methods include the SLN typer, manual
assignment, and import from an external source via a .mol2 file. After this
operation for the AMBER7-FF99 force field, for example, you can use the
command MODIFY ATOM OTHER_TYPES AMBER7_FF99 LIST USER to list
the minimal set of user-defined atom types for the selected atoms. Additional
user-defined atom types can be removed manually via MODIFY ATOM
OTHER_TYPES AMBER7_FF99 UNASSIGN.
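The comparison performed by MINIMAL_USER_SET can be pictured as a dictionary difference. The following Python sketch is an illustration under that assumption, not the SYBYL implementation; the function name and the atom-name-to-type mappings are hypothetical.

```python
def minimal_user_set(user_types, dict_types):
    """Keep only user-defined types that differ from the dictionary's types.

    Both arguments map an atom name to an atom type for one type set.
    Atoms whose user-defined type equals the dictionary type are dropped,
    mirroring the "no longer marked as user-defined" behavior described above.
    """
    return {atom: atom_type
            for atom, atom_type in user_types.items()
            if dict_types.get(atom) != atom_type}


# Illustrative data only: CA matches the dictionary, N was retyped,
# and C9 (for example, a ligand atom) has no dictionary entry at all.
user = {"CA": "CT", "N": "N3", "C9": "CA"}
dictionary = {"CA": "CT", "N": "N"}
remaining = minimal_user_set(user, dictionary)
```

Here remaining holds only the N and C9 entries, which is the set a LIST USER query would be expected to show after the operation.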
Populate the Atom Set for Undefined Atom Types
BIOPOLYMER LOAD DEFINE_UNKSET mol_area type_set

KOLL_UNI.
Looks over the complete molecule and (re)defines the set of atoms with
unknown atom types for the specified type set. The set names match the
available force fields: {UNK_AMBER7_ALL}, {UNK_AMBER95_ALL},
{UNK_KOLL_ALL}, {UNK_KOLL_UNI}. If all atoms have types for the
specified type set, the corresponding set is deleted.
This command is automatically called by all the atom typing commands
described above.

6.10 Add Sidechains

Add sidechains to all or part of a biopolymer.
6.10.1 Add Sidechains Via the Menubar

Biopolymer > Prepare Structure > Add Sidechains
Residue Selection
Select Residues Access the Substructure Expression dialog to select the
desired residue(s). The selection is then echoed above
the button.
Number Selected The number of residues affected by the operation.

Sidechain Positioning
Initial Sidechain Position Select the source of the initial conformation for each
sidechain being added:
• SYBYL—The conformation of the matching
residue in the open dictionary.
• Lovell—The rotamer in the Lovell rotamer library
that results in the fewest bumps with the rest of the
molecule. (S.C. Lovell, J.M. Word, J.S. Richardson
and D.C. Richardson, “The Penultimate Rotamer
Library.” Proteins: Structure, Function, and
Genetics, 40, 389-408 (2000).
http://kinemage.biochem.duke.edu/databases/rotamer.php)
Tailor variable BIOPOLYMER ROTAMER_DIRECTORY
adds user-defined rotamer libraries to this list. In particular,
see Dunbrack Rotamer Library on page 189.
Scan Sidechains Whether to attempt to remove steric interactions
between the added sidechains and the rest of the molecule.
Torsion angles in the new sidechains are scanned,
through a full 360°, for positions that relieve bad steric
interactions. Only one bond at a time is altered. After a
position is found, that bond is removed from consideration.
Scanning continues until all interactions dependent
upon these bonds are relieved or until no progress
is made from one iteration to the next.
Number of Increments (N) Number of angle steps used to rotate through 360°. The
amount of rotation at each step is 360°/N.
The default value is taken from Tailor variable SCAN
NUMBER_INCREMENTS.
Scan vdW Factor Constant scaling factor to apply to all van der Waals
radii.
Tailor variable SCAN VDW_SCALE
Action Buttons
Add Sidechains Add the sidechains on the selected residues using the
desired method. The dialog remains open so you can
add more sidechains using a different method if
desired.
Add and Close Add the sidechain on the selected residues and close the
dialog.
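To make the Number of Increments (N) setting concrete, the Python sketch below lists the torsion angles visited when a bond is rotated through 360° in steps of 360/N and picks the angle with the lowest score. It is only an illustration: the bump_count callable is a hypothetical stand-in for whatever steric scoring SYBYL applies internally.

```python
def scan_angles(n_increments):
    """Return the N torsion angles (in degrees) visited in a full 360° scan."""
    step = 360.0 / n_increments
    return [k * step for k in range(n_increments)]


def best_angle(n_increments, bump_count):
    """Pick the scanned angle with the lowest score.

    bump_count is a hypothetical callable mapping an angle in degrees to a
    steric penalty; it is not part of SYBYL.
    """
    return min(scan_angles(n_increments), key=bump_count)
```

With N = 12 the scan visits 0°, 30°, 60°, and so on up to 330°; a larger N gives a finer scan at proportionally higher cost.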

hydrogens are added to the new sidechains.

• Set Sidechain Conformation on page 185
• File Format for Rotamer Libraries on page 188
6.10.2 Add Sidechains Via the Command Line

BIOPOLYMER ADD_SIDECHAINS sequence
sequence Residue sequence to which sidechains must be added.
The sidechains to be added are determined from the residue types. The
conformations are retrieved from the matching residues in the open dictionary. If a
selected residue already has a sidechain, nothing is done to the existing
sidechain.

hydrogens are added to the new sidechains.
• Scan Torsions to quickly find reasonable conformations for the newly
added sidechains (in the SYBYL Basics Manual)

6.11 Fix Sidechain Amides

Orient the sidechain amides of ASN and GLN residues in the direction of
maximal potential hydrogen bonding.
The best orientation is determined as follows:

• For each selected GLN and ASN residue, another conformation of the
side chain amide group, rotated by 180°, is generated.
• The built-in set {POSSIBLE_HBONDS} is used to sum up all potential
hydrogen bonds within a specified distance of either the oxygen or
nitrogen atom of the asparagine or glutamine sidechains. The specified
distance for the search is defined by Tailor variable HBONDS
MAX_DISTANCE plus 1.0 Å.
• For each residue under consideration the amide group is left in the
direction of the greatest hydrogen bonding potential and a message is
generated if this orientation is different from the initial one.
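The selection rule above can be sketched in Python as follows. This is a simplified illustration, not the SYBYL algorithm: it assumes that the 180° flip simply exchanges the oxygen and nitrogen positions, that the amide oxygen pairs with nearby donors and the amide nitrogen with nearby acceptors, and that the value of HBONDS MAX_DISTANCE is passed in by the caller. All function and parameter names are hypothetical.

```python
import math


def count_within(center, points, cutoff):
    """Count the points lying within cutoff (in Å) of center."""
    return sum(1 for p in points if math.dist(center, p) <= cutoff)


def should_flip(o_pos, n_pos, donors, acceptors, max_distance):
    """Return True if the flipped amide has more potential hydrogen bonds.

    The search radius is max_distance + 1.0 Å, as stated in the text.
    """
    cutoff = max_distance + 1.0

    def score(o, n):
        # Oxygen accepts from nearby donors; nitrogen donates to acceptors.
        return (count_within(o, donors, cutoff)
                + count_within(n, acceptors, cutoff))

    # The flipped orientation places O where N was, and N where O was.
    return score(n_pos, o_pos) > score(o_pos, n_pos)
```

In this sketch a tie leaves the amide in its initial orientation, which matches the statement that a message is generated only when the orientation changes.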
6.11.1 Via the Menubar

Biopolymer > Prepare Structure > Fix Sidechain Amides
Mol Area The molecule area containing the molecule of interest.

Residues List of asparagine and glutamine residues in the
selected molecule.
Buttons to assist in the selection of residues whose
sidechains will be evaluated: select all, invert selection,
clear selection.

6.11.2 Via the Command Line

BIOPOLYMER FIX_ASN_GLN subst_expr
• subst_expr = select the residues of interest or enter *. Only the ASN and
GLN residues within your selection will be considered.

6.12 Fix Prolines

Fix the geometry of a proline following a Mutate Monomers or Add Sidechains
operation.
Menubar: Biopolymer > Prepare Structure > Fix Prolines

A list of prolines and hydroxyprolines is presented.
Select one, several or All.
Command: BIOPOLYMER FIX_PROLINE proline_residue
Only one proline may be fixed at a time. To fix all prolines,
use the following SPL code:
for s in %substs({MONTYPE(PRO)})
    biopolymer fix_proline $s
endfor
Proline is the only standard amino acid containing a ring in its backbone, and its
preferred phi angle value is near -70°. Mutating a residue into a proline
results in a very poor geometry of the proline residue.
Fixing prolines does not reset the phi angle directly, since this would affect the
geometry of the rest of the chain. Careful energy minimization must be applied
after a residue has been mutated into a proline. See the Force Field Manual for a
description of local minimizations.

6.13 Chain Termini Sets

Chain termini sets are a convenient way to identify residues at the beginning
(head) and end (tail) of chains in a biopolymer.
• There are two terminal residues per uniquely named chain in a molecule.
• Terminal residues are stored in the substructure sets {CHAIN_HEAD}
and {CHAIN_TAIL}.
• The set membership is established automatically when a biopolymer is
built in SYBYL or read in from a PDB file. Set membership is updated
when charged end groups are added to the chain termini (see Fix End
Groups on page 112).
• If two or more chains in a single molecule have the same name (multiple
A chains are common in PDB files), only the first residue of the first
chain with the same name and the last residue of the last chain with the
same name are flagged as terminal residues.
• Re-evaluating the chain termini sets is important after renaming chains.
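The effect of duplicate chain names on the termini sets can be illustrated with a short Python sketch. It is a simplified model of the rule stated above (first residue of the first chain and last residue of the last chain with a given name), not SYBYL code; the function name and the tuple layout are assumptions.

```python
def chain_termini(residues):
    """Return (head, tail) dictionaries keyed by chain name.

    residues is a list of (chain_name, residue_id) tuples in file order.
    """
    head, tail = {}, {}
    for chain, residue in residues:
        head.setdefault(chain, residue)  # first residue seen for this name
        tail[chain] = residue            # last residue seen for this name
    return head, tail


# Two separate chains are both named "A"; only residue 1 and residue 11
# are flagged for that name, so residues 2 and 10 are missed.
residues = [("A", 1), ("A", 2), ("B", 1), ("B", 2), ("A", 10), ("A", 11)]
head, tail = chain_termini(residues)
```

Giving the second "A" chain a unique name and re-evaluating the sets would flag its own first and last residues as well.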
Menubar: Biopolymer > Prepare Structure > Chain Termini Sets
Then select the desired molecule.
Command: BIOPOLYMER SET TERMINI mol_area
• If the set composition was already correct:

Termini sets CHAIN_HEAD and CHAIN_TAIL are up to date.
• If the sets were incomplete, such as after renaming one of the chains:
Adding CHAIN_HEAD (x monomers).
Adding CHAIN_TAIL (x monomers).
• If the command is not able to determine the “real” head/tail substructure
(for example, if the chain is broken in the middle and the termini were
altered), you will be asked to specify the head and tail substructures from
a list.
Updating the termini sets is recommended after deleting residues or altering
backbone bonds:
• Delete Monomers on page 172
• Break a Chain on page 156
• Join Chains on page 157
• Set Chain Names on page 132

6.14 Set Chain Names

Change the name of a chain.
Menubar: Biopolymer > Prepare Structure > Set Chain Names

• Select the desired sequence(s) or press All in the Atom
Expression dialog.
• Enter the new name for all the selected chains.
Command: BIOPOLYMER SET CHAINNAME sequence name
• sequence—Residue sequence(s) to be assigned a chain
name or *
• name—New name for the chain(s)
Biopolymer chains are generally composed of whole connected sequences of
residues and are by default called A, B, etc.
Naming rules for biopolymer chains:

• Valid chain names consist of a string of 1-4 alphanumeric characters.
• If a chain name begins with an alphabetic character, the following
characters may be any alphanumeric character, an underscore or a dollar
sign.
• If a chain name begins with a numeric character all other characters must
also be numeric.
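The naming rules above translate directly into a pattern check. The Python sketch below is one possible reading, assuming ASCII letters and digits and a total length of 1 to 4 characters; the function name is hypothetical and the check is not taken from SYBYL itself.

```python
import re

# Either a letter followed by up to three letters, digits, underscores or
# dollar signs, or a purely numeric name of one to four digits.
_CHAIN_NAME = re.compile(r"(?:[A-Za-z][A-Za-z0-9_$]{0,3}|[0-9]{1,4})")


def is_valid_chain_name(name):
    """Return True if name satisfies the chain naming rules listed above."""
    return _CHAIN_NAME.fullmatch(name) is not None
```

For example, A, A_1$ and 12 pass this check, whereas 1A (numeric start followed by a letter) and ABCDE (five characters) do not.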
Renaming chains is recommended after deleting residues or altering backbone
bonds:
• Chain Termini Sets on page 131

6.15 Renumber a Sequence

Renumber the residues in a sequence.
Menubar: Biopolymer > Prepare Structure > Renumber Sequence
• Select the desired sequence(s) or press All in the Atom
Expression dialog.
• Enter the starting number for each selected sequence.
Command: BIOPOLYMER RENUMBER sequence {number}
• sequence—Residue sequence(s) to renumber or *
• number—Starting number for each specified sequence
Blocking groups and caps at the beginning of a chain are given a sequence
number that precedes the number of the first residue. This means that if the first
residue is given the number 1 in the sequence, the blocking group or cap
connected to it is numbered 0.
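The numbering convention for leading blocking groups can be shown with a small Python sketch. It is illustrative only: the function name and the (name, is_leading_cap) tuples are hypothetical, and the handling of trailing caps (they simply continue the count) is an assumption, since the text describes only caps at the beginning of a chain.

```python
def renumber(residues, start=1):
    """Assign sequence numbers so the first real residue receives start.

    residues is a list of (name, is_leading_cap) tuples in chain order.
    Leading caps receive numbers that precede the starting number.
    """
    n_caps = 0
    for _, is_cap in residues:
        if not is_cap:
            break
        n_caps += 1
    return [(name, start - n_caps + i)
            for i, (name, _) in enumerate(residues)]


chain = [("ACE", True), ("ALA", False), ("GLY", False)]
numbered = renumber(chain, start=1)
```

In this example the ACE blocking group receives 0, ALA receives 1 and GLY receives 2.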
How to modify a single substructure name (in the SYBYL Basics Manual)

6.16 Convert PDB Atom Names

Update protein structures in Mol2 files (created before SYBYL 8.0) to the PDB
v.3.1 nomenclature.

Biopolymer > Prepare Structure > Convert PDB Atom Names
Select Molecule The molecule must be in a molecule area.

Convert From Atom names in AMBER, PDB V2.3 or PDB V3.1 format.
To Atom names in AMBER, PDB V2.3 or PDB V3.1 format.

BIOPOLYMER CONVERT mol_area source_format target_format
• mol_area—the molecule area containing the molecule of interest
• source_format—AMBER, PDB_V23 (default) or PDB_31
• target_format—AMBER, PDB_V23 or PDB_31 (default)

6.17 Check Biopolymer Geometry

Check the local geometry of a protein structure and report any deviations
from normal values.
MDE: Protein > Check Local Geometry
BIOPOLYMER CHECK_GEOMETRY sequence {checks} DONE {output} DONE
sequence Protein sequence to be checked (“*” or a molecule area
indicates the whole protein)
checks One or more of the following, separated by spaces.
Enter DONE to end this loop.
• ANGLES—Check bond angles
• BONDS—Check bond lengths
• CHIRALITY—Check for proper chirality of all CA
atoms and CB atoms in THR and ILE residues
• OMEGA—Check for trans conformation of peptide
bonds
output One or more of the following, separated by spaces.
Enter DONE to end this loop.
• COLOR_MOLECULE color—Color in the indicated
color nonstandard parts of the molecule
• FILE filename—Write all violations to a file
• TERMINAL—Print all violations on the terminal
screen
Bond angles and bond lengths are compared to their equilibrium value in the
KOLL_ALL (AMBER all-atom) force field. Deviations greater than those
specified by Tailor variable BIOPOLYMER CHECK_GEOMETRY are reported.
For peptide bonds, omega angles that deviate more than the specified threshold
from 180° are reported.
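The omega test amounts to measuring how far a torsion angle lies from the trans value of 180°, taking the periodicity of angles into account. The Python sketch below shows one way to compute this; the function names are hypothetical, and the threshold is a caller-supplied value standing in for the one that SYBYL reads from the Tailor variable.

```python
def omega_deviation(omega_deg):
    """Return the angular distance, in degrees, between omega and 180°."""
    # Map the angle into [0, 360) first so that -175° and 175° are treated
    # as equally close to the trans value.
    return abs((omega_deg % 360.0) - 180.0)


def is_nonstandard_omega(omega_deg, threshold_deg):
    """Return True if omega deviates from trans by more than the threshold."""
    return omega_deviation(omega_deg) > threshold_deg
```

With this definition a cis peptide bond (omega near 0°) has a deviation close to 180° and is reported for any reasonable threshold.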
For chirality checking, BIOPOLYMER CHECK_GEOMETRY reports on alpha
carbons with D stereochemistry by measuring the zeta virtual torsion angle
defined between atoms CA-N-C-CB. D-amino acids have negative zeta values.
BIOPOLYMER CHECK_GEOMETRY also reports on differences from the normally
observed chirality of the CB atoms of THR (should be R) and ILE (should be S)
residues.
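The zeta angle is an ordinary four-point torsion evaluated on atoms that are not consecutively bonded. The Python sketch below computes a signed torsion with the standard atan2 formula and applies the sign rule quoted above. It is a generic geometric illustration, not SYBYL code; the function names are hypothetical, and whether SYBYL uses exactly this sign convention is an assumption.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def torsion(p0, p1, p2, p3):
    """Return the signed torsion angle p0-p1-p2-p3 in degrees."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n2 = _cross(b2, b3)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(_cross(b1, b2), n2)
    return math.degrees(math.atan2(y, x))


def zeta(ca, n, c, cb):
    """Virtual torsion CA-N-C-CB, as named in the text."""
    return torsion(ca, n, c, cb)


def looks_like_d_residue(ca, n, c, cb):
    """Apply the rule from the text: negative zeta suggests D chirality."""
    return zeta(ca, n, c, cb) < 0.0
```

Because mirroring the coordinates reverses the sign of any torsion, the mirror image of a residue with positive zeta yields a negative value of the same magnitude, which is why the sign alone is sufficient for this check.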
The following atoms are colored when requested:

• ANGLES—All three atoms involved in a non standard angle
• BONDS—Both atoms involved in a non standard bond

• CHIRALITY—The single chiral atom with wrong chirality

• OMEGA—The two bond atoms involved in a non standard omega torsion
UIMS Variables:
• BIO_CHECK_NBAD_ANGLES = number of non standard angles found
• BIO_CHECK_NBAD_BONDS = number of non standard bonds found
• BIO_CHECK_NBAD_CHIRALS = number of non standard chiralities found
• BIO_CHECK_NBAD_OMEGAS = number of non standard omega angles
found
• Measure Conformation on page 174 to measure omega and zeta angles
• ProTable Check Local Geometry (in the ProTable Manual)

6.18 Convert an External Mol2 File to a Biopolymer

Convert a molecule read in from a Mol2 file written by an external application
(such as Benchware 3D Explorer) to a SYBYL biopolymer.
BIOPOLYMER FIX_MOLECULE mol_area

• mol_area—The molecule area into which the non-SYBYL Mol2 file was
loaded.
This functionality instructs SYBYL to recognize a molecule as a biopolymer
and completes the data records for atoms, bonds, and substructures.

6.19 Convert a Small Molecule to a Biopolymer

Convert a “small” molecule to a biopolymer.
BIOPOLYMER SMALL_TO_POLY small_mol poly_mol sequence
small_mol Molecule area containing the small molecule to be converted
poly_mol Molecule area in which to place the converted biopolymer
sequence Residue sequence contained in the molecule
This functionality instructs SYBYL to recognize a molecule as a biopolymer.
This operation is necessary if you have constructed a biopolymer using
non-Biopolymer commands (e.g. SYBYL’s Sketcher) and want to operate on it
using Biopolymer commands.
For most cases you should choose an empty molecule area for the resulting
biopolymer. If, however, you choose a non-empty molecule, SYBYL assumes
that it already contains a molecule with the proper sequence, and performs the
conversion without prompting you for the residue sequence. In either case, the
small molecule must be topologically identical to or a super-structure of the
residue sequence or existing biopolymer.
• MODIFY ATOM OTHER_TYPES to display Kollman atom types (SYBYL
Basics Manual)
• See the Force Field Manual for the list of Kollman atom types

6.20 Minimize Biopolymer Structures

Numerous force fields or potential energy functions have been developed for
simulation and modeling of biopolymers.
Refer to the Force Field Manual for information on:

• Force Fields for Biopolymer
• Charge Derivation for Biopolymers
• Staged Minimization
• Minimize a Subset of Residues
• Molecular Dynamics

7. Build Biopolymer
• Build Protein, DNA Strand, RNA Strand, Carbohydrate on page 142
• Disulfide Bridges on page 148
• Build C-alpha to Backbone on page 149
• Build a DNA Double Helix on page 151
• Build an RNA Double Helix on page 152
• Add Solvent or Cofactor on page 153
• Form a Cyclic Peptide on page 158
• Add Phosphate Caps on page 159
• Build a Random Sequence on page 160

7.1 Build Protein, DNA Strand, RNA Strand, Carbohydrate
Build a biopolymer chain from a specified sequence of residues.
Biopolymer > Build > Build Protein (also DNA Strand, DNA Double
Helix, RNA Strand, RNA Double Helix or Carbohydrate)
or File > New > Protein
Note: To build a protein/DNA or a protein/polysaccharide complex, open the
macromol dictionary, build the pieces in separate work areas, then join (JOIN)
or merge them (Edit > Merge).
• Protein Modeling on page 309
• Nucleic Acid Modeling on page 328
• Polysaccharide Modeling on page 330
7.1.1 Build a Biopolymer via the Menubar

Six similar dialogs reflect the menu selection for the type of biopolymer to
build. The Build Protein dialog is used here as a representative.
Residues
Read Sequence Access a file browser to retrieve a protein sequence in
PIR or FASTA format.
Residues Define the sequence by clicking on the residue buttons.
See the list of Non-Standard Amino Acids below.
Sequence Builder Each residue selection is echoed in this section.
Right-click to access a context menu.
Reset Press this button to clear the Sequence Builder area
and reset the dialog to its default options.
Addons
Add Hydrogens Whether to add all hydrogens to the sequence being
built.
Note: This option is not accessible when building a
DNA or RNA double helix.
Adjust Geometry Check this box to move the part of the chain affected
by an insertion, mutation, or replacement operation.
Leaving this option unchecked maintains the existing
structure of the peptide, but may result in bad local
geometry.
Tailor variable BIOPOLYMER ADJUST_GEOMETRY
Conformation Select the desired backbone conformation for the residue
sequence. The choice of available conformations
depends on the type of biopolymer you are building and
on the current dictionary.
DNA or RNA Double Helix.

N-terminus Select the capping mechanism for the head of the
sequence to be built. The choices depend on the type of
biopolymer you are building and on the current dictionary.
All N-terminal residues are stored in the
{CHAIN_HEAD} substructure set.
• For proteins, the N terminus is capped by AMN
(Charged), AMI (Neutral) or one of the following
blocking groups: ACE, PYR, FOR, NMT, MES or
BOC.
• For DNA and RNA, the O5’ terminus is capped by
HB.
C-terminus Select the capping mechanism for the tail of the
sequence to be built. The choices depend on the type of
biopolymer you are building and on the open dictionary.
All C-terminal residues are stored in the
{CHAIN_TAIL} substructure set.
• For proteins, the C terminus is capped by CXL
(Charged), CXC (Neutral) or one of the following
blocking groups: NME, AMD, NMM, CME, MES,
or EES.
• For DNA and RNA, the O3’ terminus is capped by
HE.
Charge Model Specify the type of charges to be loaded onto the
sequence being built: AMBER7 FF99, AMBER7
FF02, Amber4.1 FF95, Kollman All, Kollman
United, Gasteiger-Marsili, Gasteiger-Huckel,
Delre, Pullman, MMFF94, or None. Kollman
charges are available only for standard residues.
Angles Access the Angles dialog (see description below). Use it
to set explicit conformational angles.
Remarks:
By default, the biopolymer building operation adds charged end groups to the
chain termini. These groups are taken from the open dictionary.

Local substructure sets are automatically defined for terminal residues.
Terminal residues are defined as the first or last residue of a uniquely named,
linear chain, excluding any attached blocking groups.
• CHAIN_HEAD—in a protein: all legitimate N-terminal residues
• CHAIN_TAIL—in a protein: all legitimate C-terminal residues
Non-Standard Amino Acids
ABU: amino butyric acid         HSE: homoserine
AIB: amino isobutyric acid      HYP: hydroxyproline
ARZ: neutral arginine           LYZ: lysine (neutral)
ASZ: aspartic acid (neutral)    NLE: norleucine
BAL: beta-alanine               NVA: norvaline
CYM: cysteine (-1)              ORN: ornithine (+1)
CYX: (half)-cystine             ORZ: ornithine (neutral)
GLZ: glutamic acid (neutral)    PHG: phenylglycine
HCX: (half)-homocystine         PSE: phosphorylated serine (-2)
HCY: homocysteine               PSM: phosphorylated serine (-1)
HID: histidine (delta)          PSZ: phosphorylated serine (neutral)
HIE: histidine (epsilon)        PTM: phosphorylated tyrosine (-1)
HIP: histidine (protonated)     PTY: phosphorylated tyrosine (-2)
HPR: hydroxyproline             PTZ: phosphorylated tyrosine (neutral)
• List Dictionary on page 237 to list the residues and conformational states
available in the current dictionary
• Biopolymer Charge in the Force Field Manual for a discussion of charge
derivation

7.1.2 Conformational Angles

Press the Angles button in the Build Biopolymer dialog.
Generic Angle Names Select a conformational angle from the list of those
defined in the current dictionary.
Angle Value Enter the angle value (in degrees) for the selected
conformational angle.
[+] Press this button to add a conformational angle value to
the list of conformational angles to be used to build the
specified sequence of residues. Only one instance of a
named conformational angle can be entered in the list.
Apply Accept the list of defined angles and use it when pressing
Build in the Build Biopolymer dialog.
7.1.3 Build a Biopolymer via the Command Line

Build a biopolymer chain from a specified sequence of residues.
BIOPOLYMER BUILD attachment_point sequence conformation
attachment_point An atom expression specifying where on the current
molecule to attach the new sequence. To start a new
chain, specify an attachment point of 0 (zero).
sequence Sequence of residues to be added. To obtain a list of
available residues, type a question mark (?).
conformation The conformation to impose upon the added sequence.
To obtain a list of available conformational states, type a
question mark (?).
For the selection of the attachment point, each of the atoms in the expression is
examined, and, if more than one is a legitimate attachment point, you are asked
to select one. If the chosen atom(s) is not a valid attachment point, SYBYL will
look for any atom in the same residue that is valid.

If the specified sequence includes residues with more than two connection
atoms (for biopolymers they can be non-linear, such as polysaccharides), Tailor
variable BIOPOLYMER ASSIGN_ATTACH_MODE controls how often you will be
prompted to resolve the ambiguities in the connection.
Tailor Variables:
• Tailor variable BIOPOLYMER BUILD_HYDROGENS determines if and
what hydrogens are added to the new residues.
• Tailor variable BIOPOLYMER ASSIGN_ATTACH_MODE
• Biopolymer Dictionary on page 236
• A description of the syntax for residue sequences and conformation
specification in the SPL Manual
• Biopolymer Charge in the Force Field Manual for a discussion of charge
derivation

7.2 Disulfide Bridges

Form a disulfide bridge between the specified cysteine residues.

Biopolymer > Build > Create Disulfide
The dialog will be displayed only if the selected molecule contains at least two
cysteine residues.
Separation Distance between the residues currently selected.

Residues Menus listing the possible pairs of sulfur-containing
residues (CYS, CYX)
Add Adds the pair of residues from the menus above to the
list.
Delete Deletes the selected pair from the list below.
Disulfide Bridges List of all disulfide bonds to be formed when you click
OK.
The positions of the sulfur lone pairs are adjusted after the addition of disulfide
bonds.

BIOPOLYMER DISULFIDE res1 res2
res1 Any CYS, CYX, HCX or HCY residue

res2 Any other CYS, CYX, HCX or HCY residue
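
When deciding which residue pairs to pass to BIOPOLYMER DISULFIDE, it can help to rank candidate pairs by the separation of their sulfur atoms outside SYBYL. The following Python sketch is illustrative only: the function name, the residue labels, the coordinate input format, and the optional distance cutoff are assumptions and are not part of SYBYL.

```python
import math
from itertools import combinations


def candidate_disulfides(sulfurs, max_distance=None):
    """Rank residue pairs by sulfur-sulfur separation.

    sulfurs maps a residue label (hypothetical, e.g. "CYS5") to the
    (x, y, z) coordinates of its sulfur atom, in angstroms.
    Returns a list of (res1, res2, distance) tuples, closest pair first.
    """
    pairs = []
    for (res1, pos1), (res2, pos2) in combinations(sorted(sulfurs.items()), 2):
        distance = math.dist(pos1, pos2)
        # The cutoff is optional and purely an assumption of this sketch.
        if max_distance is None or distance <= max_distance:
            pairs.append((res1, res2, distance))
    pairs.sort(key=lambda item: item[2])
    return pairs
```

The closest pairs in the returned list are the natural candidates to supply as res1 and res2.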

7.3 Build C-alpha to Backbone

Construct a full protein backbone from Cα coordinates.
Biopolymer > Build > Build C-alpha to Backbone
BIOPOLYMER CONSTRUCT_BACKBONE source_mol_area target_mol_area
source_mol_area Molecule area containing the alpha carbons
target_mol_area Molecule area that will hold the result
This command takes as input a SYBYL protein molecule that has, at a
minimum, Cα atoms for each residue in one or more connected chains. Atoms
other than those named “ca” are ignored in constructing the new molecule. This
procedure uses a “spare parts” approach, using fragments retrieved from the
protein database (PRODAT) to construct the full poly-alanine backbone. You
can later add sidechains to the new molecule.
The value set by Tailor variable SET CONSTRUCT_BACKBONE CONNECT_CA
(YES, by default) determines whether the command BIOPOLYMER CONNECT_CA
is called before the backbone is built.
Method:
This command implements a generalized version of the procedure described in
the paper “Modelling the polypeptide backbone with ‘spare parts’ from known
protein structures” by Claessens et al. [Ref. 35].
This method performs a three-pass screen to find each fragment.

1. First it measures the end-to-end distance of all fragments in the database,
saving those fragments whose distance is within a specified tolerance from
the reference fragment.
2. Next, the retained fragments are screened by comparing all inter-Cα
distances within the fragment with the corresponding distances in the
reference fragment, and saving the N best fragments.
3. Finally, it performs a least-squares fit of each retained fragment onto the
reference fragment, and chooses the one with the lowest RMS. If the RMS is
below a threshold, and the fragment length is less than a specified
maximum, the fragment length is incremented by 1 and the procedure is
repeated.
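
The first two passes of the screen can be sketched in Python as follows. This is an illustrative sketch only, not the SYBYL implementation: the function names, the RMS distance score used for ranking, and the default values of tolerance and n_best are assumptions. The third pass (least-squares fitting and fragment extension) is omitted.

```python
import math
from itertools import combinations


def end_to_end(fragment):
    """Distance between the first and last C-alpha of a fragment."""
    return math.dist(fragment[0], fragment[-1])


def distance_score(fragment, reference):
    """RMS difference over all inter-C-alpha distances (assumed score)."""
    pairs = list(combinations(range(len(reference)), 2))
    total = 0.0
    for i, j in pairs:
        delta = (math.dist(fragment[i], fragment[j])
                 - math.dist(reference[i], reference[j]))
        total += delta * delta
    return math.sqrt(total / len(pairs))


def screen_fragments(reference, candidates, tolerance=1.0, n_best=2):
    """Apply pass 1 and pass 2; return up to n_best fragments, best first.

    Each fragment is a list of (x, y, z) C-alpha coordinates.
    """
    ref_length = end_to_end(reference)
    # Pass 1: keep fragments of equal size with a similar end-to-end distance.
    kept = [
        frag for frag in candidates
        if len(frag) == len(reference)
        and abs(end_to_end(frag) - ref_length) <= tolerance
    ]
    # Pass 2: rank the survivors by agreement of all inter-C-alpha distances.
    kept.sort(key=lambda frag: distance_score(frag, reference))
    return kept[:n_best]
```

Because both passes use only internal distances, they are independent of fragment position and orientation, which is why the more expensive superposition can be deferred to the third pass.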
Tailor variable CONSTRUCT_BACKBONE controls the parameters used during the
whole process. The construction of an entire chain starts at the beginning of the
chain with a fragment of a given minimum length (by default, 4 residues). The
length N of the fragment actually found may be anywhere between this minimum
length and a specified maximum length (by default, 10 residues). The process is
then repeated after moving N residues down the chain and looking for the next
fragment from the database. However, to avoid discontinuities at the junction of
two fragments, successive fragments are allowed to overlap. The number of
residues that overlap determines how many residues to trim off the end of
one fragment (by default, 2 residues) and off the beginning of the next fragment
(by default, 1 residue).
Tailor Variables:
• Tailor variable PDB CONNECT_SEQ
• Tailor subject CONSTRUCT_BACKBONE
• Tailor subject PROTEIN_SEARCH
• Binary Protein Database: PRODAT on page 255
• Add Sidechains on page 125
7.3.1 Connect C-alpha Atoms

Connect unconnected Cα atoms sequentially.
Command: BIOPOLYMER CONNECT_CA mol_area
mol_area Molecule area containing the unconnected alpha carbons
This command is used automatically when constructing a protein backbone
from the alpha carbons. If you have modified the bonding scheme produced by
BIOPOLYMER CONNECT_CA and want to protect it, use Tailor variable
CONSTRUCT_BACKBONE CONNECT_CA NO before constructing the backbone.

7.4 Build a DNA Double Helix

Produce a double helix DNA molecule of the specified canonical form from a
given nucleotide sequence.
Biopolymer > Build > Build DNA Double Helix
BIOPOLYMER DNAHELIX form sequence mol_area
form A canonical DNA form: A, A89, B, C, or Z

sequence Any generic DNA sequence
mol_area Molecule area to receive the results

Tailor variable BIOPOLYMER BUILD_HYDROGENS determines if and what
hydrogens are added to the nucleotides.
The terminal residues in both strands are automatically stored in the
{CHAIN_HEAD} and {CHAIN_TAIL} sets. See Chain Termini Sets on page
131.
Nucleic Acid Modeling on page 328

7.5 Build an RNA Double Helix

Produce a double helix RNA molecule of the specified canonical form (A or
AP) from a given nucleotide sequence.
Biopolymer > Build > Build RNA Double Helix
BIOPOLYMER RNAHELIX form sequence mol_area
form A canonical RNA form (A or AP)

sequence Any generic RNA sequence
mol_area Molecule area to receive the results

Tailor variable BIOPOLYMER BUILD_HYDROGENS determines if and what
hydrogens are added to the nucleotides.
The terminal residues in both strands are automatically stored in the
{CHAIN_HEAD} and {CHAIN_TAIL} sets. See Chain Termini Sets on page
131.
Nucleic Acid Modeling on page 328

7.6 Add Solvent or Cofactor

Retrieve a solvent or a cofactor from the dictionary and place it in a molecule
area with optional hydrogens and charges.

Biopolymer > Build > Add Solvent/Cofactor
Molecule Area The molecule area to receive the selected solvent or
cofactor. The selection is done before this dialog is
posted.
Solvent/Cofactor Select a solvent or cofactor from the list.
Add Hydrogens Whether to add all hydrogens to the selected solvent or
cofactor.
Load Charges Specify the type of charges to be loaded onto the
selected solvent or cofactor: AMBER7 FF99,
AMBER7 FF02, Amber4.1 FF95, Kollman All,
Kollman United, Gasteiger-Marsili, Gasteiger-Huckel,
Delre, Pullman, MMFF94, or None. Kollman
charges are available only for standard residues.

Retrieving a solvent or cofactor can be achieved by using a special form of the
BIOPOLYMER BUILD command.
BIOPOLYMER BUILD mol_area code NONE

mol_area The empty molecule area in which the solvent or cofactor
will be placed.
code The 3-letter code for the desired solvent or cofactor. To
obtain a list, type a question mark (?).
NONE The conformation argument used when building a
biopolymer sequence is irrelevant in this context. Simply
enter NONE.
The value specified by Tailor variable BIOPOLYMER BUILD_HYDROGENS
determines whether hydrogens are added to the selected solvent or cofactor.
7.6.3 Cofactors in the macromol Dictionary

The following residues are stored in .res files in the macromol dictionary.
• ADP - adenosine-5'-diphosphate
• ATP - adenosine-5'-triphosphate
• CMP - adenosine-3',5'-cyclic-monophosphate
• FAD - flavin-adenine dinucleotide
• FDA - dihydroflavine-adenine dinucleotide
• FMN - flavin mononucleotide
• GDP - guanosine-5'-diphosphate
• GTP - guanosine-5'-triphosphate
• PCG - cyclic guanosine monophosphate
• NAI - 1,4-dihydronicotinamide adenine dinucleotide - NADH
• NAD - nicotinamide-adenine-dinucleotide
• NAP - nicotinamide-adenine-dinucleotide phosphate
• NDP - dihydro-nicotinamide-adenine-dinucleotide-phosphate
• HEM - protoporphyrin IX containing Fe
• PIX - protoporphyrin IX
• PFN - porphin
• COA - coenzyme A
• BTN - biotin
• PLP - pyridoxal-5'-phosphate
• TPP - thiamine diphosphate

7.6.4 Atom Types and Charges for Defined Cofactors

All cofactors have AMBER7 FF02, AMBER7 FF99, AMBER95 and
KOLLMAN_ALL atom types as well as atom names that are consistent with
those seen in the RCSB.
Atomic charge sets are supplied only for ATP, ADP, GDP, and GTP and only
for the AMBER7 FF99 and AMBER95 force fields.
These charge values were obtained from: Meagher K.L., Redman L.T., Carlson
H.A., “Development of polyphosphate parameters for use with the AMBER
force field.” J. Comput. Chem., 24:1016-1025 (2003).

7.7 Break a Chain

Delete the backbone bond between two consecutive residues and append cap
fragments to the two atoms which were previously bonded.
Biopolymer > Build > Break Chain
BIOPOLYMER BREAK subst1 subst2
subst1 Residue at one end of the bond to be broken

subst2 Residue at the other end of the bond to be broken
The cap fragments are defined in the biopolymer dictionary.

Disconnecting residues in the middle of a sequence results in multiple chains.

The new chains are automatically renamed, but it is recommended that you
update the termini set membership: see Chain Termini Sets on page 131.
Because the disconnected residues are likely to be too close, you may want to
repair and optimize the local geometry through minimization: see Minimizing a
Subset of Residues (in the Force Field Manual).
Join Chains on page 157 to connect two residue chains

7.8 Join Chains

Connect two residue chains.
Biopolymer > Build > Join Chains
BIOPOLYMER JOIN atom1 atom2
atom1 Atom at one end of the inter-residue bond to be formed
atom2 Atom at the other end of the bond
Atom1 and atom2 must be valid connection points for the biopolymer, and their
cap fragments will be automatically discarded if necessary.
If the two atoms are in different molecule areas, all atoms attached to atom2
will be merged into atom1’s molecule area before the specified atoms are
connected.
Tailor variable BIOPOLYMER ADJUST_GEOMETRY determines whether to move
the chain attached to the second atom to retain proper bond geometry, if
possible.
After connecting the chains it is recommended to:

• Update the termini set membership: see Chain Termini Sets on page 131
• Relieve bad contacts between the new residues and the rest of the
molecule: see Scan Sidechain Torsions on page 191
• Repair and optimize the local geometry through minimization: see
Minimizing a Subset of Residues (in the Force Field Manual)
• BIOPOLYMER LOOP or BIOPOLYMER TWEAK to model insertions and
deletions in proteins
• Break a Chain on page 156 to break an inter residue bond and append
cap fragments

7.9 Form a Cyclic Peptide

Join the two ends of a peptide chain to form a cyclic peptide.
Biopolymer > Build > Create Cycle
BIOPOLYMER CYCLE residue
residue Any residue in the chain of interest, used to distinguish that
chain from other chains
The N of the first residue and the C of the last residue in the chain are
identified and connected to each other (see Join Chains on page 157).
No attempt is made to find a valid conformer or acceptable bond lengths and
angles at the ring closure. We recommend that you optimize the geometry of the
cyclic peptide before proceeding with your work.
• MAXIMIN2 to optimize the geometry of the model (in the Force Field
Manual)
• BIOPOLYMER LOOP or BIOPOLYMER TWEAK to generate possible
conformations
• Disulfide Bridges on page 148

7.10 Add Phosphate Caps

Add a phosphate group to any carbonyl or hydroxyl oxygen. This option is valid
only for work with RNA and DNA. Atom types and charges match the AMBER
force field (4.1 FF95 and 7 FF99 are identical in this respect).

Biopolymer > Build > Phosphorylate
An atom selection dialog prompts you for a hydroxyl or carbonyl oxygen, that
is, an oxygen with only a single bonded neighbor.
Protonation State Select the desired protonation state: O-PO2-OH
(PO4H-) or O-PO2-O (PO4--).
Apply AMBER types and charges Whether to use the AMBER atom types and add
charges to the phosphate group. The charges are valid
only for work with DNA and RNA.
Phosphorylate Start the addition of the phosphate based on the
selected options.

BIOPOLYMER PHOSPHORYLATE mol_area atom_sel h? charges?
mol_area Molecule area containing the sequence of interest

atom_sel ID number of a hydroxyl or carbonyl oxygen, that is, an
oxygen with only a single bonded neighbor.
hydrogen Whether to add the hydrogen to the new phosphate
group. The default is YES.
charges Whether to assign AMBER atom types and load
AMBER charges to the new group. The default is YES.

7.11 Build a Random Sequence

Add a biopolymer of random sequence whose composition is proportional to that
of the residue sequence that you supply.
BIOPOLYMER RANDBUILD attach_atom length composition conformation
attach_atom Attachment atom for new residues
length Length of sequence to be added (number of residues)
composition Sequence of residues indicating the composition of the
random sequence. Residues will be added in the proportions
in which they appear in this list.
conformation The backbone conformation to impose upon the added
sequence. Note that sidechain conformations are taken
from the dictionary.
If the chosen atom(s) is not a valid attachment point, SYBYL will look for any
atom in the same residue that is valid.
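
The role of the composition argument can be illustrated with a short Python sketch. It assumes that each position is drawn independently and uniformly from the composition list, so that residues appearing more often in the list are proportionally more likely. Whether SYBYL samples this way or enforces exact proportions is not stated here, and the function name and seed parameter are hypothetical.

```python
import random


def random_sequence(composition, length, seed=None):
    """Draw `length` residues in proportion to their counts in `composition`.

    composition is a list of residue codes, e.g. ["ALA", "ALA", "GLY"].
    A residue listed twice is twice as likely to be drawn as one listed once.
    """
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible
    return [rng.choice(composition) for _ in range(length)]
```

For example, random_sequence(["ALA", "ALA", "GLY", "SER"], 12) yields a 12-residue sequence in which ALA is expected about half of the time.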

• Build Protein, DNA Strand, RNA Strand, Carbohydrate on page 142 to
build a specific (non-random) sequence
• List the complete sequence of a biopolymer

8. Biopolymer Composition
• Protein Composition Tool on page 162
• Replace Sequence on page 167
• Mutate Monomers on page 168
• Insert Monomers on page 169
• Excise Monomers on page 171

8.1 Protein Composition Tool

A tool to perform site mutations, insertions, and deletions in proteins within a
single dialog.
Note: Biopolymer operations on monomers affect only dictionary-type residues,
not substructures such as ligands and metals.
Biopolymer > Composition > Protein Composition Tool

Action • Mutate—Mutate one connected sequence into

another of equal length. To mutate a residue
involved in a disulfide bridge you must first delete
the S-S bond.
To mutate a terminal residue attached to a blocking
group you must first remove the blocking group.
• Insert After—Insert one or more residues within
the existing sequence. To add residues at the
beginning or the end of a chain, you must use the
Build Protein dialog.
• Remove and Cap—Delete one or more residues
and cap the residues adjacent to the gap. This
breaks a chain. You should give all chains a unique
name (see Set Chain Names on page 132) then
update the termini set membership (see Chain
Termini Sets on page 131).
• Remove and Join—Delete one or more residues
and add a bond between the residues adjacent to the
gap.
Sequence Selection
Current Sequence Access the Substructure Expression dialog to select
residues in the current sequence.
Note for mutations:
• The residues must be connected sequentially.
• To mutate a residue involved in a disulfide bridge
you must first delete the S-S bond.
New Sequence Access the Protein Sequence dialog where you can
select the residues in the new sequence.
For insertions, Adjust Geometry displaces the end of
the chain.
For mutations, the old and the new sequence must be of
identical lengths.
Changed or added residues are stored in a unique and
appropriately named substructure set (e.g. {INSERTED}).

Sidechain Conformation Details
Initial Sidechain Position Select the source of the initial conformation for
each sidechain being added or mutated:
• SYBYL—The conformation of the matching
residue in the open dictionary.
• Lovell—The most probable rotamer in the Lovell
rotamer library or the one that results in the fewest
bumps with the rest of the molecule. (S.C. Lovell,
J.M. Word, J.S. Richardson and D.C. Richardson in
“The Penultimate Rotamer Library.” Proteins:
Structure Function and Genetics, 40, 389-408
(2000).
rotamer.php)
Scan Whether to attempt to remove steric interactions
between the added or mutated residues and the rest of
the molecule. Torsion angles in the new sidechains are
scanned, through a full 360°, for positions that relieve
bad steric interactions. Only one bond at a time is
altered. After a position is found, that bond is removed
from consideration. Scanning continues until all inter-
actions dependent upon these bonds are relieved or
until no progress is made from one iteration to the next.
NUMBER_INCREMENTS
vdW Factor Constant scaling factor to apply to all van der Waals
radii.
VDW_SCALE.
After Modifying the Sequence
Add Hydrogens Whether hydrogens are added to the new or replaced atoms,
and which ones: All, Essential, or None.
The initial default is determined by Tailor variable
BIOPOLYMER BUILD_HYDROGENS.

Minimize Edited Sequence Perform a force field optimization of the new
sequence. Press Setup to access the Minimize Setup dialog and
specify exactly which atoms will be affected. This is
necessary for mutations involving proline residues.
Fix Bad Local Geometry Available only for Insert After and Remove and
Join. This is the best way to obtain a reasonable geometry
for an inserted proline.
Renumber Chain Renumber an extended or shortened chain to use
consecutive substructure numbers. Available only for
Insert After and Remove and Join.
Action Buttons
Apply and Create New Model Apply the specified changes and store the
resulting molecule in a new area. By default the new model is
stored in the next available molecule area. You may
also use a browser to designate an alternative destination.
Apply to Selected Sequence Apply the modification to the current molecule.
File Format for Rotamer Libraries on page 188
Minimization Setup
In the Edit Protein Composition dialog, activate Minimize Edited Sequence
and press Setup.

Minimize Specify the extent of the minimization:

• Sidechains and backbone of edited sequence
• Sidechains and backbone within radius of edited
sequence
• Sidechain within radius of edited sequence
• Sidechain of edited sequence
• Whole sidechains
• Whole molecule
Movable Atom Radius The distance from any atom in the edited sequence (by
default, 6 Å) within which unchanged atoms may also
be minimized.
Minimize Details Access the Minimize dialog (see the Force Field Manual
for details).
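
The effect of the Movable Atom Radius can be illustrated with a small Python sketch that selects the edited atoms plus every unchanged atom lying within the radius of any edited atom. The function name and the input format are assumptions, and treating the boundary as inclusive is a choice of this sketch rather than documented SYBYL behavior.

```python
import math


def movable_atoms(coords, edited_ids, radius=6.0):
    """Return sorted IDs of atoms that may move during minimization.

    coords maps an atom ID to its (x, y, z) position in angstroms.
    edited_ids lists the atoms of the edited sequence.
    """
    edited = set(edited_ids)
    movable = set(edited)  # edited atoms are always movable
    for atom_id, position in coords.items():
        if atom_id in edited:
            continue
        # An unchanged atom is movable if it is close to any edited atom.
        if any(math.dist(position, coords[e]) <= radius for e in edited):
            movable.add(atom_id)
    return sorted(movable)
```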

8.2 Replace Sequence

Replace sequential residues with a new sequence of identical length.
This operation replaces entire residues, not just sidechains, thus the geometry of
the backbone could be altered drastically, particularly if a proline is involved.
The backbone conformation is taken from the dictionary. However, any
sidechain conformational angles in the residues will be retained, as far down the
sidechain as possible.
Menubar: Biopolymer > Composition > Replace Sequence

• In the Sequence Expression dialog select the
residues to be replaced. The selected residues must
be connected sequentially.
• The Adjust Geometry check box determines
whether to move the end of the chain to retain
proper bond geometry, if possible.
Command: BIOPOLYMER REPLACE old_seq new_seq
• old_seq—A connected sequence of residues to be
replaced
• new_seq—The new residue sequence (same length
as the old sequence)

Tailor variable BIOPOLYMER BUILD_HYDROGENS determines if and what
hydrogens are added to the new residues.
To replace a residue involved in a disulfide bridge with another type of residue
you must first delete the S-S bond. To replace a terminal residue attached to a
blocking group you must first remove the blocking group.
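After replacing a residue sequence you may want to:
• Relieve bad contacts between the new residues and the rest of the
molecule: see Scan Sidechain Torsions on page 191.
• Repair and optimize the local geometry through minimization: see
Minimizing a Subset of Residues (in the Force Field Manual).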
Mutating monomers is similar, but replaces only the sidechains. It is much more
efficient and is generally preferable as it maintains the backbone geometry of
the original sequence.
A description of the syntax for residue sequences and conformation
specification in the SPL Manual

8.3 Mutate Monomers

Mutate sequential residues into a new sequence of identical length.
The mutation replaces only sidechains, leaving the backbone unaltered. It
retains the old values of sidechain conformational angles in the new sidechains,
as much as possible.
Menubar: Biopolymer > Composition > Mutate Monomers

• In the Sequence Expression dialog select the
residues to be mutated. The selected residues must
be connected sequentially.
Command: BIOPOLYMER CHANGE old_seq new_seq
• old_seq—A connected sequence of residues to be
mutated
• new_seq—The new residue sequence (same length
as the old sequence)

To mutate a residue involved in a disulfide bridge you must first delete the S-S
bond. To mutate a terminal residue attached to a blocking group you must first
remove the blocking group.
After mutating residues you may want to:

• Fix the ring geometry of PRO residues (if there are any) in the new
sequence: see Fix Prolines on page 130.
• Relieve bad contacts between the new residues and the rest of the
molecule: see Scan Sidechain Torsions on page 191.
• Replace Sequence on page 167 to replace entire residues (including
backbone)

8.4 Insert Monomers

Insert residues in a linear biopolymer chain. To add residues to the end of a
chain use the Build functionality instead.
Via the Menubar

Biopolymer > Composition > Insert Monomer
Then:
• Select the residue after which the new residue(s) will be inserted.
• In the Insert Biopolymer Sequence dialog (very similar to the Build
Biopolymer dialog) select the desired residue(s).
• The Adjust Geometry check box determines whether to move the end
of the chain to retain proper bond geometry, if possible.
• From the Conformation pull-down select a conformation state for the
backbone of the inserted residue(s). Sidechain conformations are taken
from the open dictionary.
• Press Angles to access a dialog where you can specify the value of
specific conformational angles.
• Tailor variable BIOPOLYMER BUILD_HYDROGENS determines if or what
hydrogens are added to the new residues.
Via the Command Line

BIOPOLYMER INSERT substr1 substr2 sequence conformation
substr1 Residue after which the new sequence will be inserted
substr2 Residue before which the new sequence will be
inserted
sequence Sequence of residues to insert
conformation Backbone conformation to impose upon the added
sequence. The format is:
• statename
• angle1=xxx,angle2=yyy
examples:
• alpha_helix
• alph
• phi=-58.0,psi=47
• beta_sheet,chi1=120
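
The conformation format above can be made concrete with a small Python parser. This is a sketch for illustration only: the function name is hypothetical, and the rule that at most one state name may appear is an assumption. Abbreviations such as alph are returned unchanged, since resolving them against the dictionary is left to SYBYL.

```python
def parse_conformation(spec):
    """Split a conformation string into (state_name, angles).

    Items containing "=" are treated as angle assignments in degrees.
    Any other item is treated as a conformational state name.
    """
    state = None
    angles = {}
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue
        if "=" in item:
            name, value = item.split("=", 1)
            angles[name.strip()] = float(value)
        elif state is None:
            state = item
        else:
            # Assumption of this sketch: only one state name is allowed.
            raise ValueError("more than one state name in: " + spec)
    return state, angles
```

For the examples listed above, beta_sheet,chi1=120 parses to a state name plus one angle override, while phi=-58.0,psi=47 parses to angles only.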

If Tailor variable BIOPOLYMER ADJUST_GEOMETRY is set to MOVE_CHAIN, the
chain that follows the inserted residues moves to achieve proper bond geometry,
if possible.

After inserting residues you may want to:
• Relieve bad contacts between the new residues and the rest of the
molecule: see Scan Sidechain Torsions on page 191
• BIOPOLYMER LOOP or BIOPOLYMER TWEAK to model insertions in
proteins.
A description of the syntax for residue sequences and conformation
specification in the SPL Manual

8.5 Excise Monomers

Delete residues from a biopolymer and join the residues adjacent to the gap(s).

Menubar: Biopolymer > Composition > Excise Monomer

In the Sequence Expression dialog select the residues to
be excised. They do not need to be connected sequentially.
Command: BIOPOLYMER EXCISE sequence
Excising residues works only on a linear biopolymer chain. If the residues to be
excised are embedded in a cycle, no action is taken and a warning message is
issued.
If Tailor variable BIOPOLYMER ADJUST_GEOMETRY is set to MOVE_CHAIN, the
chain that follows the excised residues moves to achieve proper bond geometry.
After excising residues you may want to:

• BIOPOLYMER LOOP or BIOPOLYMER TWEAK to model deletions in
proteins.
• Delete Monomers on page 172 to delete residues without closing the
gaps

8.6 Delete Monomers

Delete residues from a biopolymer without moving the adjacent residues.

Menubar: Biopolymer > Composition > Delete Monomers

In the Sequence Expression dialog select the residues to
be deleted. They do not need to be connected sequentially.
Command: BIOPOLYMER REMOVE sequence
Cap fragments, as defined in the open dictionary, are added to the residues
adjacent to the gap(s). Tailor variable BIOPOLYMER BUILD_HYDROGENS
determines if or what hydrogens are added to the cap atoms.
A residue that is adjacent to a blocking group cannot be deleted unless the
blocking group is also deleted.
Deleting a blocking group from a molecule reconstructs any missing pieces of
the adjoining residue.
Deleting residues in the middle of a sequence results in multiple chains with the
same name. It is then recommended to:
• Give all chains a unique name: see Set Chain Names on page 132.
• Update the termini set membership: see Chain Termini Sets on page 131.
• Excise Monomers on page 171 to delete residues and close the gap in the
backbone

9. Biopolymer Conformation
• Measure Conformation on page 174
• Set Backbone Conformation on page 175
• Find Secondary Structure Conformation on page 177
• Assign Secondary Structure on page 181
• Predict Secondary Structure on page 183
• Set Sidechain Conformation on page 185
• File Format for Rotamer Libraries on page 188
• Scan Sidechain Torsions on page 191
• Copy Conformation on page 192

9.1 Measure Conformation

Measure conformational angles in all or part of a biopolymer.
Biopolymer > Conformation > Measure Conformation
BIOPOLYMER MEASURE conf_angles sequence
conf_angles List of conformational angle names, separated by commas,
or “*” to select all defined angles
sequence Residue sequence to measure
Remarks:
The omega torsion values reported by this functionality and by ProTable are
misaligned by one peptide.
• Biopolymer does not have an omega value associated with the first
peptide in a chain, but there is an assignment for the last peptide.
Omega is defined by 4-tuplet CA(i-1)-C(i-1)-N(i)-CA(i) in residue i.
Thus measurements can start only with the second residue in the chain.
• ProTable has an omega assignment for the first peptide in a chain, but
not for the last peptide:
Omega is defined by 4-tuplet CA(i)-C(i)-N(i+1)-CA(i+1) in residue i.
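
The one-residue offset between the two conventions can be expressed as a simple re-indexing. The Python sketch below assumes consecutive integer residue numbers within a chain; the function name and the dictionary input format are hypothetical.

```python
def biopolymer_to_protable(omega_by_residue):
    """Re-index omega values from the Biopolymer to the ProTable convention.

    Biopolymer reports CA(i-1)-C(i-1)-N(i)-CA(i) on residue i.
    ProTable reports the same torsion on residue i-1, so every key
    shifts down by one. Assumes consecutive integer residue numbers.
    """
    return {residue - 1: value for residue, value in omega_by_residue.items()}
```

For a four-residue chain, Biopolymer values stored on residues 2 to 4 map to ProTable residues 1 to 3, which matches the statement that ProTable has no omega for the last peptide.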
UIMS2 Variable:
BIO_MEASURE_VALUE = the value of the last angle measured by BIOPOLYMER
MEASURE.
• In the Biopolymer Manual:
• Find Secondary Structure Conformation on page 177 to identify
sequences of designated conformational state
• Ramachandran Graphs on page 91 to graph conformational angle
values
• Check Biopolymer Geometry on page 135 to report deviations from
standard geometry
• In the SYBYL Basics Manual:
• Built-in set FINDCONF

9.2 Set Backbone Conformation

Set the backbone conformation for multiple residues in a linear segment. All the
selected residues must be of the same biopolymer type. Blocking groups are
ignored.
9.2.1 Set Backbone Conformation Via the Menubar

Biopolymer > Conformation > Set Backbone Conformation
In the Substructure Expression dialog, select the residues whose conformation
will be modified. All blocking groups will be ignored.
Conformational State Activate the check box then select from the list of
conformational states in the current dictionary.
Angle Name Enter the angle value (in degrees) for any of the backbone
conformational angles. These fields are active
only if the Conformational State check box is off.
Set Apply the defined angles to the selected residues.
9.2.2 Set Biopolymer Conformation via the Command Line

Set the conformation of any part of a biopolymer.
BIOPOLYMER SET CONFORMATION sequence conformation
sequence Sequence specifying residues whose conformation will
be modified
conformation Either conformational state or conformational angles
and values as defined in the current dictionary. Enter
the information as a comma-separated list (type a ? in
the console for a list of options and formats). The rotamer
libraries are not accessible at the command line.
Some torsion angles may need to be modified with the SCAN command.

• Set Backbone Conformation Via the Menubar on page 175
• Set Sidechain Conformation Via the Menubar on page 185

9.3 Find Secondary Structure Conformation

9.3.1 Find Secondary Structure Via the Menubar
Analyze the conformation of a biopolymer and find all sequences whose
conformation matches defined states. Optionally, store this information in substructure
sets and render the secondary structure elements with ribbons and tubes.
Biopolymer > Conformation > Find Secondary Structure
Molecule Select the molecule of interest. All residues will be

evaluated.
Method How to analyze the molecule:
• Kabsch-Sander—Use the Kabsch-Sander
method, which relies on hydrogen-bonding patterns
[Ref. 42].
• Dictionary—Match the conformational states in
the open dictionary. In order to match, a conforma-
tional state must exist over a minimum number of
residues, as defined in the dictionary.
States to Find Select among the conformational states defined in the
open dictionary.
Buttons to assist in the selection of conformational
states: select all, invert selection, clear selection.

Render Conformations Whether to render the secondary structure elements
identified when the Find button is pressed. Alpha helical
regions are rendered as shaded ribbons or cylinders,
beta strands as shaded curved arrows, and the remainder
of the protein as a shaded curved tube. See Tailor
subject RENDER to specify the color scheme and other
characteristics.
• No rendering is done if the check box is off.
• Secondary Structure Only—Render only the
secondary structure elements (helices, sheets, and
turns) identified by the selected Method. One
background image will be created for each
individual element.
• Complete—Also renders the rest of the protein as
tubes. A single background contains the entire
rendered image.
Create Sets from Results Whether to create substructure sets for the secondary
structure elements identified when the Find button is
pressed.
• Per Chain: For each chain, create one set for the
combined sequences that match each of the conformational
states. The set names consist of the conformational
state followed by the chain name, then
_DICT for the dictionary method or _KS for the
Kabsch-Sander method (e.g. ALPHA_HELIX_A_KS).
• Detailed: Also create one set for each connected
sequence that matches a conformational state. The
additional set names consist of the conformational
state followed by an ID number followed by _DICT
or _KS (e.g. ALPHA_HELIX_1_KS).
Save Results as a File An optional text file reporting the conformational state
found for each residue. A .txt extension is recommended,
but is not appended automatically.
Find Find the secondary structure elements with the speci-
fied conditions and leave the dialog open.
If the molecule already has secondary structure sets by
the same name as those created by this operation, they
are deleted and regenerated.
Warning: All operations are cumulative.

9.3.2 Find Secondary Structure via the Command Line

Analyze the conformation of a biopolymer and find all sequences whose
conformation matches defined states. Optionally, store this information in substructure
sets.
BIOPOLYMER FIND SEC_STR mol_area method [conf_states] sets output [filename]
mol_area Molecule area containing the molecule of interest.
method How to analyze the molecule:
• KABSCH-SANDER—Use the Kabsch-Sander method,
which relies on hydrogen-bonding patterns [Ref.
42].
• CONFORMATION—Match the conformational states
in the open dictionary. In order to match, a confor-
mational state must exist over a minimum number
of residues, as defined in the dictionary.
conf_states List of conformational state names separated by com-
mas or “*” for all defined states. Enter the information
as a comma separated list (type a ? in the console for a
list of options and formats).
sets Create substructure sets for the secondary structure ele-
ments identified by the command.
• CHAIN—For each chain, create one set for the
combined sequences that match each conformational
state. The set names consist of the conformational
state followed by the chain name, then
_DICT for the dictionary method or _KS for the
Kabsch-Sander method (e.g.
ALPHA_HELIX_A_KS).
• DETAILED—Also create one set for each connected
sequence that matches a conformational state. The
additional set names consist of the conformational
state followed by an ID number followed by _DICT
or _KS (e.g. ALPHA_HELIX_1_KS).
• NONE—No sets are created.

output How to report the secondary structure elements found
by this command:
• SCREEN_ONLY—the results are reported in the
console in tabular form.
• FILE_ONLY—the results are written to the specified
file in tabular form.
• BOTH—the results are reported in the console and
stored in a file.
• NO_OUTPUT—the results are not reported.
filename The name of the optional output file.
9.3.3 Find Conformations via the Command Line

List in the console the sequences that match the specified conformational
state(s).
BIOPOLYMER FIND CONFORMATION conf_states sequence
conf_states List of conformational state names separated by commas,
or “*” for all defined states
sequence Residue sequence(s) to analyze, or “*” for the whole
molecule
BIOPOLYMER FIND CONFORMATION examines torsion angle values to
determine if they are consistent with conformational states defined in the
biopolymer dictionary. In order to match, a conformational state must exist over
a minimum number of residues, as defined in the dictionary.
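The minimum-length rule described above can be illustrated outside SYBYL with a short Python sketch. The function name matching_runs and the phi/psi ranges used in the usage note are hypothetical values chosen for illustration only; they are not taken from the biopolymer dictionary, and the actual matching in SYBYL may use additional criteria.

```python
from typing import List, Optional, Tuple

# One (phi, psi) pair per residue, in degrees; None where an angle is undefined.
Angles = Tuple[Optional[float], Optional[float]]


def matching_runs(angles: List[Angles],
                  phi_range: Tuple[float, float],
                  psi_range: Tuple[float, float],
                  min_residues: int) -> List[Tuple[int, int]]:
    """Return inclusive (start, end) index pairs of consecutive residues whose
    phi and psi both fall inside the given ranges, keeping only runs that
    span at least min_residues residues."""
    runs = []
    start = None
    for i, (phi, psi) in enumerate(angles):
        in_state = (phi is not None and psi is not None
                    and phi_range[0] <= phi <= phi_range[1]
                    and psi_range[0] <= psi <= psi_range[1])
        if in_state and start is None:
            start = i                      # a new candidate run begins
        elif not in_state and start is not None:
            if i - start >= min_residues:  # run is long enough to count
                runs.append((start, i - 1))
            start = None
    # Close a run that extends to the last residue.
    if start is not None and len(angles) - start >= min_residues:
        runs.append((start, len(angles) - 1))
    return runs
```

For example, with an illustrative helix window of phi in [-90, -30] and psi in [-80, -10], four helical residues followed by one extended residue and two more helical residues yield a single run when min_residues is 3, because the trailing two-residue stretch is too short to match.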
The built-in set {FINDCONF(states,sequence)} performs the same function.
However, it returns the answer as a set of substructures instead of listing the
found sequences.
• Measure Conformation on page 174 to measure conformational angles in
the biopolymer
• Built-in set FINDCONF (in the SYBYL Basics Manual)

9.4 Assign Secondary Structure

Determine the secondary structure of a protein, create local substructure sets,
and use them to render the secondary structure.
BIOPOLYMER ASSIGN_SEC_STR mol_area

• mol_area—Molecule area containing the protein whose secondary
structure states are to be determined.
Note that the above command is identical to the BIOPOLYMER FIND SEC_STR
command with the following options:
BIOPOLYMER FIND SEC_STR mol_area method * DETAILED SCREEN_ONLY
Secondary Structure Determination:

The method used to identify the secondary structure elements is determined by
Tailor variable BIOPOLYMER ASSIGN_SEC_STR.
Substructure sets are created using the following rules:

• one set for each connected sequence that matches a conformational state
• one set per chain for the combined sequences that match a conforma-
tional state
The names of the substructure sets consist of the conformational state followed
by an ID number or a chain identifier, followed by _KS for the Kabsch-Sander
method or _DICT for the dictionary method. If the molecule already has
secondary structure sets by the same name as those created by this command,
they are deleted and regenerated.
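The naming rule above can be sketched in Python for scripts that post-process set names outside SYBYL. The helper names set_name and parse_set_name are hypothetical, and the parser assumes that neither the chain identifier nor the ID number contains an underscore.

```python
# Suffix per method keyword, as documented for BIOPOLYMER FIND SEC_STR.
_SUFFIX = {"KABSCH-SANDER": "KS", "CONFORMATION": "DICT"}


def set_name(state, label, method):
    """Build a set name such as ALPHA_HELIX_A_KS from a conformational state,
    a chain identifier or ID number, and the method keyword."""
    return "{}_{}_{}".format(state.upper(), label, _SUFFIX[method])


def parse_set_name(name):
    """Split a set name into (state, label, suffix).
    Assumes the label and suffix contain no underscores."""
    state, label, suffix = name.rsplit("_", 2)
    if suffix not in _SUFFIX.values():
        raise ValueError("not a secondary structure set name: " + name)
    return state, label, suffix
```

Splitting from the right keeps multi-word states such as ALPHA_HELIX intact, which is why rsplit is used instead of split.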
The Kabsch-Sander method (default) relies on hydrogen-bonding patterns
[Ref. 42]. The set names created using this method for 1crn.pdb are:
ALPHA_HELIX_1_KS   A/ILE7    A/ARG17   IVARSNFNVCR
ALPHA_HELIX_2_KS   A/GLU23   A/THR30   EAICATYT
ALPHA_HELIX_A_KS
BETA_SHEET_1_KS    A/THR2    A/CYS4    TCC
BETA_SHEET_2_KS    A/ILE33   A/ILE34   II
BETA_SHEET_3_KS    A/ASN46   A/ASN46   N
BETA_SHEET_A_KS
TURN_1_KS          A/LEU18   A/GLY20   LPG
TURN_2_KS          A/GLY42   A/TYR44   GDY
TURN_A_KS

The FINDCONF method compares Phi and Psi angles in the protein to those
stored in the dictionary for the following conformational states: alpha_helix,
beta_sheet, and turnI. The set names created using this method for 1crn.pdb
are:
ALPHA_HELIX_1_DICT   A/ILE7    A/CYS16   IVARSNFNVC
ALPHA_HELIX_2_DICT   A/GLU23   A/TYR29   EAICATY
ALPHA_HELIX_A_DICT
BETA_SHEET_1_DICT    A/ILE33   A/ILE35   III
BETA_SHEET_A_DICT
TURNI_1_DICT         A/GLY42   A/ASP43   GD
TURNI_A_DICT
Secondary Structure Rendering:

The substructure sets created by this command are used to render the secondary
structure as shaded surfaces. Characteristics of the cartoon representations are
determined by Tailor subject RENDER.
By default, the secondary structure elements are rendered as follows:

• Helices: magenta ribbons
• Sheets: yellow directional arrows
• Everything else: cyan tubes

9.5 Predict Secondary Structure

Predict the secondary structure of a protein.
Biopolymer > Conformation > Predict Secondary Structure
BIOPOLYMER PREDICT_SECONDARY input_method
further_input out_filename predict_method [option1]
further_output [option2]
input_method READ_FROM_FILE, RECALL_RESULTS,
SYBYL_MOL_AREA, TYPE_AT_KEYBOARD
further_input Filename, molecule area or generic sequence, depend-
ing on the input_method selected.
out_filename Base name of the output files (extensions will be pro-
vided automatically)
predict_method MAXFIELD_SCHERAGA (Bayes Statistics),
GARNIER_OSGUTHORPE_ROBSON (Information The-
ory), QIAN_SEJNOWSKI (Neural Net),
USER_SPECIFIED
option1 Filename, needed only if the predict_method is
USER_SPECIFIED
further_output BUILD_NEW_MOL, NONE,
SET_EXISTING_CONFORMATION
option2 Molecule area, needed only if BUILD_NEW_MOL or
SET_EXISTING_CONFORMATION is chosen as
further_output
SYBYL includes several common methods for predicting the secondary
structure of proteins. These methods have significant limitations and are far
from 100% reliable, but they can be useful as a starting point for
further analysis.
Files for READ_FROM_FILE can be either PIR or FASTA formatted files (see
Read PIR and FASTA Files on page 67), or a file with the following format:
• first line = number of residues,
• following lines (80 chars/line) = one letter residue codes.
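The simple sequence format listed above (residue count on the first line, then one-letter codes at 80 characters per line) can be produced with a few lines of Python. This is a minimal sketch; the function name format_sequence_file is hypothetical, and converting the codes to uppercase is an assumption rather than a documented requirement.

```python
def format_sequence_file(sequence, width=80):
    """Return file text: the first line is the number of residues, followed by
    the one-letter residue codes wrapped at `width` characters per line."""
    # Drop any whitespace in the input and normalize case (assumption).
    seq = "".join(sequence.split()).upper()
    lines = [str(len(seq))]
    lines += [seq[i:i + width] for i in range(0, len(seq), width)]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Write a 90-residue dummy sequence to a file for READ_FROM_FILE.
    with open("example.seq", "w") as handle:
        handle.write(format_sequence_file("ACDEFGHIKL" * 9))
```

The file name example.seq is arbitrary; the manual does not prescribe an extension for this format.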
You may use your own prediction program (USER_SPECIFIED option). The file
interface requirements for your program are described in Secondary Structure
Prediction on page 319.

Files Created:
• .pred = file containing the predicted sequence
• .prob = file containing the probabilities of each state
• See Secondary Structure Prediction on page 319 for more details on
these methods
• See Secondary Structure Prediction Files on page 298 for the format of
the input and output files

9.6 Set Sidechain Conformation

To change the conformation of a biopolymer:
• individually for the sidechain of a specific residue (this page)
• globally for the backbone of a sequence of residues
See also the File Format for Rotamer Libraries on page 188.
9.6.1 Set Sidechain Conformation Via the Menubar

Set the sidechain conformation for one or more residues in a protein. Sidechain
conformations may be taken from rotamer libraries. Note: This feature works
only for proteins.
Biopolymer > Conformation > Set Sidechain Conformation

or
In the ORCHESTRAR Project Manager, click Analyze Sidechains.

Residue Selection
Residue Choose a residue among a list of residues pre-selected
via the Select button. The chosen residue is highlighted
by a label in the graphics window and the close
contacts between its sidechain atoms and neighboring
atoms are represented by colored dashed lines. Contacts
are determined by AUTOMONITOR ON with the overlap
ratios of 0.75 and 1.0 (see the Graphics Manual).
Select Access the Substructure Expression dialog to select the
residues whose conformation will be modified (this is
predetermined if you access this dialog via the Struc-
ture Preparation Tool). This selection populates the
Residue menu. Non-protein residues, such as waters,
metals, ligands, cofactors, and blocking groups are
automatically ignored.
List Only Residues Near Ligands Toggle this on to populate the Residue pull-down with
residues that are nearest the ligand and whose
sidechains must be fixed. This feature is particularly
useful when preparing a protein for docking with
Surflex-Dock (see the Docking Manual).
Center View on Residue Center the display of the molecule on the selected
residue. Note that centering the display does not affect the
atoms’ coordinates.
Auto Center Toggle this on to automatically center the display of the
molecule on the selected residue. This is particularly
useful when using this dialog to work on several resi-
dues.
Backbone
Phi, Psi, Omega The values of the backbone angles for the selected resi-
due.

Sidechain
Rotamer Source • Initial —Resets the selected residue’s sidechain to

the conformation it had when the residue was
selected in this dialog.
• Lovell Library—Backbone independent rotamer
library as described by S.C. Lovell, J.M. Word, J.S.
Richardson and D.C. Richardson in “The Penul-
timate Rotamer Library.” Proteins: Structure
Function and Genetics, 40, 389-408 (2000).
rotamer.php
• User Defined—The rotamer source is set automat-
ically to this option if you change any of the values
for the Chi angles.
Chi1, Chi2, Chi3, Chi4 These fields show the values of the sidechain angles in
memory for the selected residue. If you type a new
value in any of these fields the Rotamer Source will
be automatically set to User Defined.
Rotamer List For each rotamer in the library, the list includes:
• %—the probability of occurrence. The probabilities
for all rotamers of a residue type total 100%. The
list is sorted by decreasing probability value.
• chi1, chi2, etc—values for CHI angles in the
specified library for the selected residue type.
Note: If the selected residue includes hydrogens there
will be one more measurable Chi value than there are
Chi columns in this list. In that case, the rotamer you
select and set will place the hydrogen in a staggered
orientation.
Action Buttons
Dashed lines display bumps between the selected residue’s sidechain and other
atoms in the molecule area, including water molecules (see distance monitoring
in the Graphics Manual).
Set Selected Apply the chosen sidechain conformation to the single
residue selected in the dialog.
Set Previous, Set Next Apply the conformation of the chosen rotamer to the
sidechain. These buttons allow you to step through the
various rotamers and see them on the screen.

Delta Energy Reports, for the residue selected at the top of the dialog,
the difference between the energy of the applied rota-
mer and the energy of the Initial conformer. Energy
values are computed using the Tripos force field.
Use the check box to toggle off this feature.
Scan Sidechain
NUMBER_INCREMENTS
vdW Factor Constant scaling factor to apply to all van der Waals
radii.
VDW_SCALE.
Scan Selected Residue Attempt to remove steric interactions between the
selected residue and the rest of the molecule. Torsion
angles in the residue’s sidechain are scanned, through a
full 360°, for positions that relieve bad steric interactions.
Only one bond at a time is altered. After a position
is found, that bond is removed from consideration.
Scanning continues until all interactions dependent
upon these bonds are relieved or until no progress is
made from one iteration to the next.
Minimization
Minimize Selected Residue Access the Minimize dialog (see the Force Field Manual
for details) to optimize the geometry of the selected
residue.
• Set Backbone Conformation Via the Menubar on page 175
• Set Biopolymer Conformation via the Command Line on page 175
9.6.2 File Format for Rotamer Libraries

A rotamer library consists of a directory containing a file for each residue that
has defined rotamers. Each file in the library must be named as the
lowercase 3-letter code for the matching residue with a .lib extension (e.g.
asn.lib, cyh.lib, etc.).
The format of the files is as follows:

• For a backbone-independent library (such as Lovell):

1 - number of backbone sets for this residue
0 0 - PHI & PSI, but zeroes if only 1 entry for a residue
9 3 - number of rotamers | number of records per line
39 -65. -20. - probability | chi1 | chi2
15 -177. 30.
12 -174. -20.
• For a backbone-dependent library:

36 - number of backbone sets for this residue
-180 -180 - PHI angle | PSI angle
49.3 53.7 54.6 - probability | chi1 | chi2
26.8 64.1 9.2
8.1 64.8 -57.0
...
-180 -170 - PHI angle | PSI angle
41.0 63.1 11.8 - probability | chi1 | chi2
...
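As a rough illustration of the layout above, the following Python sketch reads a library file that follows the first (backbone-independent) example. Several points are assumptions, not documented behavior: the " - " annotations are treated as comments and stripped, "number of records per line" is read as the number of fields on each rotamer line (probability plus chi values), and every backbone set is expected to carry its own rotamer-count line. The function name parse_rotamer_lib is hypothetical.

```python
def parse_rotamer_lib(text):
    """Parse rotamer library text into a list of backbone sets, each a dict
    with 'phi', 'psi' and a list of rotamers (probability and chi angles)."""
    rows = []
    for raw in text.splitlines():
        data = raw.split(" - ")[0].strip()   # drop a trailing annotation, if any
        if data and data != "...":
            rows.append([float(token) for token in data.split()])
    records = iter(rows)
    try:
        n_sets = int(next(records)[0])       # number of backbone sets
        sets = []
        for _ in range(n_sets):
            phi, psi = next(records)         # PHI and PSI for this set
            n_rotamers, n_fields = (int(v) for v in next(records))
            rotamers = []
            for _ in range(n_rotamers):
                record = next(records)
                if len(record) != n_fields:
                    raise ValueError("unexpected number of fields in rotamer line")
                rotamers.append({"probability": record[0], "chi": record[1:]})
            sets.append({"phi": phi, "psi": psi, "rotamers": rotamers})
    except StopIteration:
        raise ValueError("rotamer library text ended unexpectedly")
    return sets
```

A file that declares more rotamers than it lists raises ValueError, which makes truncated libraries easy to spot before they are placed in a rotamer directory.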
Tripos-supplied rotamer libraries are in $TA_ROOT/biopolymer/tables/ROTAMERS.
We strongly recommend that user-defined rotamer libraries be stored outside of
the SYBYL tree. Use Tailor variable BIOPOLYMER ROTAMER_DIRECTORY to
identify the location of user-defined rotamer libraries. Each subdirectory of the
specified directory will be picked up by the appropriate dialogs and added to the
list of rotamer libraries.
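Before pointing ROTAMER_DIRECTORY at a directory, it can help to check which subdirectories would be seen as libraries and which residues each one covers. The Python sketch below does this outside SYBYL; the function name list_rotamer_libraries is hypothetical, and the check only mirrors the directory and .lib naming rules described in this section, not SYBYL's actual discovery code.

```python
import os


def list_rotamer_libraries(rotamer_directory):
    """Map each subdirectory name to the sorted residue codes of its .lib files."""
    libraries = {}
    for entry in sorted(os.listdir(rotamer_directory)):
        path = os.path.join(rotamer_directory, entry)
        if os.path.isdir(path):              # each subdirectory is one library
            libraries[entry] = sorted(
                name[:-len(".lib")]
                for name in os.listdir(path)
                if name.endswith(".lib")
            )
    return libraries
```

For a directory such as /usr/tripos/rotamers that contains a Dunbrack subdirectory, the result would have a single key, Dunbrack, listing the residue codes found there.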
Dunbrack Rotamer Library
A free backbone-dependent rotamer library, compatible with SYBYL, is
available for download from the Dunbrack Lab at:
http://dunbrack.fccc.edu/tripos/Dunbrack.rotlib.tar.gz
This library was described by R. L. Dunbrack, Jr. and M. Karplus in
“Backbone-dependent Rotamer Library for Proteins: Application to Side-chain
prediction.” J. Mol. Biol. 230, 543-574 (1993).
To install the Dunbrack rotamer libraries:

It is strongly recommended that the Dunbrack libraries be installed outside of
the SYBYL tree.
1. Download the Dunbrack.rotlib.tar.gz file to a directory where it can be
made accessible to Biopolymer users at your site.
For example: /usr/tripos/rotamers

2. Unzip and extract the files:

gunzip -9 Dunbrack.rotlib.tar.gz
tar xvf Dunbrack.rotlib.tar
tar xvf Dunbrack.tar
3. After opening SYBYL, make the newly added library available:
Options > Tailor
Go to the page that contains the variable ROTAMER_DIRECTORY.
In the ROTAMER_DIRECTORY field, enter the full path to the directory
that contains the Dunbrack directory.
For example, if your Dunbrack directory is:
/usr/tripos/rotamers/Dunbrack
you would enter: /usr/tripos/rotamers
Press Apply at the bottom of the Tailor dialog, then Close.
The SYBYL dialogs will automatically detect the Dunbrack rotamer library
and make it available during the current SYBYL session.
4. To make the Dunbrack library available automatically in future SYBYL
sessions, add the following line to each user’s $HOME/sybyl.ini file
(sample sybyl.ini file in the Toolkit Utilities Manual):
setvar TAILOR!BIOPOLYMER!ROTAMER_DIRECTORY dir_name
where dir_name is the full path to the directory containing the local
rotamer library directory(ies).
All subdirectories of the specified directory will be picked up by the appro-
priate dialogs and added to the list of Tripos-supplied rotamer libraries.

9.7 Scan Sidechain Torsions

Scan the rotatable sidechain bonds of a biopolymer for positions with no close
van der Waals contacts. This allows coarse relaxation of strain in sidechain
groups prior to energetic refinements.
Biopolymer > Conformation > Scan Sidechain Torsions
BIOPOLYMER FIX_SIDECHAINS sequence_expr
This functionality uses the built-in sets {SIDECHAIN}, {RINGS}, and
{TO_ATOMS} to create a Boolean algebra expression for the rotatable bonds in
the sidechains of biopolymers.
The SCAN command is executed on all rotatable bonds in the selected residues’
sidechains with a fixed angle increment of 3° so that minimal changes from
starting geometries will be made. The “hardness” of the vdW spheres can be
adjusted using Tailor variable SCAN VDW_SCALE.
In the case of large proteins, if the whole molecule is selected this operation can
be time consuming.
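The idea of scanning a torsion in fixed 3° steps while staying close to the starting geometry can be sketched in Python as follows. This is an illustration only: the order in which SYBYL's SCAN command visits angles is not documented here, so the outward-alternating order (0, +3, -3, +6, ...) is an assumption, and has_bad_contact is a hypothetical caller-supplied test standing in for the van der Waals contact check.

```python
def scan_offsets(increment=3.0):
    """Yield torsion offsets 0, +inc, -inc, +2*inc, ... covering a full 360 degrees."""
    steps = int(round(360.0 / increment))
    yield 0.0
    for k in range(1, steps // 2 + 1):
        yield k * increment
        if k * increment < 180.0:            # +180 and -180 are the same position
            yield -k * increment


def scan_torsion(start_angle, has_bad_contact, increment=3.0):
    """Return the angle closest to start_angle (in scan order) with no bad
    contact, normalized to [-180, 180), or None if every position clashes."""
    for offset in scan_offsets(increment):
        angle = (start_angle + offset + 180.0) % 360.0 - 180.0
        if not has_bad_contact(angle):
            return angle
    return None
```

With a 3° increment the scan visits 120 positions per bond, which is one reason the operation can be slow when every sidechain of a large protein is selected.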
Scanning the sidechain torsions to relieve bad contacts is recommended after
modifying the biopolymer’s composition:
• Replace Sequence on page 167
• Mutate Monomers on page 168
• Insert Monomers on page 169
Built-in sets (in the SYBYL Basics Manual)

9.8 Copy Conformation

Copy the conformation of one residue sequence to another.
Biopolymer > Conformation > Copy Conformation
BIOPOLYMER COPY_CONFORMATION source_sequence
target_sequence conf_params
source_sequence Residue sequence from which the conformation is copied
target_sequence Residue sequence to which the conformation is copied
conf_params List of conformational parameters to copy, separated by
commas. Enter an asterisk (*) to copy all defined
parameters.
Note that only conformational angles defined in the dictionary can be copied
with this command.
When the copying operation encounters a proline in the target sequence,
Tailor variable BIOPOLYMER PROLINE_GEOMETRY determines whether the
proline geometry is distorted to match the conformation found in the
source sequence.

10. Protein Loop Search
• Introduction to Loop Searches on page 194
• Search PRODAT Database for Loops on page 196
• Tweak Conformational Loop Search on page 198
• Loop Search Results in a Spreadsheet on page 201
• BIOPOLYMER LOOP ANALYZE Command on page 203

10.1 Introduction to Loop Searches

A loop search is a useful operation in protein modeling when insertions and/or
deletions occur in the polypeptide chain. The goal under such circumstances is
to fill a gap between two pieces of the chain in a way that preserves (1) the
original relative orientation of the flanking segments, and (2) the length (as
input by the user) of the inserted piece or “loop” while at the same time
providing correct local covalent geometry.
SYBYL addresses this problem by:

• searching a database of protein fragments, selected from the Protein
Data Bank of crystallized proteins, and
• retrieving loops whose anchor regions have a good geometric fit to the
anchor regions of the modeled protein.
Application of this procedure often produces several candidate fragments. The
ANALYZE facility provides graphical tools for analyzing and selecting from the
retrieved fragments on the basis of the quality of fit to the anchor regions,
sequence homology, steric interactions, and other criteria. Such selection
criteria can be used to choose a smaller set of candidate loops.
Loop Fragment:
When the loop search is complete, SYBYL writes a file (filename.loop)
containing the parameters of the loop search and the loop fragments. For each
loop fragment, the following information is stored:
• source of the fragment;
• amino acid sequence;
• sequence homology score of the window region, using the homology
matrix specified with Tailor variable BIOPOLYMER
SIMILARITY_MATRIX;
• RMS fit to anchor regions;
• coordinates of the backbone atoms in the loop, transformed to fit the
reference molecule.
The structure of the original protein is preserved by writing a Mol2 file
associated with the run name of the loop search (filename.mol2).
Normally only backbone atoms are produced. Sidechains can be added later (see
Add Sidechains on page 125).
Once the loop search is completed, you can analyze the results immediately (see
Loop Search Results in a Spreadsheet on page 201 for a description of available
analysis tools). Since results are saved in a file, you can also analyze the results
at a later time. However, certain changes to the reference molecule (in
particular, changing the coordinates of the anchor atoms, even by centering or
minimizing) will render the loop results invalid.
Homology Score
The homology score is calculated as follows:
• First the current homology matrix is examined to determine the
similarity between each residue in the two sequences.
• These scores are summed over the window region (excluding the anchor
region). This is the “target vs. database fragment score”.
• The similarity score of the target sequence vs. itself is calculated (the
“target vs. target score”). The final score is then calculated.
\[
\mathrm{IdentityScore} = \frac{\mathrm{TargetVsTargetScore}}{\mathrm{WindowLength}} - \mathrm{MeanOfHomologyMatrix}
\]
\[
\mathrm{LoopScore} = \frac{\mathrm{TargetVsDatabaseFragmentScore}}{\mathrm{WindowLength}} - \mathrm{MeanOfHomologyMatrix}
\]
\[
\mathrm{FinalScore} = 100 \times \frac{\mathrm{LoopScore}}{\mathrm{IdentityScore}}
\]
The final homology score is independent of sequence length and similarity
matrix.
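The three-step calculation can be checked by hand with a small Python sketch. The two-letter similarity matrix in the usage note is a toy example, not one of the matrices shipped with SYBYL, and taking MeanOfHomologyMatrix as the mean over all matrix entries is an assumption. The function names are hypothetical.

```python
def window_score(seq_a, seq_b, matrix):
    """Sum the similarity of aligned residue pairs over the window region."""
    return sum(matrix[a][b] for a, b in zip(seq_a, seq_b))


def final_homology_score(target, fragment, matrix):
    """Return 100 * LoopScore / IdentityScore for two equal-length windows."""
    if len(target) != len(fragment):
        raise ValueError("window sequences must have the same length")
    window_length = len(target)
    entries = [value for row in matrix.values() for value in row.values()]
    mean = sum(entries) / len(entries)       # mean over all matrix entries (assumption)
    identity_score = window_score(target, target, matrix) / window_length - mean
    loop_score = window_score(target, fragment, matrix) / window_length - mean
    return 100.0 * loop_score / identity_score
```

With the toy matrix {"A": {"A": 4, "G": 0}, "G": {"A": 0, "G": 6}} the matrix mean is 2.5. For the target window AG, a fragment identical to the target scores 100, while the fragment AA scores -20, showing that a fragment less similar than the matrix average yields a negative score.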
References:
[3] Jones, T. A. and Thirup, S. (1986) EMBO Journal, 5:4, 819-822.
[4] Claessens, M.; Van Cutsem, E.; Lasters, I.; Wodak, S. Protein
Engineering (1989) 2:5, 335-345.
• Biopolymer Tweak for a computational method for generating loop
conformations
• Excise Monomers on page 171 to delete residues from a chain and join
the adjacent residues to close the gap in the backbone
• Insert Monomers on page 169 to insert residue(s) in a chain
• Protein Loop Searching on page 320 for a detailed description of the
methods used in loop searching
• Biopolymer Loop Files on page 294 for a description of the file format
• Binary Protein Database: PRODAT on page 255 and its graphical user
interface on page 256

10.2 Search PRODAT Database for Loops

Prepare the conditions for a protein loop search.
Biopolymer > Protein Loops > Search PRODAT Database
BIOPOLYMER LOOP SETUP preceding_res following_res
sequence filename table_name {command}
preceding_res Residue immediately preceding the unknown region.
This will be one of the N-anchor residues.
following_res Residue immediately following the unknown region.
This will be one of the C-anchor residues.
sequence Sequence of amino acids to fill in between
preceding_res and following_res. This sequence deter-
mines the length of the loop.
filename Name of file in which to store results of this run
table_name Name for data table to analyze results
command Arguments to the BIOPOLYMER LOOP ANALYZE com-
mand
The protein database is searched for fragments of the indicated length that fit
well between the 2 flanking residues. This may be used to model local changes
in the protein’s conformation introduced by the insertion or deletion of residues.
For the remainder of this discussion, the residues between the two flanking
residues you specify are called the window region of the loop. The two regions
containing the flanking residues are called the anchor regions of the loop. The
total number of residues in the loop fragments found by this procedure will be
the number of residues in the window region plus the number of residues in the
2 anchor regions. As a special case, you can search for terminal loops by
specifying a non-existent preceding_res or following_res. In this case, there is only a
single anchor region.
You can perform a loop search on any protein molecule in SYBYL. While the
anchor regions must exist in the protein, the window region can either be
already present in the molecule (with any number of residues) or missing; in
either case, the biopolymer loop analysis will fill in the entire loop region with
the proper residues.
Tailor Variables:
• Tailor subject PROTEIN_LOOP
• Tailor subject PROTEIN_SEARCH

• Biopolymer Tweak to perform an ab initio loop generation
• See Protein Folding And Model Generation on page 319 for information
on the method used to derive loop conformations

10.3 Tweak Conformational Loop Search

Perform an ab initio generation of loop conformations, an alternative to protein
loop searches.
Biopolymer > Protein Loops > Tweak Conformational Search
BIOPOLYMER TWEAK preceding_res following_res sequence
name {command}
preceding_res The residue immediately preceding the unknown
region. This is the single N-anchor residue.
following_res The residue immediately following the unknown
region. This is the single C-anchor residue.
sequence The sequence of amino acids to fill in between
preceding_res and following_res. This sequence deter-
mines the length and residue identities of the loop.
filename The name of the file to contain the results of this run
command BIOPOLYMER LOOP ANALYZE command and argu-
ments
Analysis of the results is described in Loop Search Results in a Spreadsheet on
page 201.
Biopolymer Tweak provides an ab initio method for generating loop geometries
in contrast to Biopolymer PRODAT Search, which uses a database of known
protein fragments. Although the methods of loop generation differ, Tweak
produces loop fragments satisfying the same conditions as those found with
Biopolymer PRODAT Search (i.e., the relative orientation of the flanking
segments is preserved and the inserted fragment has the required number of
residues as input by the user).
Biopolymer Tweak and Biopolymer PRODAT Search use the same definitions
for window region and anchor region with the exception that a Tweak anchor
region always consists of a single residue. Thus the total number of residues in
the loop fragments found by Tweak will be the number of residues in the
window region plus 2 (one for each anchor region).
When the loop search is complete, SYBYL writes a file (extension is .loop)
containing the parameters of the loop search and the loop fragments. For each
loop fragment, the following information is stored:
• the source of the fragment,
• amino acid sequence,
• RMS fit to anchor regions,
• coordinates of the backbone atoms in the loop, transformed to fit the
reference molecule.
The structure of the original protein is preserved by writing a Mol2 file
associated with the Biopolymer Tweak run name (filename.mol2).
Note that the source of the fragment is always TWEAK_XX (where XX is the
loop number), the amino acid sequence for all Biopolymer Tweak-generated
loops is constant per run, and the RMS fit to the anchor regions will always be
very close to zero.
Only backbone atoms are supplied. You can add sidechains later (see Add
Sidechains on page 125).
Method:
Biopolymer Tweak initially defines four distance constraints and their target
values between the CA/N atoms of one anchor region and the CA/C atoms of
the other anchor region.
A protein fragment of the required number of user-specified residues is then
constructed with random phi/psi angles taken from a uniform distribution
(proline phi angles are untouched).
The distance constraints are measured and a difference vector between the
actual and target distance constraints is computed, along with a matrix
containing the derivatives of each distance constraint with respect to each
torsion angle. A set of optimal corrections to the torsion angles is calculated
from a 4x4 linear system defined by the difference vector and the derivative
matrix. Optimal corrections are then limited in magnitude by Tailor variable
TWEAK MAX_TORSIONAL_CHANGE. The final torsional corrections are applied
to the fragment to give a new set of atomic coordinates. This process is repeated
until either the number of iterations is exceeded (Tailor variable TWEAK
MAX_ITERATIONS) and the fragment is rejected, or the magnitude of the
difference vector is less than the value of Tailor variable TWEAK
TARGET_DISTANCE_TOLERANCE and the fragment is subjected to chirality tests
and optional bump checking. Loops rejected for exceeding the iteration limit are
written to the terminal with the symbol d.
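One iteration of the correction step described above can be sketched in Python. The sketch assumes that the "optimal corrections" are the minimum-norm torsion changes that satisfy the linearized constraints, obtained by solving the small system (J Jᵀ) λ = d and setting Δθ = Jᵀ λ, and that the limit from MAX_TORSIONAL_CHANGE is applied by scaling the whole correction vector. Both points are assumptions about the method, not statements of SYBYL's implementation, and the function names are hypothetical.

```python
def solve_linear(a, b):
    """Solve a x = b for a small square system by Gaussian elimination
    with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):                        # back substitution
        residual = m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = residual / m[i][i]
    return x


def torsion_corrections(jacobian, difference, max_change):
    """jacobian[k][t] is the derivative of constraint k with respect to torsion t;
    difference[k] is (target - actual) for constraint k. Returns one correction
    per torsion, scaled so that no correction exceeds max_change in magnitude."""
    n_con, n_tor = len(jacobian), len(jacobian[0])
    jjt = [[sum(jacobian[r][t] * jacobian[s][t] for t in range(n_tor))
            for s in range(n_con)] for r in range(n_con)]
    multipliers = solve_linear(jjt, difference)           # 4x4 when n_con == 4
    delta = [sum(jacobian[r][t] * multipliers[r] for r in range(n_con))
             for t in range(n_tor)]
    largest = max(abs(d) for d in delta)
    if largest > max_change:
        delta = [d * max_change / largest for d in delta]
    return delta
```

With four distance constraints the linear system is 4x4 regardless of how many torsions the loop contains, which matches the size quoted in the text. A full implementation would recompute the constraint distances and their derivatives from the updated coordinates after each correction.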
Loop chirality is checked against the chirality defined in the anchor regions and
loops failing this test are rejected and written to the terminal with the symbol c.
If Tailor variable TWEAK DO_BUMP_CHECK is YES, a van der Waals bump
check is done on the backbone atoms of the loop (see Tailor variable GENERAL
BUMPS_CONTACT_DISTANCE and BUMPS_NEIGHBOR_DISTANCE to control the
bump checking algorithm). The bump check is carried out only within the loop,
not between the loop and the rest of the molecule. Loops failing the optional
bump check are rejected and written to the terminal with the symbol b.

If a loop fragment has passed all its screening tests it is finally accepted, fitted
to the original anchor region, and written to the file filename.loop for
upcoming analysis (see Loop Search Results in a Spreadsheet on page 201).
Accepted fragments are written to the terminal with the symbol + followed by a
line terminator.
This entire method is repeated until the number of accepted loop fragments
equals that specified by Tailor variable TWEAK NLOOPS. The resulting loop
fragments are then passed to the analysis functionality.
UIMS2 Variables:
• MNDL_TWEAK_SEED = the user-supplied seed for the random number
generator used by BIOPOLYMER TWEAK.
• BIO_LOOP_NLOOPS = the number of loop conformations being analyzed.
• Biopolymer PRODAT Search for more information on loop generation
by database searching
• Excise Monomers on page 171 to delete residues from a chain and join
the adjacent residues to close the gap in the backbone
• Insert Monomers on page 169 to insert one or more residues in a chain
• See Random Tweak Loop Generation on page 322 for more detail on the
tweak algorithm

10.4 Loop Search Results in a Spreadsheet

Analyze the results of a protein loop search with the molecular data explorer.
Access:
1. The protein for which the loop search was performed must be present.
2. Biopolymer > Protein Loops > Analyze Search Results and retrieve
a .loop file.
3. The loop candidates are loaded in a spreadsheet within the molecular data
explorer.
10.4.1 Biopolymer Loop Spreadsheet

The spreadsheet is the central tool to analyze the suitability of the candidate
loops. The information about the loop fragments consists of:
• Name—The row names report the source of the retrieved loops (PDB
code and starting residue).
• ID—The ranking of the loops found by the search. Loops are ranked
from best (lowest RMS fit) to worst (highest RMS fit).
• Sequence—The actual sequence in each loop retrieved from PRODAT.
The length of this sequence is affected by Tailor variables that determine
the number of residues preceding and following the loop window. (By
default, NANCHOR_N and NANCHOR_C are set to 2 and 1, respectively).
• Homology—The score comparing the actual sequence to the target or
model sequence. See Homology Score on page 195.
• Fit_RMS—A measure of the RMS fit of the retrieved loop to the anchor
residues.
Upon opening a biopolymer loop spreadsheet the loop in the first row is
automatically melded into the protein.
Biopolymer Menu
A spreadsheet containing the results of a protein loop search includes the
following features in the molecular data explorer menubar:
MDE: Biopolymer
• Examine Selected Loops
• Display All Loops
• Color Loop

10.4.2 Color Loop Atoms in the Protein

One of the candidate loops in the spreadsheet is melded into the protein.
Applying a solid color to loop residues makes them more easily identifiable.
MDE: Biopolymer > Color Loop
10.4.3 Examine Selected Loops

In a spreadsheet containing the results of a biopolymer loop search, select at
least one row, then choose MDE: Biopolymer > Examine Selected Loops.
• If you selected a single row, that loop is spliced into the protein
associated with the spreadsheet.
• If you selected several rows, the first loop is spliced into the protein, and
a dialog provides the navigation tools to examine the other loops.
Current Row Row number and name of the loop.
Previous, Next Navigation buttons to examine the selected loops.
Mark Current Mark the row containing the loop being examined.
Jump to Row Type a row number. If the specified row is not within
the selected rows a dialog warns you that it will be
added to the current selection.
Copy Molecule to Copy the molecule (protein with currently examined
loop) to the specified molecule area. The first available
molecule area is selected by default.
10.4.4 Display All Loop Candidates

Display all loop candidates superimposed in another molecule area.
MDE: Biopolymer > Display All Loops

10.5 BIOPOLYMER LOOP ANALYZE Command

BIOPOLYMER LOOP ANALYZE table_name [mol_area
loop_file]
table_name Name of the data table to use in the analysis. The next
two arguments will be skipped if a table with this name
is already open.
mol_area Molecule on which a loop search was previously run.
loop_file File containing the loop results (default file extension:
.loop)
The Loop Analyze Command prompt remains active until EXIT.
The following loop analyze options are available in any sequence:
ADD_COLUMNS, COLOR, DISPLAY_LOOPS, EXIT, LIST, SAVE, SELECT_LOOP
No longer supported: CLEAR, INTERACT, GRAPH, UNHILIGHT
At the conclusion of LOOP ANALYZE, the molecule will be left with the selected
loop fragment inserted. You can recover the original (pre-loop search) molecule
by reading in the Mol2 file associated with the BIOPOLYMER LOOP run name.
In addition to the data directly available for each loop fragment (atomic
coordinates, source of the fragment, residue sequence of the fragment, RMS
deviation of the least-squares fit to the anchor region), you can derive
additional information to help in selecting from among the candidate fragments.
10.5.1 Color Atoms in the Loop

BIOPOLYMER LOOP ANALYZE ... COLOR color
• color = BY_ATOM_TYPE or the name of a uniform color
10.5.2 Display All Loops

BIOPOLYMER LOOP ANALYZE ... DISPLAY_LOOPS mol_area
• mol_area = molecule area to contain the loops

10.5.3 Select and Meld Loop Fragments

BIOPOLYMER LOOP ANALYZE ... SELECT_LOOP {loop_number}
• loop_number = the ID number(s) of the loop(s) to be inserted
In addition to inserting the selected loop in the molecule, this command will
print its source and original amino acid sequence. It prompts repeatedly until
you press the end-loop character (|) or abort character (^).
10.5.4 Add Columns to the Table

BIOPOLYMER LOOP ANALYZE ... ADD_COLUMNS
• The angle between three specified, but not necessarily bonded, atoms.
ANGLE column_name atom1 atom2 atom3
• The distance between two specified, but not necessarily bonded, atoms.
DISTANCE column_name atom1 atom2
• The root-mean-square distance deviation of a subset of the atoms in one
specified loop from the same atoms in all other loops. SYBYL automatically
ignores any atoms you select that are not in the loop, since these do
not change position.
INTER_LOOP_RMS column_name atom_expr reference_loop_ID
• The omega torsion angle for a specified residue.
OMEGA column_name residue
• The phi torsion angle for a specified residue.
PHI column_name residue
• The psi torsion angle for a specified residue.
PSI column_name residue
• A homology score that compares the amino acid sequence of the reference
protein with the sequence of the fragments from which the loop coordinates
were extracted. In general, a higher score indicates a better match. The
homology computation is based on a matrix (specified by Tailor variable
BIOPOLYMER SIMILARITY_MATRIX) that gives, for each possible pair of amino
acids, the score when one changes to the other. This column uses the
similarity matrix currently in force. The score is computed over the entire
loop region, including the anchor residues. See List Table Data for a
description of how the homology score is calculated.
SEQUENCE_HOMOLOGY column_name
• The torsion angle defined by four, not necessarily bonded, atoms.
TORSION column_name atom1 atom2 atom3 atom4
• The number of contacts (bumps) between the loop backbone atoms and
the rest of the molecule.
VDW_CONTACTS column_name
Remarks:
These data provide an easy way of weeding out some unreasonable
conformations.
See Tailor subject GENERAL for additional bump monitoring adjustments.
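The geometric columns above (DISTANCE, ANGLE, TORSION) can be illustrated with a short Python sketch. It is not SYBYL code: the function names and the use of plain (x, y, z) tuples are assumptions made for illustration, and the torsion sign follows the usual atan2 formulation, which may or may not match the convention SYBYL reports.

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def distance(p1, p2):
    """Distance between two points; the atoms need not be bonded."""
    return math.dist(p1, p2)

def angle(p1, p2, p3):
    """Angle in degrees at p2, formed by p1-p2-p3."""
    v1, v2 = _sub(p1, p2), _sub(p3, p2)
    cos_value = _dot(v1, v2) / (math.sqrt(_dot(v1, v1)) * math.sqrt(_dot(v2, v2)))
    # Clamp to [-1, 1] to guard against rounding just outside acos's domain.
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_value))))

def torsion(p1, p2, p3, p4):
    """Signed torsion angle in degrees about the p2-p3 axis."""
    b1, b2, b3 = _sub(p2, p1), _sub(p3, p2), _sub(p4, p3)
    n2 = _cross(b2, b3)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(_cross(b1, b2), n2)
    return math.degrees(math.atan2(y, x))
```

Evaluating one of these functions for every candidate loop gives one value per row, which is what an added table column holds.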
10.5.5 List Table Data

BIOPOLYMER LOOP ANALYZE ... LIST option
options =
• COLUMN_DESCRIPTORS—Lists the name of each column in the data
table
• LOOPS—For each loop fragment, lists the loop number, RMS fit to the
anchor regions, homology score in the window region, source of the
fragment, and amino acid sequence of the source. See Homology Score
on page 195.
• PARAMETERS—Lists the parameters used in running the loop search
• TABLE_DATA—Lists all numeric data in the table.
Tailor Variables:
Tailor variable BIOPOLYMER SIMILARITY_MATRIX
10.5.6 Save Molecules With Melded Loops

Save versions of the molecule with various loop conformations in a SYBYL
database. Note that the entire molecule is added to the database, not just the
loop region. The loop source will be appended to the molecule name.
BIOPOLYMER LOOP ANALYZE ... SAVE option

option =
• CREATE_DATABASE—Creates a new molecule database
• OPEN_DATABASE—Opens a database
• SAVE_HILIGHT_SET—Selects all loop conformations that are currently
highlighted in graphs as candidates for saving

• SAVE_USER_SPECIFIED—Prompts for which loops should be candidates
for saving:
• ALL_REMAINING—Save this conformation and all the remaining
selected conformations, without further prompting
• KEEP_THIS_ROW—Save this loop; prompt for next one selected
• QUIT—Do not save any more loops; no more prompting
• SKIP_THIS_ROW—Do not save this loop; prompt for the next
selected loop
10.5.7 Exit the Analysis

BIOPOLYMER LOOP ANALYZE ... EXIT
After exiting from BIOPOLYMER LOOP ANALYZE, you can return and continue
analyzing the same table any time during the SYBYL session. If you have
added columns to your table, you can save it using TABLE SAVE. If you have
not added any columns, there is no need to save the table, since it can be
immediately recreated from the loop search results.

11. Compare Biopolymer Sequences
• Align Sequences and Write MSA on page 208
• View/Edit Alignments on page 212
• List Biopolymer Sequence on page 213
• FUGUE (in the FUGUE Manual)

11.1 Align Sequences and Write MSA

Find an optimal alignment between two or more sequences and produce a
multiple sequence alignment.
• The alignment procedure uses the Needleman and Wunsch algorithm
(J. Mol. Biol. 1970, 48, 443) for the pairwise alignment of sequences.
• See Read and Write PIR and FASTA Files on page 67 for a description of
.pir formatted files and how to create them.
• See Sequence Alignment on page 311 for a discussion of sequence alignment
methods and implementations.
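As a reminder of what a Needleman and Wunsch style global alignment does, here is a minimal Python sketch. It is an illustration only and makes several assumptions: scoring is plain identity (match 1, mismatch 0) rather than one of the SYBYL similarity matrices, the gap penalty is linear, and the function name is hypothetical. SYBYL's own implementation is described under Sequence Alignment.

```python
def needleman_wunsch(a, b, match=1, mismatch=0, gap=1):
    """Globally align strings a and b; returns (aligned_a, aligned_b, score)."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = -gap * i
    for j in range(1, m + 1):
        score[0][j] = -gap * j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + s,   # align the two residues
                              score[i - 1][j] - gap,     # gap in b
                              score[i][j - 1] - gap)     # gap in a
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            s = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + s:
                out_a.append(a[i - 1]); out_b.append(b[j - 1])
                i -= 1; j -= 1
                continue
        if i > 0 and (j == 0 or score[i][j] == score[i - 1][j] - gap):
            out_a.append(a[i - 1]); out_b.append("-")
            i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]
```

A larger `gap` value discourages gaps, which mirrors the role of the Gap penalty setting described for the dialog.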
11.1.1 Align Sequences via the Menubar

Biopolymer > Compare Sequences > Align and Write MSA
Source Files to Align
List The sequence files and molecules identified so far
appear in the list. The order in which you specify the
sequence files and molecules determines the order in
the list and in the statistics presented in the console.
By default a maximum of 40 sequences can be aligned.
Tailor variable BIOPOLYMER MAX_SEQUENCES
Add Sequence File Access a browser to retrieve a file in PIR, FASTA or
MSF format.
Add Molecule Access a browser to select one or more molecule areas
containing protein structures.
Remove Selected Select one or more items in the list, then press this
button to remove them from the sequence alignment.
This button is accessible only when you have selected
one or more items in the list above.
Sequence Alignments Details
Similarity Matrix Select the type of similarity matrix: apg, greer,
identity, mutation, physprop, pmutation (default),
swiss, or swiss2. You may also define your own
similarity matrix and use the file browser to specify its
location.
Tailor variable BIOPOLYMER SIMILARITY_MATRIX
Gap penalty Enter a positive number used to penalize gaps in
aligned sequences. Large values discourage insertion of
gaps in the alignment. The default gap penalty is
automatically adjusted when you select another Similarity
Matrix.
Tailor variable BIOPOLYMER GAP_PENALTY
Output Format Specify the format of the output file(s): MSF, PIR or
FASTA.
Output Name Enter a base name for the output files (do not provide
an extension). A multiple sequence format file with the
extension .msf will contain the alignment.
Edit in Sequence Viewer Check this box if you intend to view and edit the
sequence alignment with the Sequence Viewer. The
necessary set of files will be created based on the Run
name. See View/Edit Alignments on page 212.
Information about the alignment (PID, confidence, etc.) is listed in the console.
If more than two sequences are aligned, the pairwise sequence identity matrix is
also presented.
11.1.2 Align Sequences via the Command Line & Write MSA
Find an optimal alignment between two or more sequences and produce a
multiple sequence alignment (MSA) file.
BIOPOLYMER MULT_ALIGN_SEQ pir1 pir2 [{pir}] | run_name
pir1, pir2, ... Names of the files in .pir format (one-letter residue
strings) containing the sequences of the molecules. At
least two sequences must be provided; end the list with
the end-loop character (|). By default a maximum of 40
sequences can be aligned.
Tailor variable BIOPOLYMER MAX_SEQUENCES
run_name Base name for the output files (do not provide an
extension). A multiple sequence format file by the name of
run_name.msf will contain the alignment.
Tailor Variables:
• Tailor variable BIOPOLYMER GAP_PENALTY
• Tailor variable BIOPOLYMER SIMILARITY_MATRIX
11.1.3 Align Two Sequences via the Command Line

Find an optimal alignment between the sequences in two PIR files.
BIOPOLYMER ALIGN_SEQUENCES pir1 pir2
pir1 Name of the file containing the sequence of the first
molecule (the default extension is .pir)
pir2 Name of the file containing the sequence of the second
molecule (the default extension is .pir)
BIOPOLYMER ALIGN_SEQUENCES uses the Needleman and Wunsch algorithm
(J. Mol. Biol. 1970, 48, 443) to find an optimal alignment of two residue
sequences.
By default, BIOPOLYMER ALIGN_SEQUENCES calculates the % identity
between two sequences by using the length of the shortest sequence as the
denominator.
Tailor Variables:
• Tailor variable BIOPOLYMER IDENT_MODE
• Tailor variable BIOPOLYMER NUMBER_JUMBLES
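The effect of the denominator choice on the % identity can be shown with a small Python sketch. The mode names "shortest" and "reference" are hypothetical labels chosen for this example and are not the actual values of IDENT_MODE; treating the first sequence as the reference is likewise an assumption.

```python
def percent_identity(aligned_a, aligned_b, mode="shortest"):
    """% identity of two gapped, equal-length alignment strings ('-' = gap)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Count aligned positions holding the same residue in both sequences.
    identical = sum(1 for x, y in zip(aligned_a, aligned_b)
                    if x == y and x != "-")
    len_a = len(aligned_a.replace("-", ""))
    len_b = len(aligned_b.replace("-", ""))
    if mode == "shortest":
        denominator = min(len_a, len_b)   # length of the shortest sequence
    elif mode == "reference":
        denominator = len_a               # assumption: first sequence is the reference
    else:
        raise ValueError("unknown mode: " + mode)
    if denominator == 0:
        raise ValueError("empty sequence")
    return 100.0 * identical / denominator
```

For the alignment ACGT / A-GT, three positions are identical, so the shortest-sequence denominator (3) gives 100% while the reference-sequence denominator (4) gives 75%.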


11.2 View/Edit Alignments

View and edit a sequence alignment.
Menubar: Biopolymer > Compare Sequences > View/Edit Alignments
Enter the name of a multiple alignment file.
Command: Not available.
By default, the % identity between two sequences is computed by using the
length of the shortest sequence as the denominator. A Tailor variable allows you
to use either the shortest or the reference sequence (Tailor variable
BIOPOLYMER IDENT_MODE).

11.3 List Biopolymer Sequence

List the complete residue sequences of all chains in a biopolymer.
Menubar: Biopolymer > Compare Sequences > List Sequence
Command: BIOPOLYMER SEQUENCE mol_area
Multiple A chains are occasionally found in PDB files. Listing the biopolymer’s
sequence will reveal that. We recommend that you:
• give all chains unique names: Set Chain Names on page 132
• edit the chain termini: Chain Termini Sets on page 131
Use LIST SEQUENCE to list partial sequences.

12. Compare Biopolymer Structures
• Fit Monomers on page 216
• Align Structures by Homology on page 218
• RMS Fits of Conformers on page 221
• Find and Fit Fixed Regions on page 223

12.1 Fit Monomers

Perform a least squares fit between the same number of residues in two
biopolymers.
12.1.1 Fit Monomers via the Menubar

Biopolymer > Compare Structures > Fit Monomers
Monomer Sequence in Reference Molecule Press [...] to access the Substructure
Expression dialog and specify the residues to be used as reference in the
fit. The selected residues do not need to be connected
sequentially. The reference molecule’s coordinates will
not be altered.
Monomer Sequence in Molecule to Fit Press [...] to access the Substructure
Expression dialog and specify the residues to be fitted. The selected
residues do not need to be connected sequentially. However,
an identical number of residues must be selected
in both molecules. This molecule will be transformed to
achieve the fit.
Atoms to Use for Fit Select the atoms to be used for the least-squares fit:
Calpha, Backbone, Sidechain or All. These atoms
are also used to compute the RMSD value reported in
the console.
List the Distances Whether to list all the resulting distances between atom
pairs of the fit, or just the root mean square (RMS)
distance.
Fit You can iteratively fit monomers in multiple protein
sequences.

12.1.2 Fit Monomers via the Command Line

BIOPOLYMER FIT ref_sequence fit_sequence atom_expr list_distance
ref_sequence Residues to be used as reference in the fit. The selected
residues do not need to be connected sequentially. The
reference molecule’s coordinates will not be altered.
fit_sequence Residues to be fitted. The selected residues do not need
to be connected sequentially. However, an identical
number of residues must be selected in both molecules.
This molecule will be transformed to achieve the fit.
atom_expr Subset of the atoms in ref_sequence to use in the fit.
These atoms are also used to compute the RMSD value
reported in the console.
list_distance Whether to list all the resulting distances between atom
pairs of the fit, or just the root mean square (RMS)
distance.
The atom_expr argument allows you to specify a subset of the sequence atoms
to be used in the fit (e.g. only backbone atoms). The sequences to fit must have
the same length.
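The difference between listing all pair distances and reporting only the RMS distance can be sketched in Python as follows. This is an illustration under stated assumptions: the function names are hypothetical, coordinates are plain (x, y, z) tuples, and the sketch only measures the paired atoms as given. It does not perform the least squares superposition itself.

```python
import math

def pair_distances(ref_atoms, fit_atoms):
    """Distance between each reference atom and its paired fitted atom."""
    if len(ref_atoms) != len(fit_atoms) or not ref_atoms:
        raise ValueError("both selections must contain the same, non-zero number of atoms")
    return [math.dist(r, f) for r, f in zip(ref_atoms, fit_atoms)]

def rms_distance(ref_atoms, fit_atoms):
    """Root mean square of the pair distances."""
    distances = pair_distances(ref_atoms, fit_atoms)
    return math.sqrt(sum(d * d for d in distances) / len(distances))
```

The length check corresponds to the requirement that the two sequences contain the same number of residues (and therefore the same number of selected atoms).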
UIMS2 Variable:
• FIT_RMS = the RMS deviation computed by the least squares fit.
See also:
• Find and Fit Fixed Regions on page 223 to perform a fit for multiple
conformations of a biopolymer
• Fit Atoms to perform a least squares fit between any atoms
• Match Atoms to automatically find a match and fit 2 molecules

12.2 Align Structures by Homology

Align proteins on the basis of their sequence similarity.
Warning: The maximum number of residues per protein is 5000.
12.2.1 Align Structures via the Menubar

Biopolymer > Compare Structures > Align Structures by Homology
Fixed Reference Structure Select the reference molecule. It will remain fixed
during the alignment process.
Movable Structure(s) Select one, several or all other molecules to be fitted to
the reference molecule.
Buttons to assist in the selection of movable proteins.
One protein must be selected in the list for the action
buttons at the bottom of the dialog to be active.
Structure Alignment Details
Atoms to Use for Fit Specify the type of atoms to be used for the
least-squares fit: C-Alpha, Backbone, Sidechain or All.
Create and Use MSA Specify how to perform the alignment:
• Off—Each selected movable structure is treated
against the Fixed Reference Structure.
• On—First perform a multiple sequence alignment
(MSA) using all selected structures (fixed and
movable) then align each selected movable structure
to the MSA.
Align Seed Residues Only Whether to use only the seed (identical) or all residues
to perform the alignment and/or compute the RMSD.
List Distances Whether to list the pairwise distances between atoms
used to perform the alignment and/or compute the
RMSD.
This option is not available when Create and Use
MSA is active.
Show RMSD Plot Display, in a separate dialog, the pairwise Root Mean
Square Distance (in Å) for the specified type of atoms
in the selected homologs. The numerical representation
of the same grid is printed in the console.
This option is available only when Create and Use
MSA is active.
Action Buttons
Align Structures Align the selected structures as specified in the dialog,
either to a single reference structure or to the multiple
sequence alignment.
You can iteratively align multiple protein sequences.
Calculate RMSD Compute the RMSD values against a single reference
structure or against the multiple sequence alignment.
Information in the Console
Without an MSA:
• %ID—% of identical residues; gaps in the alignment are ignored
• Score—The sum of the homology scores for the two sequences
• Sig.—Significance score (read about Jumbling and Significance on page
315); affected by Tailor variable BIOPOLYMER NUMBER_JUMBLES
• RMSD—Root Mean Square Distance for the specified types of atoms
With an MSA:
• Identity—% of identical residues; gaps in the alignment are ignored
• Score—The sum of the homology scores for all the sequences

• Pairwise sequence identity grid—% of identical residues; gaps in the
alignment are ignored
• Pairwise RMSD grid—Root Mean Square Distance for the specified
types of atoms
Note: Sequence alignment cannot be used reliably for structural alignment if:
• The sequence identity is lower than 30%.
• The significance score is lower than 4.
12.2.2 Align Structures via the Command Line

Align two protein structures on the basis of their sequence similarity. Different
sets of atoms can be used to direct the alignment.
BIOPOLYMER ALIGN_STRUCTURES molarea1 molarea2 [atoms]
molarea1 The reference molecule

molarea2 The molecule to be fitted to the reference molecule
atoms Atoms used to direct the alignment: CALPHA (default),
BACKBONE, SIDECHAIN or * (all atoms)
12.2.3 Alignment Procedure

1. The sequences of the proteins are aligned (see Align Sequences and Write
MSA on page 208). Tailor variables control the gap penalty, the number of
jumbles, and the similarity matrix used:
• Tailor variable BIOPOLYMER GAP_PENALTY
• Tailor variable BIOPOLYMER NUMBER_JUMBLES
• Tailor variable BIOPOLYMER SIMILARITY_MATRIX
2. The alignment process includes writing out each structure as a separate .pir
file. If any of the proteins contains more than one chain, the chains are
linked artificially so that a single .pir file can be written for each structure.
3. The sequences are then compared (see Fit Monomers on page 216) by
superimposing the designated atoms (C-alpha or others) in the residues that match
in the alignment. The distance in Å for each pair as well as the overall
weighted root mean square distance are listed in the console.
4. The .pir files are deleted during cleanup.

12.3 RMS Fits of Conformers

Perform a local RMS fitting of a series of conformers to the average conformer
and report the results in a ProTable created using the average conformer. It is
designed to analyze the local conformational variability of calculated
biopolymer structures, such as those created by distance geometry.
Biopolymer > Compare Structures > Local RMS Fits of Conformers
Database Name Select one of the opened databases or open a new one.
• Open—This option brings up a file browser.
• xxx.mdb—If you already have a database open, its
name will appear here.
Fitting Options Activate one or more of the following fitting options.
There will be at least one column in the resulting
ProTable spreadsheet for each of the selections made
here.
• Backbone Atoms—backbone atoms, as defined
in the dictionary
• Heavy Atoms—all non-hydrogen atoms
• All Atoms—all atoms
• Alpha Carbons—alpha carbons only
Segment Length Activate one of the following mutually exclusive
options. This selection is combined with the Fitting
Options, and the results are reported in separate
columns in a ProTable spreadsheet.
• One Residue—calculates one residue fits
• Three Residues—calculates three residue fits.
Only three-residue fits are reported for Alpha
Carbons.
• Report Both—reports the results of both one
residue and three residue fits

Generate New Alignment of Conformers Check this box to perform an RMS-based
alignment of the molecules in the database prior to calculating the
average structure. The BIOPOLYMER RESIDUE_FIT
command is used to align the conformers, iterating with
a decreasing RMS threshold until less than 90% of the
residues are considered best fitted.
If you leave this check box off, the coordinates in the
database are used.
Because the application will be loading all database molecules into SYBYL, it
will first delete (zap) all molecules already present. A warning message is
issued before anything is done, giving you an option to exit.
The results are reported in a ProTable spreadsheet. Each column represents a
combination of Fitting Options and Segment Length. The molecule used
for generating the spreadsheet is the average of all molecules in the database.
This average molecule is displayed in M1.
You may now use ProTable’s Shaded Tube/Ribbon option to display
colored, variable thickness tubes for the backbone.
See also:
• The ProTable Manual
• Find and Fit Fixed Regions on page 223

12.4 Find and Fit Fixed Regions

Find a best fit for multiple conformations of biopolymers, particularly those that
are the result of NMR-based structure calculations.
The molecules must be of identical sequence and number of atoms and they
must contain all hydrogens and lone pairs. A minimum of 3 molecules is needed
to perform the fit.

Biopolymer > Compare Structures > Find and Fit Fixed Regions
Molecules to Fit Molecule areas containing the molecules to fit.
All, Clear Tools to help in the selection of molecules.
Target Function Basis • RMS Deviation—Fits the molecules using the
currently defined fixed region during the iteration.
It uses the root-mean-squared deviation as the
function value.
• Distance Variance—Uses the standard deviation of
distances between the root atoms of the residues as
the function. The least square fitting is only
performed after the fixed-region has converged or
after reaching the maximum iteration number.
Fit Threshold Value Function value of convergence. If the mean function
value is lower than this number, the fit is complete.
Fit Convergence Value If the change of function value between two
consecutive iterations is smaller than this number, the fit is
complete.
Maximum Iterations Maximum number of iterations to perform.

BIOPOLYMER RESIDUE_FIT mol_expr function function_value delta_value max_iteration
mol_expr Areas containing the molecules to fit.
function • RMS_FROM_FIT—fits the molecules using the
currently defined fixed region during the iteration.
It uses the root-mean-squared deviation as the
function value.
• DISTANCE_STD—uses the standard deviation of
distances between the root atoms of the residues as
the function. The least square fitting is only
performed after the fixed-region has converged or
after reaching the maximum iteration number.
function_value Function value of convergence. If the mean function
value is lower than this number, the fit is complete.
Default is 1.0Å.
delta_value Minimum change of function value in two consecutive
iterations. If the change of the function value is smaller
than this number in two iterations and the mean
function value is greater than the function_value then the
scaling parameter, u, is reduced to [(0.9)(u)]. Default is
0.1 Å.
max_iteration Maximum number of iterations to perform
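One way to read the interplay of function_value, delta_value and max_iteration is sketched below in Python. This is an interpretation of the argument descriptions above, not the actual SYBYL control flow: the function name and the order of the checks are assumptions, and max_iteration has no default here because the text does not state one.

```python
def check_iteration(mean_value, previous_mean, u, iteration, max_iteration,
                    function_value=1.0, delta_value=0.1):
    """Return (finished, u) after one iteration of the fixed-region search."""
    # Converged: the mean function value dropped below the threshold.
    if mean_value < function_value:
        return True, u
    # Stop when the iteration budget is used up.
    if iteration >= max_iteration:
        return True, u
    # Stalled above the threshold: tighten the cutoff by reducing u to 0.9 * u.
    if previous_mean is not None and abs(previous_mean - mean_value) < delta_value:
        u = 0.9 * u
    return False, u
```

With the defaults, a mean value of 1.45 Å that changed by only 0.05 Å since the previous iteration does not end the fit. It lowers u from 1.5 to 1.35 so that more residues are excluded in the next pass.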
12.4.3 Fitting Function

This functionality identifies well-defined regions of a set of biopolymer
structures and uses those regions to fit the molecules. For both the RMS
deviation and the distance variance methods, the fixed region starts with the
entire molecule, and residues are eliminated from it when:
$\mathrm{Func} > \overline{\mathrm{Func}} + u \cdot \sigma(\mathrm{Func})$
where:
• $\mathrm{Func}$ = RMS deviation or distance variance
• $\overline{\mathrm{Func}}$ = the mean value of the function over all residues
in the fixed region
• $u$ = a scaling constant
• $\sigma(\mathrm{Func})$ = the standard deviation of the function over all fixed
residues.
The value of u is set to 1.5 at the start of the command. If this value
eliminates more than 3 residues in the first iteration, then u is reset to a
value such that it eliminates only 3 residues from the fixed region.
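The elimination rule for the first iteration can be sketched in Python as follows. The sketch rests on assumptions that the text does not settle: the per-residue function values arrive as a dictionary, the population standard deviation is used, and ties between residue values are ignored. The function name is hypothetical.

```python
import statistics

def split_fixed_region(values, u=1.5, max_first_removed=3):
    """values maps residue name -> function value.

    Returns (kept, removed, u_used) for the first iteration.
    """
    vals = list(values.values())
    mean = statistics.fmean(vals)
    sigma = statistics.pstdev(vals)  # assumption: population standard deviation
    cutoff = mean + u * sigma
    removed = [name for name, v in values.items() if v > cutoff]
    if len(removed) > max_first_removed:
        # Too many residues fail: move the cutoff up to the value of the
        # (max_first_removed + 1)-th worst residue, so that only the residues
        # strictly above it are removed, and report the u that matches it.
        ranked = sorted(vals, reverse=True)
        cutoff = ranked[max_first_removed]
        u = (cutoff - mean) / sigma
        removed = [name for name, v in values.items() if v > cutoff]
    kept = [name for name in values if name not in removed]
    return kept, removed, u
```

In later iterations the same test would be applied to the residues that remain in the fixed region, with u possibly reduced as described for delta_value.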
UIMS2 Variables:
SUBST_FIXED contains the fixed region identified by the BIOPOLYMER
RESIDUE_FIT command. The SUBST_FIXED variable is a space separated list
of substructure names identified as in the fixed region of the molecule. Please
note that this variable is not defined until the first execution of the command.
Reference:
“A simple method for delineating well-defined and variable regions in protein
structures determined from inter-proton distance data,” M. Nilges, G.M. Clore
and A.M. Gronenborn, FEBS Lett., 219 (1), 11-16 (1987).
See Fit Monomers on page 216 to perform a least squares fit between two
biopolymer sequences of identical lengths.

13. Search Protein Database
Search SYBYL’s binary protein database (PRODAT) for key patterns of
residues, sequences, distances, and secondary structural elements.
Biopolymer > Search Database
Binary Protein Database to Search Location of the binary protein database.
By default, $TA_ROOT/biopolymer/tables/prodat_1009 (see
Location and Environment on page 255 to customize
this).
Sequence Expression Access the Sequence Expression dialog to create a
sequence query. Select the query pattern type from the
set of push buttons. Sequence patterns may match
sequences of variable length as well as include wild
cards and multiple choices.
Inter-CA Distance Expression Access the Inter-CA Distance Expression dialog to
create an expression for inter-CA distances in a loop of
fixed size. Distance expressions always match
sequences of a single fixed length.
Secondary Structure Expression Access the Secondary Structure Expression dialog to
create a secondary structure query. SSE expressions
may match sequences of variable length.
Class Expression Access the Class Expression dialog to create a class
query, a method for finding residues which have the
same residue type, secondary structural state, backbone
angle(s), or sequence number. Class expressions always
match sets of single residues.
Operator Select the desired operator (Union, Difference,
Intersection) from the pull-down. The operator will act on
the next query pattern added to the query expression.
Note that when multiple patterns are entered, the
fragment length retrieved will be that of the last pattern.
Undo Last Selection Delete the last pattern generated by one of the
secondary dialogs.
Protein Search Query The text field shows the current query expression. If the
query requires more characters than the field can
display, the end of the expression is shown. This text field
can be edited or entered directly from the keyboard.
Additional information on the format of these
expressions may be found in the description of the expression
generator %PDBFILTER() in the SPL Manual.
Clear Clear the text field.
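The three operators behave like ordinary set operations on the residues matched by each pattern. The Python sketch below illustrates this with hypothetical data: the operator spellings, the function name, and the representation of a match as a (protein, residue index) pair are assumptions, not the PRODAT internals.

```python
def combine(current, operator, new_matches):
    """Apply a query operator to the current result and the next pattern's matches."""
    if operator == "UNION":
        return current | new_matches          # residues matched by either pattern
    if operator == "DIFFERENCE":
        return current - new_matches          # matched so far, but not by the new pattern
    if operator == "INTERSECTION":
        return current & new_matches          # residues matched by both patterns
    raise ValueError("unknown operator: " + operator)

# Hypothetical matches, written as (protein code, residue index) pairs.
helix_starts = {("1abc", 10), ("1abc", 42), ("2xyz", 7)}
glycines = {("1abc", 42), ("2xyz", 7), ("2xyz", 90)}
both = combine(helix_starts, "INTERSECTION", glycines)
```

Here `both` holds the two residues present in both sets. Because the operator acts on the next pattern added, the order in which patterns are entered matters for Difference.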
The Protein Data Bank contains a wealth of structural and sequence
information. SYBYL includes PRODAT, a binary compilation of the better
structures from the PDB, for use in protein loop searches. Search results can be
retrieved as molecular fragments, PDB source information, or ID indices from
the binary database itself.
References:
[1] F.C. Bernstein, T.F. Koetzle, G.J.B. Williams, E.F. Meyer, Jr., M.D.
Brice, J.R. Rodgers, O. Kennard, T. Shimanouchi, and M. Tasumi, J.
Mol. Biol., 1977, 112, 535-42.
[2] E.E. Abola, F.C. Bernstein, S.H. Bryant, T.F. Koetzle, and J. Weng in
Crystallographic Databases – Information Content, Software Systems,
Scientific Applications, eds. F.H. Allen, G. Bergerhoff, and R. Sievers,
Data Commission of the International Union of Crystallography, Bonn/
Cambridge/Chester, 107-132 (1987).
• Search PRODAT Database for Loops on page 196 for replacing or
building loop fragments in proteins
• Tailor variable PROTEIN_SEARCH to specify preferences
• Loop Search Results in a Spreadsheet on page 201 for column types and
options
• See page 255 for a description of mkprodat and the binary database

13.1 Sequence Expression

To automatically create an expression which is used to search for specific
patterns of sequences in the binary protein database.
To access: press the Sequence Expression button on the Protein Database
Searching dialog.
Residue Specification Specific sequence patterns can be selected for retrieval
by using the residue buttons. As you press each one, it
will appear in the sequence pattern being generated.
Asx Specify either an asparagine or an aspartate at a
particular position in the sequence pattern.
Glx Specify either a glutamine or a glutamate at a particular
position in the sequence pattern.
Any Allow any one residue to match at the current position
in the sequence pattern.
Wild Allow any number of residues to match at the current
position in the sequence pattern.
Open List Start a list of residues that can match the current posi-
tion in the sequence pattern. Follow with a non-empty
set of residues from the push buttons and a Close List.
Close List End the list started by Open List.
Search String The sequence pattern appears in this field as it is cre-
ated using the push buttons above. You may also type
in an expression directly.
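The pattern elements in this dialog map naturally onto regular expressions over one-letter residue codes. The Python sketch below uses a hypothetical token list to stand in for the button presses. It does not reproduce the actual search string syntax, which is documented with %PDBFILTER() in the SPL Manual.

```python
import re

def pattern_to_regex(tokens):
    """Translate a list of pattern tokens into a compiled regular expression.

    Tokens (all hypothetical): a one-letter residue code, "ANY", "WILD",
    "ASX", "GLX", or a list of one-letter codes (Open List ... Close List).
    """
    parts = []
    for token in tokens:
        if isinstance(token, (list, tuple, set)):
            if not token:
                raise ValueError("a residue list must not be empty")
            parts.append("[" + "".join(sorted(token)) + "]")
        elif token == "ANY":
            parts.append(".")        # exactly one residue of any type
        elif token == "WILD":
            parts.append(".*?")      # any number of residues, as few as possible
        elif token == "ASX":
            parts.append("[ND]")     # asparagine or aspartate
        elif token == "GLX":
            parts.append("[QE]")     # glutamine or glutamate
        else:
            parts.append(re.escape(token))
    return re.compile("".join(parts))

def find_matches(tokens, sequence):
    """Return (start index, matched fragment) for non-overlapping matches."""
    return [(m.start(), m.group())
            for m in pattern_to_regex(tokens).finditer(sequence)]
```

For example, the tokens G, ANY, ASX applied to the sequence AGKDLGSNT match the fragments GKD and GSN.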

13.2 Inter-CA Distance Expression

To specify inter-CA distance constraints in searching for a loop of fixed length.
The distance constraint applies to the first and last residue of the loop.
To access: press the Inter-CA Distance Expression button on the Protein
Database Searching dialog.
Sequence Length The fixed length of the loop to which the distance
constraint applies.
Inter-CA Distances The lower and upper bounds of the distance constraint
on the loop.
Residue Offset Type in an integer. Positive and negative offsets are
allowed. For example, an offset of -3 will return the residue
ID of the residue 3 positions toward the N-terminus
from the residue that matches the query. This option is
useful in Boolean operations for building complex
expressions.
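The inter-CA distance constraint amounts to sliding a window of fixed length along a chain and testing the distance between its first and last alpha carbons. The Python sketch below shows the idea. The function name and the list-of-tuples input are assumptions, and the real search runs against the binary database rather than a coordinate list.

```python
import math

def find_loops(ca_coords, length, d_min, d_max):
    """Return (start index, distance) for each window of `length` residues
    whose first-to-last CA distance lies within [d_min, d_max]."""
    hits = []
    for start in range(len(ca_coords) - length + 1):
        d = math.dist(ca_coords[start], ca_coords[start + length - 1])
        if d_min <= d <= d_max:
            hits.append((start, d))
    return hits
```

Because every window has the same number of residues, every match has the same length, which is why distance expressions always return fragments of a single fixed length.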
13.2.1 Secondary Structure Expression

To specify patterns of secondary structural elements (SSEs) to be used while
searching the binary protein database. This is similar to the Sequence Expression
dialog, but searches for matches to the secondary structural state of residues
rather than their identity.
To access: press the Secondary Structure Expression button on the Protein
Database Searching dialog.

Alpha Helix Specifies that an alpha helix is required at the current

location in the SSE pattern.
Beta Strand Specifies that a beta strand is required at the current
location in the SSE pattern.
Turn Specifies that a turn is required at the current location
in the SSE pattern.
Other Specifies that none of the above (alpha helix, beta
strand, turn) may be present at the current location in
the SSE pattern.
Any Matches any one SSE at the current location in the SSE
pattern.
Wild Matches any number of SSEs at the current location in
the SSE pattern.
Open List Starts a list of allowed SSEs at the current location in
the SSE pattern. Continue with one or more of the SSE
buttons and a Close List.
Close List Ends the list from above.
Search String As you make selections, the actual expression will
appear here.

13.3 Class Expression

To search for classes of residues in the binary protein database. The classes
include residue type, secondary structural type, ϕ and ψ angles, and sequence
number.
Press the Class Expression button on the Protein Database Searching dialog.
Residue Type Select the type of residue from the pull-down.
Residue Secondary Structure Select the type of secondary structure from the
pull-down.
Residue Backbone Angle Select the type of backbone angle.
Dihedral Angles Type in the lower and upper bounds for the backbone
angle specified in the Residue Backbone Angle
pull-down.
Residue Sequence Number Type in the numerical value of the sequence number
desired.
Residue Offset Type in an integer. Positive and negative offsets are
allowed. For example, an offset of -3 will return the residue
ID of the residue 3 positions toward the N-terminus
from the residue which matches the query. This option
is useful in Boolean operations for building complex
expressions.
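A class query on a backbone angle, combined with a residue offset, can be sketched in Python as below. The data layout (one dictionary per residue, in chain order) and the decision to drop offsets that fall off either end of the chain are assumptions made for this example.

```python
def class_matches(residues, angle="phi", lower=-180.0, upper=180.0, offset=0):
    """Return indices of residues selected by an angle-range class query.

    residues: list of dicts with "phi" and "psi" values in degrees, ordered
    from the N-terminus. A negative offset shifts each hit toward the
    N-terminus, a positive offset toward the C-terminus.
    """
    hits = []
    for index, residue in enumerate(residues):
        value = residue.get(angle)
        if value is not None and lower <= value <= upper:
            shifted = index + offset
            if 0 <= shifted < len(residues):   # assumption: drop out-of-chain hits
                hits.append(shifted)
    return hits
```

With an offset of -3, a residue matching at position 4 is reported as position 1, which is how a query on one residue can be combined with a query on a residue three positions earlier.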

13.4 Protein Search Results

To access and save the results of the search.
Press the Search button in the Protein Database Searching dialog.
Protein Search The query from the Protein Database Searching dialog
Query which generated these results.
Number of Fragments Numerical field indicating how many fragments in the
database match the search query. If the search results
are empty, you will be returned to the main dialog
instead.
List Fragment Source Information Press this button if the desired output is one
line of text per fragment which gives the location of the fragment
in the corresponding PDB file, including the name of
the source protein.
Retrieve Molecular Fragments Fragments that match the search query will be
retrieved from the binary protein database and displayed in
molecule areas.
Fileset Name Enter the base name for the search result files.
Perform RMS Fit of Fragments Performs a least squares fit on the retrieved
fragments (see Fit Monomers on page 216). This option is
available only if all the fragments have the same length. The
RMS fit is done on the alpha carbons of each residue in
the fragment. The RMS fit values are stored in a .rms
file.

Create Database from Fragments   Create a molecular database with each fragment as an entry.
Create Spreadsheet from Database   Create a spreadsheet with each fragment as a row. Accessible after the database has been created.
Loop Search Results in a Spreadsheet on page 201

14. Biopolymer Dictionary & Database Administration
• Biopolymer Dictionary on page 236
• Open Dictionary on page 236
• Close Dictionary on page 237
• List Dictionary on page 237
• Manage Custom Dictionary on page 238
• Create or Modify a Monomer on page 241
• Add a Monomer to the Dictionary on page 248
• Create/Update the PRODAT Database on page 255
• Binary Protein Database: PRODAT on page 255
• Customize PRODAT via the Menubar on page 256
• mkprodat Utility on page 258
• pdbfname on page 260

14.1 Biopolymer Dictionary

To open a biopolymer dictionary, display the information it contains, add to or
change the definition of a residue or blocking group, or save changes.
• Open Dictionary on page 236
• Close Dictionary on page 237
• List Dictionary on page 237
• Create or Modify a Monomer on page 241
• Add a Monomer to the Dictionary on page 248
• Create or Update a Dictionary on page 248
• Create AMBER SLN Typing Rules on page 249
Only one biopolymer dictionary can be open at a time.
14.1.1 Open Dictionary

Open a biopolymer dictionary.
Biopolymer > Dictionary & Database Admin > Open Dictionary
BIOPOLYMER DICTIONARY OPEN directory_filename
Dictionary files have a file extension of .dic, and reside in the directory
specified by Tailor variable BIOPOLYMER DIRECTORY.
SYBYL will automatically open the default dictionary, specified by the
command Tailor variable BIOPOLYMER DEFAULT_DICT, the first time it needs
information from the dictionary. Thus this functionality is typically used only to
change to a different dictionary.
The dictionaries currently provided are macromol, protein, bigpro, dna, rna,
and sugar. The macromol dictionary is opened by default for all biopolymer
operations unless another dictionary has been opened by the user.
Only one biopolymer dictionary can be open at a time. When a new dictionary
is opened, any previously opened dictionary is automatically closed. If the full
filename of the dictionary is the same as the one currently open, no action is
taken.
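The open/close rule described above can be modeled with a short Python sketch. This is a toy model of the stated behavior, not SYBYL code; the class name, the log messages, and the example paths are hypothetical.

```python
class DictionarySession:
    """Toy model of the rule that only one dictionary is open at a time."""

    def __init__(self):
        self.open_path = None  # full filename of the open dictionary, if any
        self.log = []          # actions taken, recorded for illustration

    def open(self, path):
        # Re-opening the dictionary that is already open does nothing.
        if path == self.open_path:
            self.log.append("no action")
            return
        # Opening a different dictionary closes the previous one first.
        if self.open_path is not None:
            self.log.append(f"closed {self.open_path}")
        self.open_path = path
        self.log.append(f"opened {path}")


session = DictionarySession()
session.open("dicts/macromol.dic")
session.open("dicts/protein.dic")   # closes macromol, opens protein
session.open("dicts/protein.dic")   # same full filename: no action
print(session.log)
```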

To build mixed complexes, start by opening the macromol dictionary, build the
pieces in separate work areas, then join (JOIN) or merge (Edit > Merge) them.
The resulting molecule can be written as a .mol2 file. When this file is read
back into SYBYL, the appropriate dictionary (macromol) will be opened
automatically.
14.1.2 Close Dictionary

Close the currently open biopolymer dictionary.
Biopolymer > Dictionary & Database Admin > Close Dictionary
BIOPOLYMER DICTIONARY CLOSE
14.1.3 List Dictionary

Display information about the currently open biopolymer dictionary.
Biopolymer > Dictionary & Database Admin > List Dictionary
BIOPOLYMER DICTIONARY LIST BRIEF/FULL
See Build Biopolymer on page 141 to build a biopolymer chain from information in
the dictionary.

14.2 Manage Custom Dictionary

Central location for all operations on a user-customized dictionary:
• Copy all the Tripos-provided dictionaries to a user-specified directory.
• Specify the default dictionary
• Create or modify a monomer
• Create AMBER SLN typing rules
• Save a dictionary to a file
Biopolymer > Dictionary & Database Admin > Manage Custom Dictionary
Current Dictionary Directory   Reports the full path of the directory containing the dictionary that is currently in memory. The default location is determined by Tailor variable BIOPOLYMER DIRECTORY.
Current Dictionary   Reports the name of the dictionary that is currently in memory. The default dictionary is determined by Tailor variable BIOPOLYMER DEFAULT_DICT.

Dictionary Management
Set Custom Dictionary Directory   Specify the full path of the directory containing the dictionary of interest. This operation:
• changes the location of the dictionary directory and sets the value of the command Tailor variable BIOPOLYMER DIRECTORY;
• closes the current dictionary;
• opens the dictionary if one is found in the new location by the same name as the one that was open; otherwise, prompts for the name of the dictionary to open.
Note: A copy of the SYBYL 7.3 version of the dictionary is available in $TA_ROOT/biopolymer/tables/dictionary_73.
Use Default Dictionary Directory   This operation:
• resets the dictionary directory to the location determined by Tailor variable BIOPOLYMER DIRECTORY;
• closes the dictionary that was open;
• opens the dictionary if one is found by the same name as the one that was open; otherwise, prompts for the name of the dictionary to open.
Create Custom Dictionary Directory   This is the recommended first step before making any kind of modification to a dictionary. Copy the contents of the current dictionary directory to the specified path, which must be new or empty. The Current Dictionary Directory at the top of the dialog is changed automatically to this new location.
Change Dictionary   Select a dictionary among those available in the directory and open it. The Current Dictionary at the top of the dialog reflects the selection.
Save Dictionary As   Create a permanent (disk file) dictionary from the temporary (in-memory) dictionary. The dictionary name can be either an existing dictionary, or a new one. The default extension is .dic. The dictionary will be created in the directory shown at the top of the dialog.
Note: This operation leaves the original dictionary open.
The corresponding command is:
BIOPOLYMER DICTIONARY CREATE DICTIONARY dict_name

Monomer Management
Create New Monomer   Access the Create Monomer dialog. The new monomer must already be present in a molecule area, in the neutral, unblocked form.
Change Existing Monomer   Select a monomer already defined in the dictionary in memory and access the Create Monomer dialog.
Create AMBER SLN Typing Rules   Access the Monomer SLN Atom Typing Rules dialog. You will be prompted to specify the type of monomer (PROTEIN, NUCLEIC_ACID, or OTHER) then to select an existing monomer.
Create AMBER SLN Typing Rules From Molecule   Access the Monomer SLN Atom Typing Rules dialog. The molecule of interest must already be in a molecule area. This option is useful if the molecule of interest is not a defined monomer. This would be the case for a ligand. Preprocessing of this molecule with the tools in the Biopolymer > Prepare Structure menu is recommended, but not required.
Add Monomer File to Dictionary   Add the specified monomer to the dictionary currently in memory. Duplicate entries are not allowed. The corresponding command is:
BIOPOLYMER DICTIONARY ADD MONOMER filename
The monomer is then immediately available for use while the current dictionary is open. To include the monomer permanently in a dictionary use Save Dictionary As in this dialog.
Define a New Blocking Group on page 252

14.3 Create or Modify a Monomer

14.3.1 Define Monomer via the Menubar
Create a new monomer or modify an existing one for a biopolymer dictionary
via a graphical interface, with automated assignment of atom types and charges.
When Creating a New Monomer:

The new monomer must be present in a molecule area, in the neutral, unblocked
form that includes all hydrogens.

Biopolymer > Dictionary & Database Admin > Manage Custom Dictionary
In the Custom Dictionary Management dialog press Create New Monomer.
You will be asked to specify:

• the molecule area containing the structure of the new monomer;
• the type of monomer: PROTEIN, NUCLEIC_ACID, CARBOHYDRATE, or OTHER;
• an existing monomer in the open dictionary to use as a template when
describing the new one.
When Modifying an Existing Monomer:

Biopolymer > Dictionary & Database Admin > Manage Custom Dictionary
In the Custom Dictionary Management dialog press Change Existing
Monomer.
You will be asked to select a monomer from the dictionary in memory. The
structure in the corresponding .res file will be read into the first available
molecule area. Some options in the dialog will be unavailable because they
cannot be changed once the initial .res file has been created.

Molecule Area   Work area containing the new monomer.
Basename for Monomer File   Enter the name for the residue file. The extension .res will be appended automatically to this name. Basenames are case-sensitive. However, all lowercase is recommended.
Complete Monomer Name   Enter the complete name of the monomer. Chirality may be specified, e.g., D-Alanine. This name may not include any blank characters; use underscores instead.
3-Letter Mnemonic   Enter the three-letter code for the new monomer (e.g., ALA for alanine). Monomer names are not case-sensitive; therefore, names that differ only in case count as duplicates, and duplicate names are not allowed.
1-Letter Code   Enter the one-letter code for the new monomer (e.g., A for alanine). Enter a period (.) to ignore this code. If the same 1-letter code is used for 2 or more residues in a dictionary, the first instance takes precedence.
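The first-instance rule for shared 1-letter codes can be sketched in Python as follows. This only illustrates the precedence rule; the function name and the sample entries (including the AIB and XXX monomers) are hypothetical and do not reflect the contents of any Tripos-provided dictionary.

```python
def build_one_letter_lookup(monomers):
    """Map 1-letter codes to 3-letter names, first instance winning.

    `monomers` is a sequence of (three_letter, one_letter) pairs in
    dictionary order. A period means the monomer has no 1-letter code.
    """
    lookup = {}
    for three_letter, one_letter in monomers:
        if one_letter == ".":
            continue  # code deliberately ignored for this monomer
        # setdefault keeps the earlier entry when a code is reused
        lookup.setdefault(one_letter, three_letter)
    return lookup


entries = [("ALA", "A"), ("GLY", "G"), ("AIB", "A"), ("XXX", ".")]
print(build_one_letter_lookup(entries))
```

In this sketch the code A resolves to ALA because ALA appears before AIB, which suggests giving a new monomer an unused 1-letter code, or a period, if it must be distinguishable in sequence specifications.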

Monomer Features
Label Label the monomer on the screen with:

• Atom IDs, Atom Names,
• atom types: AMBER7 FF99, AMBER7 FF02,
AMBER4.1 FF95, Kollman All, Kollman
United,
• atomic charges: AMBER7 FF99, AMBER7 FF02,
AMBER4.1 FF95, Kollman All, Kollman
United, Kollman Uni C-Term, Kollman Uni N-
Term,
• None.
Highlight Missing Charges/Types   Highlight the atoms that do not have values for the atom types or charges specified by Label.
Current Charge   Total charge on the monomer for the type of charges specified by Label.
Atom Names Change the name of selected atoms.
Atom Types Change the atom types of selected atoms.
Recommended procedure:
• Select the type of Label that matches the atom
types you want to modify.
• Toggle the Highlight check box on.
• Press Atom Types.
• Specify the type of atom types: AMBER7_FF02,
AMBER7_FF99, AMBER95_ALL, KOLL_ALL
or KOLL_UNI.
• Designate the atoms to be modified.
• Select an atom type for each atom.
Charges Change the atomic charges of selected atoms.
Recommended procedure:
• Select the type of Label that matches the charges
you want to modify.
• Press Charges.
• Specify the type of charges: AMBER7_FF02,
AMBER7_FF99, AMBER95_ALL, KOLL_ALL,
KOLL_UNI, KOLL_UNIC or KOLL_UNIN.
• Enter an atomic charge for each atom.
Root Atom   Designate the atom in the monomer that will bear the substructure label. This atom is also used to display the chain trace. By default, the atom with ID number 1 is used.
Capping Atoms   Designate the atoms to be removed when a connection is made to an adjacent monomer. A default is provided for peptidic residues and is stored in the static-set {CAP_ATOMS}.
Backbone Atoms   Designate the atoms belonging to the backbone. Atoms that match backbone atoms in the template are selected automatically. You may add to this selection. All atoms not designated backbone atoms will be considered sidechain atoms.
Essential Hydrogens   Designate the hydrogens to be referred to as essential. These are generally connected to polar atoms and are likely to be involved in hydrogen bonds.
Edit Types and Charges in Table   A spreadsheet displays the atom ID numbers as rows and the various atom types and charges as columns. Unassigned atom types are represented by UNK; missing atomic charges are represented by 0.000. You may edit any of these values in the spreadsheet. Press OK in the accompanying dialog when done.
Class Keywords   Choices depend on the type of monomer. See the description of the residue files for a list of monomer classes.
Terminating the Monomer Definition
When you press the OK button, the new or modified monomer definition is
stored in the specified .res file.
A new monomer is automatically added to the dictionary in memory. A
Monomer Information dialog will then ask whether to store the monomer in a
permanent (file) dictionary.
• If you press OK, the permanent (file) copy of the dictionary currently
open will be updated.
• If you press Cancel, you can still update the permanent dictionary later
by pressing Save Dictionary As in the Custom Dictionary Management
dialog.
Interactive Spreadsheet
A spreadsheet makes it easy to inspect and edit atom parameters. The rows are
the atom ID numbers and the columns are:
• NAME—atom name
• A02_T—AMBER7 FF02 atom type
• A99_T—AMBER7 FF99 atom type
• A95_T—AMBER4.1 FF95 atom type
• KU_T—Kollman United atom type
• KA_T—Kollman All atom type
• A95_C—AMBER4.1 FF95 atomic charge
• KU_C—Kollman United atomic charge
• KUN_C—Kollman United atomic charge if the monomer is the first in a
chain (N-terminal)
• KUC_C—Kollman United atomic charge if the monomer is the last in a
chain (C-terminal)
• KA_C—Kollman All atomic charge
Complete Verification of the Residue File

The main purpose of the Create Monomer dialog is to facilitate the definition of
new monomers for use in biopolymer modeling. You must verify that all the
information in the new .res file is correct, since some of the properties, copied
from the monomer used as template, are not exposed to the dialog. We delineate
below the special points to note.
1. The positional nicknames of the atoms (e.g., ALPHA, BETA, GAMMA etc.
for the C-alpha, C-beta, and C-gamma atoms, respectively) are copied from
the corresponding definitions in the file of the template monomer. This will
affect the definitions of the conformational angles for the new monomer,
because these angles are defined in terms of the positional nicknames. If, in
addition to the atoms found in the template, there are additional atoms in the
monomer that define conformation states, these must be explicitly defined in
the .res file for the monomer. (To make this point clear, suppose Lys were
to be constructed using Ala as a template monomer. The sidechain confor-
mational angles chi_2 and beyond would be undefined for Lys unless the
C_beta, C_gamma, C_delta, etc., atoms for Lys were defined in the Lys.res
file.)
2. The property definitions for the new monomer are copied directly from the
residue file for the template. These include the molecular weight and the
definition of the improper torsional terms for the AMBER and Kollman
force fields. The assignment of improper torsional terms will be affected.
Description of the Biopolymer Dictionaries on page 278

14.3.2 Define a Monomer via the Command Line

To add a new monomer, or modify an existing monomer in a dictionary. Note
that new residues must be built in the neutral, unblocked form.
BIOPOLYMER DICTIONARY CREATE MONOMER {args}
First build the monomer in a work area by itself, using any SYBYL commands.
You may find it easiest to start from an existing monomer then modify it. Add
all hydrogens to the model and ensure that all atom names are correct. Each
atom in the monomer must have a unique name. You should be consistent about
the way atoms are named across different residues in the same dictionary,
including cap atoms, particularly to allow proper operation of conformational
definitions and blocking groups. Make sure that the dictionary in which this
monomer will be included is open.
Then use the BIOPOLYMER DICTIONARY CREATE MONOMER command. If you
are just making a minor change to an existing monomer, you will be able to take
the default values for most of the following prompts.
mol_area Work area containing the monomer

template_monomer Monomer in the current dictionary to use as a template
for the new monomer. The template monomer is used
to provide default values for subsequent prompts.
name Common name of the new monomer (1 to 3 letters).
Each monomer in a dictionary must have a unique
name.
full_name Full name of the new monomer. There is no restriction
on the length of the full name, but it may not contain
any blanks.
code One letter code for the new monomer. This abbrevia-
tion can be used in sequence specifications. It must be
either alphabetic, or “.” to indicate that no one letter
code will be used for this monomer.
class Choices depend on the type of monomer. See the
description of the residue files for a list of monomer
classes.
root_atom Atom in the monomer to bear the substructure label and
used by chain trace
backbone_atoms Atoms belonging to the backbone. If you have built this
monomer from an existing monomer, consider using
the built-in set {BACKBONE} to indicate the backbone
atoms. All atoms not designated as backbone atoms will
be considered sidechain atoms.

cap_atoms Atoms to remove when a connection is made to an
adjacent monomer
essential_hs Hydrogens to be referred to as ESSENTIAL (generally
those connected to polar atoms and likely to be
involved in hydrogen bonds).
review_alt_types DONT_REVIEW/REVIEW, whether to review and/or
change the alternate atom types stored with the tem-
plate monomer. If not reviewed, the alternate atom
types are copied from the template residues, with any
missing values automatically assigned the type UNK. If
you review, you are prompted for alternate atom types
for each atom in the new monomer. The alternate atom
type for the same atom (if any) in the template mono-
mer appears as the default. Enter the end-loop character
(|) to accept the defaults for the rest of the atoms.
review_charges DONT_REVIEW/REVIEW, whether to review and/or
change the charges stored with the template monomer.
An affirmative answer causes prompts for the charge
for each atom in the new monomer. Enter the end-loop
character (|) to accept the defaults for the rest of the
atoms.
change_nicknames CHANGE/LEAVE_AS_IS, whether to change the posi-
tional nicknames defined for the template monomer.
(Positional nicknames are used in the definition of con-
formational angles.) Before this prompt, all nicknames
defined in the template monomer are listed. If you
choose to change them, you are prompted for the atom
to be nicknamed, and then for the nickname. Nick-
names must begin with the underscore (_) character.
Enter the end-loop character (|) when you are finished
defining nicknames.
copy_properties COPY/DONT_COPY, whether to copy the monomer
property definitions directly from the template monomer.
If not, the new monomer is initialized with no
property definitions.
file Name of the file to write. The file is created with a .res
extension in the directory $TA_DICT (specified by Tailor
variable BIOPOLYMER DEFAULT_DICT).
A .res file containing the new monomer is created. The new monomer is
automatically added to the dictionary currently in memory and is, therefore,
available for immediate use. If you want this monomer to be permanently
included in the dictionary see Create or Update a Dictionary on page 248.

• Define Monomer via the Menubar on page 241
• See Biopolymer Dictionaries on page 278 for a description of the
biopolymer dictionary file format
Add a Monomer to the Dictionary
Add (or replace) a monomer to the dictionary currently in memory.
Menubar: In the Custom Dictionary Management dialog press Add Monomer File to Dictionary.
Command: BIOPOLYMER DICTIONARY ADD MONOMER filename
Argument: filename, the name of the file containing the new monomer (the extension .res is provided automatically).
The specified monomer is added to the dictionary in memory. This monomer is
then immediately available for use. To include the monomer in a permanent
dictionary see Create or Update a Dictionary.
Create or Update a Dictionary
Create a permanent (disk file) dictionary from the dictionary in memory.
Menubar: In the Custom Dictionary Management dialog press Save Dictionary As.
Command: BIOPOLYMER DICTIONARY CREATE DICTIONARY dict_name
Argument: dict_name, the name of the new or existing dictionary (the extension .dic is provided automatically).
The dictionary will be created in the directory specified by Tailor variable
BIOPOLYMER DIRECTORY.
A new dictionary is particularly useful for

• saving new monomer types; see Create or Modify a Monomer on page
241
• saving blocking groups; see Define a New Blocking Group on page 252

14.4 Create AMBER SLN Typing Rules

Inspect and edit the SLN atom typing and charge definitions for the mid-chain
and the neutral and charged termini versions of a monomer in any of the
supported AMBER force fields.
For an Existing Monomer:

Biopolymer > Dictionary & Database Admin > Manage Custom Dictionary
In the Custom Dictionary Management dialog press Create AMBER SLN
Typing Rules.
You will be asked to select a monomer from the dictionary in memory.
The structure in the corresponding .res file will be read into the first available
molecule area. The attachment atoms will be shown in magenta. Their atom
types and charges cannot be changed.
For a Molecule:
A molecule must be present in a molecule area.
This option is useful if the molecule of interest is not a defined monomer. This
would be the case for a ligand. Preprocessing of this molecule with the tools in
the Biopolymer > Prepare Structure menu is recommended, but not
required.

Biopolymer > Dictionary & Database Admin > Manage Custom Dictionary
In the Custom Dictionary Management dialog press Create AMBER SLN
Typing Rules From Molecule.

Monomer   The name of the selected monomer or molecule.
Force Field   One of the following force fields: AMBER7 FF99, AMBER7 FF02, AMBER4.1 FF95, Kollman All.
Inspecting and Editing the Rules
Show/Modify   Note: This functionality is available only if the monomer was taken from a monomer file in the dictionary. Select the definition of the monomer in one of the following contexts (choices depend on the monomer type: PROTEIN or NUCLEIC_ACID):
• Mid-chain
• N-Terminus Charged
• N-Terminus Neutral
• C-Terminus Charged
• C-Terminus Neutral
• 5’-Terminus OH
Label   Label the monomer on the screen with: Atom Names, Atom Types, Charges, or None.
Highlight Missing Charges/Types   Highlight the atoms that do not have values for the atom types or charges specified by Label.
Current Monomer Charge   Total charge on the monomer for the type of charges specified by Force Field and position in the chain (Show/Modify).

Change Atom Types   Change the atom types of selected atoms. Recommended procedure:
• Select the desired Force Field and Show/Modify context.
• Set the Label to display Atom Types.
• Press Change Atom Types.
• Select an atom type for each atom.
Change Atom Charges   Change the atomic charges of selected atoms. Recommended procedure:
• Select the desired Force Field and Show/Modify context.
• Set the Label to display Charges.
• Press Change Atom Charges.
• Enter an atomic charge for each atom.
Saving the Rules
Rules To Save   Note: This functionality is available only if the monomer was taken from a monomer file in the dictionary. The check boxes become accessible after the selection of the corresponding option in the Show/Modify menu in this dialog (choices depend on the monomer type: PROTEIN or NUCLEIC_ACID).
• Mid-chain
• N-Terminus Charged
• N-Terminus Neutral
• C-Terminus Charged
• C-Terminus Neutral
You may then select one or more sets of rules to be saved in the .grp file associated with the type of Force Field selected at the top of the dialog.
Note: New SLN rules are added at the end of the .grp file. If the file contains duplicate entries, the first one takes precedence.
Save Monomer ... Typing Rules   Save the requested typing rules for the type of Force Field selected at the top of the dialog.

14.5 Define a New Blocking Group

Add a new blocking group, or modify an existing blocking group in a
biopolymer dictionary.
First build the blocking group in a molecule area by itself, using any SYBYL
commands. Add all hydrogens to the molecule and ensure that all atom names
are correct. Each atom in the blocking group must have a unique name.
BIOPOLYMER DICTIONARY CREATE BLOCK {args}
If you are just making a minor change to an existing blocking group, you will
be able to take the default values for most of the following prompts:
mol_area Molecule area containing the blocking group.

template_block Blocking group in the current dictionary to use as a
template for the new block. The template block is used
to provide default values for subsequent prompts.
name Common name of the new group (1 to 3 letters). Each
monomer and block in a dictionary must have a unique
name.
full_name Full name of the new group. There is no restriction on
the length of the full name, but it may not contain any
blanks.
root_atom Atom in the blocking group to bear the substructure
label and used by chain trace
backbone_atoms Atoms belonging to the backbone. If you have built this
blocking group from an existing one, consider using the
built-in set {BACKBONE} to indicate the backbone atoms.
All atoms not designated as backbone atoms are considered
sidechain atoms.
essential_hs Hydrogens to be referred to as ESSENTIAL (generally
those connected to polar atoms and likely to be
involved in hydrogen bonds).
block_conn_atom Blocking group atom to connect with a monomer.
block_discard_atom Blocking group atom, bonded to block_conn_atom, to
be thrown away when the group is attached to a biopolymer
chain.
monomer_conn_atom Atom on a non-block monomer to attach to
block_conn_atom.
monomer_discard_atom Atom on a non-block monomer, bonded to
monomer_conn_atom, indicating which part of the non-block
monomer is to be thrown away when adding the
blocking group. All atoms on the path starting at
monomer_conn_atom and continuing through
monomer_discard_atom are thrown away.
review_alt_types DONT_REVIEW/REVIEW, whether to review and/or
change the alternate atom types stored with the template
block. If not reviewed, the alternate atom types
are copied from the template block, with any missing
values automatically assigned the type UNK. If you
review, you are prompted for alternate atom types for
each atom in the new block. The alternate atom type for
the same atom (if any) in the template block appears as
the default. Enter the end-loop character (|) to take the
defaults for the rest of the atoms.
review_charges DONT_REVIEW/REVIEW, whether to review and/or
change the charges stored with the template block. An
affirmative answer causes prompts for the charge for
each atom in the new block.
change_nicknames CHANGE/LEAVE_AS_IS, whether to change the posi-
tional nicknames defined for the template block. (Posi-
tional nicknames are used in the definition of
conformational angles.) Before this prompt, all nick-
names defined in the template block are listed. If you
choose to change them, you are prompted for the atom
to be nicknamed, and then for the nickname. Nick-
names must begin with the underscore (_) character.
Enter the end-loop character (|) when you are finished
defining nicknames.
file Name of the file to write. The file is created with a .res
extension in the directory $TA_DICT (specified by the
command Tailor variable BIOPOLYMER DEFAULT_DICT).
The new blocking group is automatically added to the dictionary currently in
memory and is, therefore, available for immediate use. A .res file containing
the new block is created. If you want this block to be included in a permanent
dictionary, see Create or Update a Dictionary on page 248.
• See Biopolymer Dictionaries on page 278 for a description of the
biopolymer dictionary file format

• Blocking groups can be added automatically by any operation adding or
changing residues at the end of a chain. To add a blocking group
manually see Edit Terminal Residues via the Command Line on page
110.

14.6 Create/Update the PRODAT Database

14.6.1 Binary Protein Database: PRODAT
Tripos supplies a binary protein database, PRODAT, containing high-resolution
protein structures derived from the Protein Data Bank. SYBYL uses this binary
database:
• for ad-hoc database searching
• for protein loop searching
• for constructing protein backbone from alpha trace
Tripos updates PRODAT periodically to keep it in line with the version of
HOMSTRAD that is distributed with SYBYL releases. Old PRODAT entries
are included except for those that have been superseded by more recent entries.
A list of the structures used to build the Tripos-supplied binary protein database
can be found in $TA_PDBTABLES/codeset (by default, $TA_ROOT/
biopolymer/tables/prodat/codeset).
Location and Environment
The SYBYL software looks for PRODAT in the directory pointed to by the
environment variable TA_PDBTABLES. By default, this variable is set in
$TA_ROOT/lib/environment to $TA_ROOT/biopolymer/tables/prodat.
This environment variable is used to initialize Tailor variable
PROTEIN_SEARCH PROTEIN_DIR (default TA_PDBTABLES) and Tailor variable
PROTEIN_SEARCH PDBCODE_SET, which contains the path to the codeset file
generated by mkprodat. By default it is set to $TA_PDBTABLES:codeset.
Recommended Criteria for Customizing PRODAT
The following criteria should be considered before adding a protein to your
personal copy of PRODAT:
1. Include only structures with a resolution better than 3 Å.
2. Do not include structures containing only Cα coordinates.
3. Do not include structures in which backbone atoms are missing.
4. Do not include structures for which there are missing stretches of atoms
(for example, in disordered regions).
5. Remove covalent bonds to the ligand in the crystal structure.
6. Only the 20 standard amino acids are allowed.
7. Include only proteins with more than 30 residues.
8. The number of residues with missing sidechain coordinates should be
less than 10%.
9. No two proteins should have more than 45% sequence identity.
Ligands, cofactors, metals, and water molecules are discarded from the selected
entries.
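Some of the criteria above are simple numeric or yes/no checks that could be screened before running mkprodat. The following Python sketch shows one possible pre-filter. It is not part of SYBYL: the Candidate record and its field names are hypothetical, and the values would have to be extracted from each PDB file by other means. Criteria 4, 5, and 9 are not covered because they require inspecting the structure or comparing sequences.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical summary of one PDB entry under consideration."""
    resolution: float            # in angstroms
    n_residues: int
    ca_only: bool                # only C-alpha coordinates present
    missing_backbone: bool       # any backbone atoms missing
    n_missing_sidechains: int    # residues lacking sidechain coordinates
    nonstandard_residues: bool   # anything beyond the 20 standard amino acids


def failed_criteria(c):
    """Return the numbers of the listed criteria that the candidate fails."""
    failed = []
    if not c.resolution < 3.0:
        failed.append(1)
    if c.ca_only:
        failed.append(2)
    if c.missing_backbone:
        failed.append(3)
    if c.nonstandard_residues:
        failed.append(6)
    if not c.n_residues > 30:
        failed.append(7)
    if c.n_residues > 0 and not c.n_missing_sidechains / c.n_residues < 0.10:
        failed.append(8)
    return failed


good = Candidate(2.0, 120, False, False, 5, False)
poor = Candidate(3.2, 25, True, False, 5, False)
print(failed_criteria(good), failed_criteria(poor))
```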
14.6.2 Customize PRODAT via the Menubar

To create your own binary database or add structures to an existing database
from standard PDB coordinate files.
Warning: We recommend that you modify a copy of the original binary protein
database then reset your local environment to point to the copy (see Location
and Environment).
Biopolymer > Dictionary & Database Admin > Create/Update PRODAT Database
Create a new database   Clicking on this button is followed by a request for the location of the database. The default is to create a new file named PRODAT in your current directory.
Add to existing database   Access a file browser and locate the existing database.

List of PDB Files Taken From
A File to be Created   Proceed as follows:
• Enter the name of the file that will contain the complete names of the PDB files to be added to the binary database; press Continue in the MKPRODAT dialog.
• Use a file browser to select a PDB file; press OK to add the selected file to the list.
• A Protein Database File dialog lists the current name of the protein; you may change it if you want or add a comment. The Include in database searches check box is on by default. On means that this file will be included in protein searches, whereas off means that it will be excluded from such searches. Press OK to add the selected file name to the list.
• The file browser is posted again for selection of additional PDB files. When you are done, press Cancel to end the list.
• In the Decision dialog, you can:
- Create the database with the files specified, then quit.
- Create only the file of PDB file names then quit.
- Cancel.
An Existing File   Access a file browser and locate the file containing the list of PDB file names.
Comments about the file of file names if you create it manually:

The names of the PDB files in the file containing the file names must include
the full path and file extension. (The 4-character PDB code is never a valid
option as a file name unless there is a PDB file in your current directory with
only that code as a file name and the extension .ent.Z, .ent, or .gz.)
mkprodat will automatically see files of the form pdbABCD.SUFFIX where
ABCD is the 4-character PDB code and SUFFIX is .gz, .ent, or .ent.Z. If the
files in your local TA_PDB are not set up like the CD release from RCSB, you
need to modify $TA_ROOT/bin/unix/pdbfname to match their directory
structure. If you want to use files in your local directory, you must provide the
complete path in the file of filenames.
• mkprodat Utility on page 258
• Read about the file of file names

14.6.3 mkprodat Utility

Use the mkprodat utility outside of the SYBYL application to create your own
binary database or add structures to an existing database from standard PDB
coordinate files.
Warning: We recommend that you modify a copy of the original binary protein
database then reset your local environment to point to the copy (see Location
and Environment).
Defining the Environment:
Refer to SYBYL-X Environment Shell in the SYBYL Basics Manual.
Syntax:
$TA_BIN/mkprodat [-a] file_of_filenames \
    source_directory target_directory
-a Optional argument. It causes mkprodat to append to
the end of the database in target_directory, rather
than creating a new database.
file_of_filenames File containing a list of the PDB files to be processed.
Each line in this file contains the name of one PDB file.
Following the PDB file may be an optional “+” or “-”,
to indicate that this protein should, by default, be
included (+) or excluded (-) from protein and loop
searches. The rest of the line can be used for comments.
The file names specified in this file need not be complete
file specifications; typically, they will be only the
4-character PDB code. You can use the
source_directory argument or pdbfname command
to tell mkprodat how to locate the actual PDB files on
your system. The file_of_filenames used to build
the Tripos-supplied binary database is in the file
$TA_PDBTABLES/codeset (by default, $TA_ROOT/
biopolymer/tables/prodat/codeset).
source_directory Directory where the PDB files reside. If you are using
the pdbfname command (recommended — see
Remarks below) to specify where the PDB files reside
on your system, source_directory is not important, and
you can enter a value of “.” (for current directory).
target_directory Directory to create or update binary protein database.
mkprodat can read standard as well as compressed PDB files.
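As an illustration of the file_of_filenames layout and the syntax above, the following Python sketch formats list entries and assembles the mkprodat argument list. It is only a sketch: the helper names format_entry and mkprodat_argv are hypothetical, and it assumes that the file name, the +/- flag, and the comment are separated by single spaces, which this manual does not state explicitly. It does not run mkprodat.

```python
from typing import List


def format_entry(pdb_name: str, include: bool, comment: str = "") -> str:
    """Build one line: PDB file name, a '+' or '-' flag, then an optional comment.

    Assumption: fields are separated by single spaces.
    """
    flag = "+" if include else "-"
    line = f"{pdb_name} {flag}"
    if comment:
        line += f" {comment}"
    return line


def mkprodat_argv(list_file: str, source_dir: str, target_dir: str,
                  append: bool = False) -> List[str]:
    """Assemble the arguments in the documented order; -a appends to a database."""
    argv = ["$TA_BIN/mkprodat"]
    if append:
        argv.append("-a")
    argv += [list_file, source_dir, target_dir]
    return argv


if __name__ == "__main__":
    # "1crn" is the code used elsewhere in this chapter; the comment is illustrative.
    print(format_entry("1crn", True, "example entry"))
    # "my_list" and "my_prodat" are hypothetical names; "." is the current directory.
    print(" ".join(mkprodat_argv("my_list", ".", "my_prodat", append=True)))
```

The printed command line is meant to be checked and then run in the SYBYL-X environment shell; the environment variable is left unexpanded on purpose.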

mkprodat looks for PDB files in two places:

• First, the source_directory for a file with the exact name given in
the file_of_filenames.
• If it can’t find the file there, it runs the shell script pdbfname, which
translates from a 4-character PDB code to a full file specification (see
pdbfname below).
If a PDB file contains multiple alternative conformations for a protein, as is
sometimes the case with NMR-derived structures, mkprodat uses the first
model in the file.
mkprodat determines the PDB code for a PDB file in one of 2 ways. First, it
looks in columns 63-66 of the HEADER record of the file. If these columns are
blank or if there is no HEADER record, mkprodat uses the file name given in
file_of_filenames, if it is 4 characters long. If both of these methods fail,
mkprodat assigns an arbitrary PDB code and issues a warning message.
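The lookup order described above can be sketched in Python. This is not mkprodat's own code: the function names are hypothetical, and the sketch assumes that a HEADER line too short to reach column 66 is handled like blank columns. Where mkprodat would assign an arbitrary code and issue a warning, the sketch simply returns None.

```python
from typing import Optional


def code_from_header(header_line: str) -> Optional[str]:
    """Return columns 63-66 (1-based) of a HEADER record, or None if unusable."""
    if not header_line.startswith("HEADER"):
        return None
    code = header_line[62:66].strip()  # columns 63-66 map to slice [62:66]
    return code if len(code) == 4 else None


def code_from_name(listed_name: str) -> Optional[str]:
    """Fall back to the name from the file of file names if it is 4 characters long."""
    return listed_name if len(listed_name) == 4 else None


def determine_code(header_line: str, listed_name: str) -> Optional[str]:
    """Try the HEADER record first, then the listed name; None means both failed."""
    return code_from_header(header_line) or code_from_name(listed_name)


if __name__ == "__main__":
    # A HEADER line padded so that the code occupies columns 63-66.
    header = "HEADER".ljust(10) + "PLANT PROTEIN".ljust(40) + "30-APR-81" + "   " + "1CRN"
    print(determine_code(header, "some_file.pdb"))
    print(determine_code("HEADER    TOO SHORT", "1crn"))
```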
mkprodat creates many binary data files, plus the text file codeset. The
codeset file contains the name of each molecule in the database, with a “+” or
“-” to indicate whether it will be used or ignored in database and loop searches.
You can feel free to modify or copy this file to control which proteins are used
in searches. Tailor variable PROTEIN_SEARCH PDBCODE_SET controls which
codeset file is used in SYBYL.
mkprodat uses the standard Kabsch-Sander method of classifying secondary
structure based on hydrogen-bonding patterns [Ref. 42].
Note: If the HEADER line in the PDB file is shorter than 66 characters and not
padded on the right side with blanks, mkprodat will not produce a PDB
refcode when adding the protein to the binary database. You may choose either
of the following solutions:
• Edit the PDB file and insert the desired PDB code in columns 63-66 of
the HEADER line.
• Change the file of file names used by mkprodat to specify 4-character
file names and change the script $TA_ROOT/bin/unix/pdbfname to
locate the exact file from the 4-character code.
Warning: mkprodat cannot be used to delete an entry in the binary protein
database. To exclude a protein from a database or protein loop search, use the
“-” option. To delete an entry from the database permanently, you must rebuild
the database without that protein.
• Customize PRODAT via the Menubar on page 256

• Read about the file of file names
14.6.4 pdbfname
If a local copy of the Protein Data Bank is maintained at your site, ask your
systems administrator to set $TA_PDB in the $TA_ROOT/lib/environment
file within your SYBYL-X installation to point to it. Refer to Environment
Variables in your SYBYL Installation in the SYBYL Administration Manual.
The script $TA_ROOT/bin/unix/pdbfname translates a 4-character PDB
code to a file name of the form pdbxxxx.ent.Z or pdbxxxx.gz in directory
$TA_PDB at your site.
Within SYBYL, this script is accessed by mkprodat. You may also use it to
read a PDB file via a command.
Because the structure of the $TA_PDB directory depends on how it was

assembled at your site (from distribution media or by download), this script may
not work as provided.
For example, to retrieve the PDB file with the code name of 1crn from your
site’s copy of the PDB database, enter the command:
PDB IN m1 %system("pdbfname 1crn")
If pdbfname is working correctly with your $TA_PDB directory structure, you

will see a result similar to:
$TA_PDB/c/pdb1crn.ent.Z or $TA_PDB/CD_1/cr/pdb1crn.gz
If you get the result:

1crn
your $TA_PDB directory structure is not recognized by the pdbfname script.
You will need to edit this shell script ($TA_ROOT/bin/unix/pdbfname) to

look in the TA_PDB directory structure found at your site. If you edit a copy of
the script, make sure that you put it in a directory that comes earlier in your
PATH than $TA_ROOT/bin/unix.
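For sites whose $TA_PDB layout differs from the one pdbfname expects, the translation step can be prototyped before editing the shell script. The Python sketch below is a hypothetical stand-in, not the pdbfname script itself: it walks a directory tree looking for pdbxxxx with the suffixes mentioned in this chapter and, like the behavior described above, returns the bare code when nothing matches. Lowercasing the code is an assumption based on the example file names.

```python
import os

# Suffixes named in this chapter for local PDB files.
SUFFIXES = (".gz", ".ent", ".ent.Z")


def find_pdb_file(code: str, pdb_root: str) -> str:
    """Return the path of pdb<code><suffix> under pdb_root, or the code itself.

    Returning the unchanged code mirrors the documented symptom of a
    directory structure that is not recognized.
    """
    wanted = {"pdb" + code.lower() + suffix for suffix in SUFFIXES}
    for dirpath, _dirnames, filenames in os.walk(pdb_root):
        for name in sorted(filenames):
            if name in wanted:
                return os.path.join(dirpath, name)
    return code


if __name__ == "__main__":
    # Falls back to the current directory if TA_PDB is not set.
    root = os.environ.get("TA_PDB", ".")
    print(find_pdb_file("1crn", root))
```

A full directory walk is slow on a complete PDB mirror; it is used here only to discover which subdirectory pattern your site uses so that the real script can be adjusted accordingly.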

15. The Sequence Viewer
The Sequence Viewer is an interactive, cross-platform, fully integrated protein
sequence data visualization and manipulation tool for protein modeling. The
viewer allows researchers to read in a series of sequences in .pir or .fasta
format, or a multiple sequence alignment, then manually edit the sequences,
perform an automated alignment, and adjust the font size. Researchers may also
annotate and color sequences using a number of schemes, including
ClustalX, Zappo, Taylor, DSSP, and predicted secondary structure. In
addition, relevant calculations can be performed within the Sequence Viewer,
such as determining a consensus sequence, determining similarity and identity,
and adding JOY annotation.
Protein structures may be read in through the Sequence Viewer and displayed on
the SYBYL screen, where they can be annotated and colored by manipulating
the corresponding sequence in the viewer. Three-dimensional properties such as
solvent accessibility, % accessibility, residues involved in a certain type of
H-bonding, and phi and psi information, among others, can all be
mapped within the viewer.
For advanced protein modeling using ORCHESTRAR, the Sequence Viewer is
an integral part of the modeling workflow. It is used in concert with the
ORCHESTRAR graphical interface to manipulate homolog sequences and
structures and to interact with a homology model and the target sequence.
Refer to the ORCHESTRAR Manual for details.
In this chapter:
• Description of the Sequence Viewer on page 262
• Mouse and Keyboard Interactions in the Sequence Viewer on page 275

15.1 Description of the Sequence Viewer

The Sequence Viewer is an interactive tool for protein sequence visualization
and manipulation.
15.1.1 Accessing the Sequence Viewer

The Sequence Viewer may be used in two modes:
Biopolymer mode:
Biopolymer > Sequence Viewer
Or
Click the Sequence Viewer icon on the Biopolymer toolbar.
Or
Open a sequence file via File > Import File
ORCHESTRAR mode:
It is posted automatically by the following ORCHESTRAR dialogs:
• Model Conserved Regions
• Analyze Conserved Regions
• Search Loops
• Add/Analyze Loops
• Model Sidechains
• Analyze Sidechains
• Analyze Model

15.1.2 Sequence Viewer Panels
Left Panel Each sequence is identified by name. The molecule area
(if there is a matching structure) and the number of residues
are also provided.
• The gray bar can be used to select all sequences or
clear the current selection.
• Query, model and consensus lines (requested via the
View menu) are always in the top section. Order
depends on creation and may be modified by use of
the middle mouse button.
• A horizontal line separates the top section from the
parent homologs.
• The parent homologs are listed in the sort order
used to build the SCRs. This order may be modified
by use of the middle mouse button. Bold entries identify
structures displayed via one of the ORCHESTRAR dialogs.
Right Panel Full sequences appear, one per line. Modified amino
acids that do not match any residue in the dictionary
appear as “X” in the sequence.
A ruler helps visualize and manipulate the alignment.
The homolog with the lowest numbered residue is set to
position 1 in the ruler.
See Mouse and Keyboard Interactions in the Sequence
Viewer on page 275.
Selection The information box at the bottom left echoes the position
of the cursor in the Sequence Viewer.
• If the cursor is in the left panel: the name of the
sequence.
• If the cursor is in the right panel: the name of the
sequence, the ruler position, and the residue label or
“gap” if there is no residue in that position.

Score The information box at the bottom center reports the
homology score for the alignment using the
BLOSUM62 homology matrix.
• Residue homology score—The substitution probability
score for the selected residue (higher numbers
indicate favorable substitutions).
• Sequence score—The sum of the substitution probability
scores for the entire sequence. However, if a
single column is selected, the number reported is
the sum of scores for the residues at that position in
all the sequences.
• Alignment score—The sum of the homology scores
for the homolog sequences. The model sequence, if
available, is ignored in the calculation.
Color Molecule Apply to the selected molecule(s) the coloring scheme
of the text background. See Color Schemes for the Text
Background in the Sequence Viewer on page 270.
Color Displays, in a separate dialog, a list of color schemes
that can be applied to the textual representation of the
sequences in the right panel.
See Color Schemes for the Text Background in the
Sequence Viewer on page 270.
Size Use the [-] and [+] buttons to adjust the size of the text
for more comfortable viewing.
15.1.3 Sequence Viewer Menubar

File Menu
Load/Add Sequence(s) Read one or more sequence(s) from a file. Valid file
types are FUGUE .ali files, as well as .fasta, .pir,
.msa and .msf files. Sequence alignments within a
multiple sequence file are maintained.
Accessible only in Biopolymer mode.
Load From Molecule(s) Capture sequence information from a protein already
present on the SYBYL screen. Any gap in the sequence
is represented by a single character (-) in the viewer
regardless of how many residues are missing.
Associate Molecules Attempt to establish missing connections between
sequence and structure. This is useful when the
sequence alignment file loaded into the Viewer was
generated outside of SYBYL.
Example of Use: perform a sequence alignment with
ClustalW and load it into the Sequence Viewer, then
display the corresponding structures in SYBYL from
PDB files. Because the sequences in the Viewer are not
associated with the structures, it is not possible to determine
the effect of the sequence alignment on the structural
alignment. Associate Molecules establishes the
missing connections by matching the names of unassociated
sequences with the names of the molecules in the
molecule areas. It also resets the stack of previous
alignments accessible via the Sequence Viewer’s Edit >
Undo so that sequence alignment changes made prior
to the association with a structure cannot be undone.
Load JOY Template Load a JOY template from a .tem file (see the JOY
manual: http://tardis.nibio.go.jp/joy/). Sequences
present in the JOY template must correspond to those
present in the Sequence Viewer.
Save Selected As Select one or more sequences or a block of residues,
then save the current selection (or all sequences if nothing
is selected) in one of the following formats. You
will be prompted for a file name.
• PIR File
• FASTA File
• MSF File
• HTML File—The output reflects the current text
style and color scheme as well as the HTML prefer-
ences set via Options > Preferences. The output
does not include graphical items that cannot be
reproduced in HTML, such as selection markings.
Use View > Open in Browser to preview the
output.
Close Close the Sequence Viewer.
Edit Menu
Accessible if the Sequence Viewer is used in Biopolymer mode or if it is posted
by ORCHESTRAR’s Model Conserved Regions dialog.

Undo Sequence Change(s) Restore the most recent sequence alignment from the
backup stack. The stack contains up to 20 sequence
alignments. Each new alignment, created via the mouse
or an alignment function, is automatically added to the
stack. The original alignment is preserved as the oldest
item in the stack unless a File > Associate Molecules
was performed, which resets the stack.
Remove Selected Sequence(s) Delete the selected sequence(s) from the Sequence
Viewer. However, any corresponding structure(s) will
remain on the SYBYL screen.
Note: It is recommended that the remaining sequences
be realigned using Align Selected Sequences on
this menu.
Remove All Delete all sequences from the Sequence Viewer,
whether selected or not.
Remove All Gaps Remove all gaps from all sequences.
Remove Empty Columns If a gap occurs at the same ruler position (a column) in
all sequences, the gap is removed from all sequences
and the sequences are shifted to the left.
Align Selected Sequences Align the selected sequences using the Needleman and
Wunsch algorithm [Ref. 28]. If no sequences are
selected, all are used.
Note: The structures corresponding to the realigned
sequences are not affected by this operation.
Align Selected Structures Align the structures associated with the selected (or all)
sequences (whole or partial) based on the current
sequence alignment. The structural alignment is performed
on C-Alpha atoms.
Note: This functionality is not accessible from within
ORCHESTRAR.
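The Remove Empty Columns rule in this menu can be expressed compactly. The Python sketch below illustrates the rule only and is not the Sequence Viewer's implementation; it assumes sequences are plain strings with "-" as the gap character, and that shorter rows are padded with gaps on the right.

```python
from typing import List

GAP = "-"


def remove_empty_columns(rows: List[str]) -> List[str]:
    """Drop every ruler position where all sequences have a gap."""
    if not rows:
        return []
    width = max(len(row) for row in rows)
    # Pad shorter rows so every row can be indexed at every ruler position.
    padded = [row.ljust(width, GAP) for row in rows]
    keep = [i for i in range(width) if any(row[i] != GAP for row in padded)]
    return ["".join(row[i] for i in keep) for row in padded]


if __name__ == "__main__":
    for line in remove_empty_columns(["AC--GT", "A---GT", "AC--G-"]):
        print(line)
```

Columns where at least one sequence has a residue are kept, so gaps that carry alignment information survive.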

View Menu
Text Style A radio button in the side menu identifies the style
applied to the text of the model and homolog
sequences:
• Plain Sequence—Black text.
• Color by Secondary Structure—Helices in red,
sheets in blue and turns in magenta. This is the
initial default style, as set by Tailor variable
BIOPOLYMER SEQUENCE_VIEWER
INITIAL_TEXT_STYLE.
• JOY Annotation—See JOY Annotation Key on
page 274.
Text Background Select the color scheme to be applied to the textual representation
of the sequences in the right panel.
See Color Schemes for the Text Background in the
Sequence Viewer on page 270.
Text Size Size of the text in the dialog: Tiny, Small, Normal,
Medium, Large, or Huge.
Consensus Sequence The consensus sequence is updated on the fly using all
homolog sequences as listed in the Sequence Viewer.
The number of identical residues (ni) is counted and
compared to the number of sequences (ns). For a given
ruler position, the consensus residue character is:
• Bold: ni/ns > 70%
• Normal: 70% > ni/ns > 35%
• Blank: ni/ns <= 35%
Consensus JOY SS Display a consensus JOY secondary structure sequence,
which is updated on the fly using all homolog
sequences. Prerequisite is to have read in a JOY template
file (via the File menu). The secondary structure
element is shown if it occurs in more than 70% of the
homolog sequences.
• a = alpha helix
• b = beta sheet
• 3 = 3/10 helix

Consensus JOY Env Display a consensus JOY environment sequence, which
is updated on the fly using all homolog sequences. Prerequisite
is to have read in a JOY template file (via the
File menu). See the JOY Annotation Key on page 274.
• If a JOY annotation is conserved >70% over the
homolog residues, it is applied to the consensus
sequence character.
• If there is no consensus sequence character, the
character which is most often in the homologs at
this position is used.
Color Molecule Apply to the selected molecule(s) the coloring scheme
of the text background. See Color Schemes for the Text
Background in the Sequence Viewer on page 270.
Open in Browser Show the current selection (or all sequences if nothing
is selected) in an HTML browser. The output reflects
the current text style and color scheme as well as the
HTML preferences set via Options > Preferences.
The output does not include graphical items that cannot
be reproduced in HTML, such as selection markings.
The output is stored in the file sequence.html in the
current working directory. To save the HTML output to
a permanent location, use File > Save Selected As >
HTML File.
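The Consensus Sequence rule described in this menu can be sketched in Python as follows. The function name is hypothetical, and three details are assumptions because the manual does not specify them: gaps are not counted as residues, ties are broken in favor of the residue seen first, and a ratio of exactly 70% is shown in the Normal style.

```python
from collections import Counter
from typing import List, Tuple


def consensus_column(column: List[str]) -> Tuple[str, str]:
    """Return (character, style) for one ruler position.

    ni is the count of the most frequent residue; ns is the number of sequences.
    """
    ns = len(column)
    residues = [c for c in column if c != "-"]  # assumption: gaps are ignored
    if ns == 0 or not residues:
        return (" ", "blank")
    residue, ni = Counter(residues).most_common(1)[0]
    ratio = ni / ns
    if ratio > 0.70:
        return (residue, "bold")
    if ratio > 0.35:
        return (residue, "normal")
    return (" ", "blank")


if __name__ == "__main__":
    for col in ("AAAA", "AAGC", "AGCT"):
        print(col, consensus_column(list(col)))
```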
Selection Menu
Invert Selection Invert the current selection of whole or partial
sequences in the Sequence Viewer.
Search Sequences Search all sequences for a particular character string.
Case insensitive.
Pick Residue(s) Click one or more residue(s) in one or more structure(s)
from the SYBYL screen. The selection will be highlighted
in the SYBYL window and in the Sequence
Viewer. Click an already highlighted residue to unselect
it.
You can perform the same selection/highlight combination
by holding the Shift key while you click residues.
Select Residue(s) Access the Select Substructures dialog to select one or
more residue(s) in a single molecule. The selection will
be highlighted on the screen and in the Sequence
Viewer.

Color Color the background text of the sequence selection and
the associated structures in the SYBYL screen. The
color choices are taken from SYBYL's original set of
non-white colors. To remove the coloring, apply None.
Accessible when a selection has been made (see Selecting
in the Sequence Viewer on page 275).
Label Substructure or None.
Display Display all the atoms in the structure(s) matching the
current selection.
Undisplay Remove from the SYBYL screen all the atoms in the
structure(s) matching the current selection.
Options Menu
Preferences Access the Sequence Viewer Options dialog where you

can specify:
• Initial Text Style—The style applied to the text in
the Sequence Viewer the first time it is posted:
COLOR_BY_SECSTR (helices in red, sheets in blue
and turns in magenta) or JOY_ANNOTATION (see
JOY Annotation Key on page 274) or PLAIN (black
text).
• HTML Output Width—The number of sequence
characters per line when viewing the sequence
alignment in a browser or saving it in HTML
format.
• Write HTML Output Key—Whether to include it
when viewing the sequence alignment in a browser
or saving it in HTML format.
These options correspond to Tailor variables
BIOPOLYMER SEQUENCE_VIEWER.
Help Menu
On Sequence Viewer This documentation page.
On JOY Annotation See JOY Annotation Key on page 274.
On Mouse Interactions See Mouse and Keyboard Interactions in the Sequence
Viewer on page 275.
15.1.4 Color Schemes for the Text Background in the Sequence Viewer
Color schemes that can be applied to the textual representation of the sequences
in the Viewer’s right panel. Only one scheme may be applied at a time.
Access:
• At the bottom right of the Sequence Viewer click Color to post the Set
Color dialog.
• In the Sequence Viewer’s View menu, select Text Background.
You may then apply the color scheme to the corresponding molecule on the
SYBYL screen:
• At the bottom right of the Sequence Viewer click Color Molecule.
• In the Sequence Viewer’s View menu, select Color Molecule.
None Remove any color applied to the text background.

Custom Applies the colors that were set via the Sequence
Viewer’s Selection > Color option.
Molecule Apply the color of the residues as they appear on the
structure on the SYBYL screen to the text background
of the residues in the Sequence Viewer.
Note that if you change the molecule’s color scheme,
you will have to select this option again to update the
Sequence Viewer window.
ClustalX • Orange: GLY, PRO, SER, THR
• Red: HIS, LYS, ARG
• Green: PHE, TRP, TYR
• Blue: ILE, LEU, MET, VAL
Zappo Zappo: http://www.lii-enac.fr/~letondal/biok/alig.html
• Salmon: ALA, ILE, LEU, MET, VAL
• Orange: PHE, TRP, TYR
• Red: ASP, GLU, HIS
• Green: ASN, GLN, SER, THR
• Blue: ARG, LYS
• Magenta: GLY, PRO
• Yellow: CYS

Taylor One color per type of amino acid.

See: Taylor, W. R. “Identification of protein sequence
homology by consensus template alignment.” J. Mol.
Biol. 1986, 188, 233-258.
Identity Whether a residue at a particular ruler position in the
viewer has the same amino acid type in all sequences.
• Blue: all residues are identical
• Red: at least one residue is different from the others
Similarity This color scheme uses the consensus sequence as ref-
erence. At a given ruler position, the following color
key applies:
• Blue: The sequence character is identical to the
consensus sequence character.
• Light Blue: The sequence character has a positive
homology score with the consensus sequence
character. The homology score is calculated with
the BLOSUM62 matrix.
• No color: There is no consensus sequence character
or the homology score <= 0.
Local RMSD The RMSD between each C-alpha and the mean posi-
tion of all corresponding C-alpha in the selected (or all)
sequences. The color code is as follows:
• Blue: <1 Å
• Green: >=1 Å, but <2 Å
• Yellow: >=2 Å, but <3 Å
• Orange: >=3 Å, but <4 Å
• Red: >=4 Å
Non-Fav Mut Colors all P, G, and C residues white then compares
sequences for mutations and colors them as follows:
• Magenta: PRO to non-PRO
• Green: GLY to non-GLY
• Yellow: CYS to non-CYS
Homolog sequences and their consensus sequence (if
shown) are compared to the query, and the query itself
is compared to the consensus sequence. In the absence
of a query, the consensus sequence is the reference.
Hydrophobicity Uses the SYBYL built-in sets {HYDROPHOBIC} and
{POLAR}.
• Red-orange: hydrophobic residues
• Blue: polar residues
The color options below require a JOY template to be read in (via the File
menu).

DSSP Dictionary of Secondary Structure of Proteins [Ref. 42]

• Blue: beta-strand
• Red: alpha helix
• Magenta: 3-helix (3/10 helix)
SecStr & Phi Secondary structure and phi
• Red: helix (alpha and 3/10)
• Blue: beta-strand
• Orange: positive phi angle
Positive Phi Orange: residues with a positive phi angle.
Cis Peptide Green: residues connected by a cis peptide bond.
Disulfide Yellow: residues involved in disulfide bonds.
Carbonyl HB Red: residues with a sidechain carbonyl involved in
hydrogen bonds.
Amide HB Blue: residues with a sidechain amide involved in
hydrogen bonds.
BB Carbonyl HB Red: residues with a backbone carbonyl involved in a
hydrogen bond.
BB Amide HB Blue: residues with a backbone amide involved in a
hydrogen bond.
Sidechain HB Magenta: residues with sidechain atoms involved in
hydrogen bonds.
Heterogen HB Magenta: Any residue involved in a hydrogen bond
with a molecule comprised of HETATM records
(ligand, cofactor, water, metal).
Heterogen Bond Green: Any residue involved in a bond with a molecule
comprised of HETATM records (ligand, cofactor,
water, metal).
Sov. Access Cyan: Residues accessible to solvent.

%Access Solvent accessibility as described by Mizuguchi et al.,
“Joy: protein sequence-structure representation and
analysis” Bioinformatics 1998, 14, 617-623.
• Green: 0-12%
• Green-blue: 13-25%
• Cyan: 26-37%
• Blue: 38-50%
• Purple: 51-63%
• Magenta: 64-75%
• Violet: 76-87%
• Red: 88-100%
Ooi Nr The Ooi Coordination Number is a count of the number
of other C-alpha atoms within a radius (here 14 Å) of a
given residue’s own C-alpha. Although crude, this
number gives a good impression of which parts of the
structure are buried and which are exposed.
• Green: <=1
• Green-blue: 2
• Cyan: 3
• Blue: 4
• Purple: 5
• Magenta: 6
• Violet: 7
• Red: 8
• Orange: 9
• Yellow: >=10
Nishikawa, K. and Ooi, T. “Radial locations of amino-
acid residues in a globular protein - correlation with the
sequence.” J. Biochem. 1986, 100, 1043-1047.
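The Ooi Nr definition above (a count of other C-alpha atoms within 14 Å) can be illustrated with a short Python sketch. It is not the JOY or SYBYL implementation: the function names are hypothetical, the comparison is assumed to be inclusive at exactly 14 Å, and the color lookup assumes the key listed above is applied directly to the count, which the manual does not state explicitly.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]

# Color key as listed above: index 0 and 1 -> Green, ..., 10 and higher -> Yellow.
OOI_COLORS = ["Green", "Green", "Green-blue", "Cyan", "Blue", "Purple",
              "Magenta", "Violet", "Red", "Orange", "Yellow"]


def ooi_numbers(ca_coords: Sequence[Point], radius: float = 14.0) -> List[int]:
    """For each C-alpha, count the other C-alpha atoms within the given radius."""
    counts = []
    for i, a in enumerate(ca_coords):
        n = sum(1 for j, b in enumerate(ca_coords)
                if j != i and math.dist(a, b) <= radius)
        counts.append(n)
    return counts


def ooi_color(count: int) -> str:
    """Map a count to the color key, clamping to the first and last entries."""
    return OOI_COLORS[min(max(count, 0), 10)]


if __name__ == "__main__":
    # Three made-up C-alpha positions on a line, 10 Å apart.
    coords = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0), (20.0, 0.0, 0.0)]
    for n in ooi_numbers(coords):
        print(n, ooi_color(n))
```

The double loop is quadratic in the number of residues, which is acceptable for a single protein chain.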

15.1.5 JOY Annotation Key

JOY annotates protein sequence alignments with three-dimensional structural
features.
SeqViewer: View > Text Style > JOY Annotation
The following table is taken from the online JOY manual:
(http://www-cryst.bioc.cam.ac.uk/joy/joyman.htm).
alpha helix red
beta strand blue
3/10 helix maroon
solvent accessible lower case (x)
solvent inaccessible UPPER CASE (X)
hydrogen bond to main chain amide bold (x)
hydrogen bond to main chain carbonyl underline (x)
hydrogen bond to sidechain overline (x)
disulfide bond cedilla (ç)
positive phi angle italic (x)

15.2 Mouse and Keyboard Interactions in the Sequence Viewer
15.2.1 Selecting in the Sequence Viewer
All selection operations involve the left mouse button.
• A single click selects a single item. The selection is also echoed in the
Selection information box at the bottom of the Sequence Viewer.
• Click any line in the left panel to select the whole sequence.
• Click any column in the ruler to select the residue at that ruler
position in all the sequences.
• Click any residue in the right panel to select only that residue.
• Click and drag the left mouse button to select a range of rows
(sequences), a range of columns (ruler positions), or a block of residues.
• Click followed by Shift + left-click has the same effect as click and drag.
• Ctrl + left-click adds to or removes from the selection.
Mac users: Command + left-click.
• Use the buttons to select all, invert the selection, and clear
the selection, respectively.
15.2.2 Changing the Order of Lines in the Sequence Viewer

Use the icon to change the order of sequence lines in the top and bottom
sections of the Sequence Viewer.
15.2.3 Editing in the Sequence Viewer

The operations described in this section are possible only if the Sequence Viewer
is used in Biopolymer mode or if it is posted by ORCHESTRAR’s Model
Conserved Regions dialog.
Inserting Gaps
Use the keyboard “-” key to insert one or more gap(s):

• Before a single selected residue
• Before all residues in a whole or partially selected column
Deleting Gaps
Use the keyboard Delete or Backspace to delete one or more gap(s):
• In a single sequence
• In multiple sequences, but only to the extent that gap(s) can be deleted in
any of the individual sequences.
Moving Residues
Use the middle mouse button to move one or more residue(s). This may be done
in one of two modes.
Smart mode: click and drag the middle button.

• This mode moves the rest of the sequence along with the moving
residue.
• When closing a gap, movement stops when the gap is closed.
• When creating a gap, residues may be pushed beyond the ruler’s extreme
positions.
Abacus mode: Ctrl (Command on the Mac) + click and drag the middle button.
• This mode allows you to move a single residue within a gap.
• None of the residues can be pushed beyond the ruler’s extreme positions.

16. Biopolymer Files
• Biopolymer Dictionaries on page 278
• Residue Files
• Dictionary Files
• User Creation of Dictionaries and Residues
• Databases of Ligands, Cofactors, and Chemical Groups
• Protein Homology Matrix Files on page 293
• Biopolymer Loop Files on page 294
• Keywords
• Sample Loop Search File
• Secondary Structure Prediction Files on page 298
• Example File
• Input and Output File Formats
• Common Biopolymer File Formats on page 301
• PIR File Format
• FASTA File Format

16.1 Biopolymer Dictionaries

The dictionaries are a collection of text files used by SYBYL to handle
biopolymer molecules. They define the residue structures, conformational
definitions, connection rules, and other information used in constructing and
manipulating biopolymers. Several dictionaries are supplied with SYBYL/
Biopolymer. If you modify any of these standard dictionaries, you should keep a
copy of the original, for compatibility with the standard SYBYL release.
A Biopolymer dictionary consists of a single .dic file, which contains various

definitions and rules for the class of biopolymer built from that dictionary, and
one or more .res files, each of which describes a single residue. Biopolymer
dictionaries reside in the directory specified by Tailor variable BIOPOLYMER
DIRECTORY. The default directory for biopolymers is defined by the
environment variable TA_DICT, which is itself defined by Tailor variable
BIOPOLYMER DEFAULT_DICT. You may at any time open any dictionary by
name. You can create new residues and blocking groups for biopolymer
dictionaries. Both of these operations will create a new .res file containing the new
residue.
16.1.1 Residue Files

The .res files contain all the information needed to construct a particular
residue. Each file contains the full name of a residue, a three letter code for that
residue (which is displayed for selection in menu mode) and a one letter code.
Either the one letter or the three letter code can be used to specify the residue in
command strings. The residue file also contains information about the general
class of the residue to identify it as an amino acid, nucleotide, carbohydrate,
blocking group, etc.
Each atom in a residue is specified by a line which contains the name of the
atom, the atom ID number, the SYBYL atom type, and the XYZ coordinates of
that atom (in an arbitrary coordinate system). Each atom entry also contains a
coded status indicator and may have a nickname for use in defining conforma-
tional angles (see Dictionary Files on page 286). Atom names must be uniquely
specified for each atom in a residue and for consistency should be similar to the
atom names in the other residue files pertaining to the same dictionary. SYBYL/
Biopolymer residues have atom names which follow as closely as possible the
IUPAC-IUB nomenclature conventions [Ref. 11].
The status indicator is a hexadecimal number which defines certain properties
of the atom. The leftmost hexadecimal digit is determined by starting at 0 and
adding the following numbers if the atom has the corresponding property:
• 8 - cap atom
• 4 - backbone atom
• 1 - essential hydrogen or lone pair
The second digit is 4 if the atom is on a direct backbone path from one end of
the residue to the other, and 0 otherwise. Cap atoms are atoms deleted when
connecting this residue to a subsequent or previous residue in a chain. Backbone
atoms are those atoms which would be part of a continuous chain from one end
of the biopolymer to the other (backbone atoms are automatically added to the
built-in set {BACKBONE}). Essential hydrogens or lone pairs are those connected
to polar atoms and are likely to be involved in hydrogen bonds. When building
biopolymers you have three choices based on Tailor variable BIOPOLYMER
BUILD_HYDROGENS. Biopolymers can be built with no hydrogens, with only the
essential hydrogens, or with all possible hydrogens included. This is to allow
easy construction of biopolymer structures appropriate for a particular modeling
process.
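The status indicator encoding described above can be decoded with a few bit tests. The Python sketch below is illustrative only: the function name is hypothetical, and it assumes the indicator is written as exactly two hexadecimal digits, which should be checked against the Tripos-supplied .res files.

```python
from typing import Dict


def decode_status(status: str) -> Dict[str, bool]:
    """Decode a two-digit hexadecimal atom status indicator.

    First digit: 8 = cap atom, 4 = backbone atom, 1 = essential hydrogen
    or lone pair. Second digit: 4 = on the direct backbone path.
    """
    digits = status.strip()
    if len(digits) != 2:
        raise ValueError("expected exactly two hexadecimal digits")
    first = int(digits[0], 16)
    second = int(digits[1], 16)
    return {
        "cap": bool(first & 0x8),
        "backbone": bool(first & 0x4),
        "essential_h_or_lone_pair": bool(first & 0x1),
        "direct_backbone_path": second == 0x4,
    }


if __name__ == "__main__":
    # "44": a backbone atom that lies on the direct backbone path.
    print(decode_status("44"))
    # "10": an essential hydrogen or lone pair, not on the backbone path.
    print(decode_status("10"))
```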
Residue files contain explicit information on the connectivity within the residue.
Each bond in the molecule is described by a line specifying the two atom IDs
comprising the bond, and the bond type.
Various property values and sets are included in residue files. The two typical
properties included by Tripos are the molecular weight of the residue and its ∆G
or free energy of formation. Sets defined for residues include charge sets and
alternate atom types to be used with force fields other than the Tripos force
field.
File Format:
This section describes the format of the .res biopolymer dictionary files.
Indentation in the format description indicates that the indented section is repeated
the number of times specified on the preceding line. The only restriction on the
format of the files is that items must be separated by white space (one or more
spaces, tabs, or new lines). Refer to the Tripos-supplied dictionary files for
guidance in understanding the formats.
MONOMER_NAME FULL_NAME MONOMER_CODE MONOMER_CLASS
NATOMS
    ATOM_SERNO ATOM_TYPE X Y Z ATOM_NAME ATOM_STATUS NICKNAME
NBONDS
    ORIGIN_SERNO TARGET_SERNO BOND_TYPE BOND_STATUS
NSITES
    SITE_ATOM SITE_CODE
NMONOMER_PROPS
    PROP_NAME PROP_VALUE
NCHARGESETS
    CHARGESET_NAME NATOMS
        ATOM_NAME CHARGE
NALT_TYPE_SETS
    TYPE_SET_NAME NATOMS
        ATOM_NAME ATOM_TYPE
NCONN_BONDS
    CONN_ATOM CAP_ATOM
NCONN_GROUPS
    NGROUP_BONDS
        CONN_ATOM CAP_ATOM
Legend:
MONOMER_NAME Common name of this residue. Must be 1-3 characters
long, begin with a letter, and contain no digits. This
name will be used in creating substructure names in
biopolymers.
FULL_NAME Full name of this residue.
MONOMER_CODE An optional one-letter abbreviation for this residue. The
character must be alphabetic, or a period (.) to indicate
that no one-letter code will be used for this residue.
MONOMER_CLASS A comma-separated list of keywords defining the
monomer class (see examples below):
• amino_acid—Protein residue
• dna—DNA residue
• rna—RNA residue
• carbohydrate—carbohydrate residue
• water—Water residue
• other—Old keyword for water
• solvent—solvent residue
• cofactor—cofactor residue
• block—Blocking group
• head— Blocking group for chain head
• tail—Blocking group for chain tail
• special_blk—Blocking group that requires
special handling of atoms to be kept or discarded.
Currently: CXC, CXL, AMI, AMN, AMD, NMT.
• neutral_blk—Blocking group used for neutral
termini treatment (AMI, CXC)
• charged_blk—Blocking group used for charged
termini treatment (AMN, CXL)
• standard—Residue is one of the 20 standard
amino acids
• default (or modified)—Residue is a non-
standard amino acid. This information is used to
separate the standard amino acids from others in the
Build dialog.
• backbone_ring—Residue has a ring in the
backbone and needs a special algorithm to be
blocked and unblocked. Currently: PRO, HPR, HYP
NATOMS Number of atoms in this residue (including any cap
fragments).
ATOM_SERNO Sequential number for this atom. Serial numbers start at
0.
ATOM_TYPE Mnemonic atom type.
X, Y, Z X, Y and Z coordinates.
ATOM_NAME Name of atom.

ATOM_STATUS 8-digit hexadecimal code of the form XX000000. The 6
rightmost digits should always be 0. The leftmost
hexadecimal digit is determined by starting at 0, and
then adding the following numbers if the atom has the
corresponding property:
• 8 - cap atom
• 4 - backbone atom
• 1 - “essential” hydrogen or lone pair
The second digit should be 4 if the atom is on a direct
backbone path from one end of the residue to the other,
and 0 otherwise.
NICKNAME Nickname for this atom, which can be used in conformational
definitions. Nicknames must begin with the
underscore character (_). Nicknames are optional, and
need not be given for each atom; if given, the nickname
must appear on the same line as the ATOM_STATUS.
NBONDS Number of bonds in this residue.
ORIGIN_SERNO Serial number of origin atom of bond.
TARGET_SERNO Serial number of target atom of bond.
BOND_TYPE Mnemonic bond type.
BOND_STATUS Should always be 0; this value is ignored.
NSITES Number of interaction sites. Currently this number
should always be 4 for blocking groups and 0 for residues.
SITE_ATOM Name of atom for this SITE record.
SITE_CODE Indicate the nature of this site record.
• 1—Atom in a residue for this block to connect to
• 2—Atom in a residue to be discarded when adding
this block
• 3—Atom in this block to attach to a residue
• 4—Atom in this block to be discarded when
attaching to a residue
NMONOMER_PROPS Number of residue properties.
PROP_NAME Name of residue property. This name must be listed as
one of the residue properties in the dictionary file.
PROP_VALUE Value for this property. The type of this value must be
compatible with the PROP_TYPE given in the dictionary
for this property.
NCHARGESETS Number of charge sets given for this residue.
CHARGESET_NAME Name of this charge set.
NATOMS Number of atoms given in this charge set. (This may be
fewer than the number of atoms in the residue.)
CHARGE Atomic charge.
NALT_TYPE_SETS Number of sets of alternate atom types given for this
residue.
TYPE_SET_NAME Name of this set of alternate atom types.
NATOMS Number of atoms given in this alternate-type set (This
may be fewer than the number of atoms in the residue.)
ATOM_TYPE Mnemonic atom type.
NCONN_BONDS Number of bonds that can be replaced by inter-residue
bonds. Should always be 0 for biopolymer residues.
CONN_ATOM Atom in this residue to be connected to the next residue.
CAP_ATOM Atom in this residue to be discarded upon connection
with the next residue.
NCONN_GROUPS Number of groups defining inter-residue connections
for this residue. These groups are composed of the connection
bonds specified above. This number should be
0 for biopolymer residues.
NGROUP_BONDS Number of bonds in this bonding group. Should be 1.
Examples:
Examples of the first record line in the residue files for the standard amino acid
ALA and the blocking group BOC. Note the comma-separated lists for the
monomer class.
ALA alanine A amino_acid,standard,default
BOC N_Terminal_t-Butyloxycarbonyl . block,head,amino_acid
Associated expression generator: %dict_info() with option CLASSES or
MONCLASS (described in the SPL Manual).
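The first record line shown in the examples above can be split into its four fields with a few lines of Python. This is a minimal sketch that assumes the four items contain no embedded white space (consistent with the white-space rule for .res files); the names ResidueHeader and parse_residue_header are illustrative and are not part of SYBYL or SPL.

```python
from typing import NamedTuple, Optional, Tuple


class ResidueHeader(NamedTuple):
    name: str                 # MONOMER_NAME
    full_name: str            # FULL_NAME
    code: Optional[str]       # MONOMER_CODE, None when given as "."
    classes: Tuple[str, ...]  # MONOMER_CLASS keywords


def parse_residue_header(line: str) -> ResidueHeader:
    """Split the first record line of a .res file into its four items."""
    name, full_name, code, class_list = line.split()
    return ResidueHeader(
        name,
        full_name,
        None if code == "." else code,
        tuple(class_list.split(",")),
    )


ala = parse_residue_header("ALA alanine A amino_acid,standard,default")
boc = parse_residue_header("BOC N_Terminal_t-Butyloxycarbonyl . block,head,amino_acid")
print(ala.classes, boc.code)
```

Keeping the class keywords as a tuple makes membership tests such as "standard" in ala.classes straightforward.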
How to Update your Custom Dictionary for Compatibility:
1. If your dictionary was patterned after the macromol dictionary (and created
in SYBYL 6.9 or earlier), you must modify the first line in the .dic file and
change the integer from 8 to 4.
Example: macromol 4
2. Starting with SYBYL 6.9.1, the specialized AMBER 95 dictionaries were
merged into the remaining dictionaries. We recommend that you rebuild
your customized dictionaries based on the current SYBYL dictionaries.
3. Modify your dictionary’s .res files to include the appropriate keywords on
the first line. The list below is from the macromol dictionary.
ALA amino_acid,standard,default
ARG amino_acid,standard,default
ASZ amino_acid,default
ASN amino_acid,standard,default
ASP amino_acid,standard,default
CYS amino_acid,standard,default
GLN amino_acid,standard,default
GLU amino_acid,standard,default
GLY amino_acid,standard,default
GLZ amino_acid,default
HID amino_acid,default
HIE amino_acid,default
HIP amino_acid,default
HIS amino_acid,standard,default
ILE amino_acid,standard,default
LEU amino_acid,standard,default
LYS amino_acid,standard,default
LYZ amino_acid,default
MET amino_acid,standard,default
PHE amino_acid,standard,default
PRO amino_acid,standard,default,backbone_ring
SER amino_acid,standard,default
THR amino_acid,standard,default
TRP amino_acid,standard,default
TYR amino_acid,standard,default
VAL amino_acid,standard,default
ABU amino_acid,modified
AIB amino_acid,modified
ANY amino_acid,modified
ARZ amino_acid,modified
BAL amino_acid,modified
CYM amino_acid,modified
CYX amino_acid,modified
DA amino_acid,modified
HCX amino_acid,modified
HCY amino_acid,modified
HPR amino_acid,modified,backbone_ring
HSE amino_acid,modified
HYP amino_acid,modified,backbone_ring
MAL amino_acid,modified
MBT amino_acid,modified
NLE amino_acid,modified
NMA amino_acid,modified
NML amino_acid,modified
NMS amino_acid,modified
NMV amino_acid,modified
NVA amino_acid,modified
ORN amino_acid,modified
ORZ amino_acid,modified
PHG amino_acid,modified
PSE amino_acid,modified
PSM amino_acid,modified
PSZ amino_acid,modified
PTM amino_acid,modified
PTY amino_acid,modified
PTZ amino_acid,modified
SAR amino_acid,modified
dA dna
dC dna
dG dna
dT dna
rA rna
rC rna
rG rna
rU rna
DRA carbohydrate
DRB carbohydrate
FRA carbohydrate
FRB carbohydrate
GAA carbohydrate
GAB carbohydrate
GLA carbohydrate
GLB carbohydrate
MAA carbohydrate
MAB carbohydrate
RBA carbohydrate
RBB carbohydrate
HOH other,water
SPC other,water
TIP other,water
WAT other,water
WTR other,water
AMN block,head,amino_acid,special_blk,charged_blk
AMI block,head,amino_acid,special_blk,neutral_blk
NMT block,head,amino_acid,special_blk
ACE block,head,amino_acid
BOC block,head,amino_acid
FOR block,head,amino_acid
PYR block,head,amino_acid
CXL block,tail,amino_acid,special_blk,charged_blk
CXC block,tail,amino_acid,special_blk,neutral_blk
AMD block,tail,amino_acid,special_blk
CME block,tail,amino_acid
EES block,tail,amino_acid
MES block,tail,amino_acid
NME block,tail,amino_acid
NMM block,tail,amino_acid
HB block,head,dna,rna
HE block,tail,dna,rna
OME block,carbohydrate

16.1.2 Dictionary Files

Dictionary files contain information on how to connect residues to form the
various classes of biopolymers. Each dictionary file contains a name (which is
displayed in menu mode) and a code value which specifies the type or class of
biopolymer defined by that particular dictionary. The code is necessary because
some of the Biopolymer commands are specific to a particular biopolymer class.
To obtain a listing of all the information defined within a particular dictionary
use the List Dictionary functionality.
Conformational (torsion) angles and states of biopolymers are defined by the
dictionary files. The angles are described by a name (e.g. phi, psi, alpha)
followed by the name or nickname of each atom comprising the torsion angle
and the residue in which each atom is found (the present residue, the previous
one, or the subsequent one). Conformational states define the values of some set
of conformational angles within a biopolymer class. These states are described
by a name (e.g. beta_sheet, alpha_helix, b_like), the conformational angles for
which values are assigned, and the actual values or ranges of those angles.
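How a conformational state's angle entries might be applied can be sketched in Python. The sketch uses the ANGLE_INTERP, ANGLE_VALUE, and DISCREPANCY fields described in the File Format legend of this section. Treating angles as periodic over 360 degrees, never matching random ("R") angles during a search, and all function names and numeric values are assumptions made for illustration; they are not documented SYBYL behavior.

```python
import random


def angular_difference(a: float, b: float) -> float:
    """Smallest absolute difference between two angles, in degrees."""
    d = (a - b) % 360.0
    return min(d, 360.0 - d)


def matches_state(angle: float, interp: str, angle_value: float,
                  discrepancy: float) -> bool:
    """True if a measured angle lies within DISCREPANCY of ANGLE_VALUE."""
    if interp == "N":
        return angular_difference(angle, angle_value) <= discrepancy
    return False  # assumption: random ("R") angles are never searched for


def angle_for_build(interp: str, angle_value: float, discrepancy: float) -> float:
    """Angle to impose when the state is invoked."""
    if interp == "R":
        # For random angles the two fields hold the minimum and maximum.
        return random.uniform(angle_value, discrepancy)
    return angle_value


print(matches_state(-60.0, "N", -57.0, 10.0))
print(angle_for_build("R", -90.0, -30.0))
```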
The next type of information stored in biopolymer dictionary files is the number
and names of atoms used to create inter-residue bonds. Entries also define the
number of possible connections for each residue and the bond types of the
inter-residue bonds. Finally, particular conformational states can be chosen as the
default settings for building inter-residue bonds.
The properties ascribed to residues must be defined within the dictionary file.
Entries describe the names and number of property sets, the names and number
of charge sets, and the names and number of alternate atom type sets given to
each residue. The most important property defined by the dictionary is a list of
all residue files available for modeling operations with this class of biopolymer.
Some biopolymer classes define torsion angles within rings. In order to allow
these angles to be set, ring closure bonds must be defined and a set of possible
values for all the conformational angles in the ring must be given. This is
because closure of a ring forces all the torsion angles in the ring to be
interdependent. SYBYL/Biopolymer handles this difficulty by defining one of the
torsion angles in the incipient ring as a master angle and the rest as dependent
on this master angle. This information is contained within the dictionary file.
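The master/dependent relationship can be pictured as a table lookup with interpolation. The Python sketch below assumes linear interpolation between neighboring table rows and clamping outside the tabulated range; the manual states only that dependent values are computed by interpolation, so both choices, the function name, and the table values are illustrative assumptions.

```python
from bisect import bisect_right
from typing import List, Sequence


def interpolate_dependents(table: Sequence[Sequence[float]],
                           master_value: float) -> List[float]:
    """Interpolate dependent angles from rows of (master, dep1, dep2, ...).

    Rows must be sorted by increasing master angle value.
    """
    masters = [row[0] for row in table]
    if master_value <= masters[0]:
        return list(table[0][1:])
    if master_value >= masters[-1]:
        return list(table[-1][1:])
    hi = bisect_right(masters, master_value)
    lo = hi - 1
    t = (master_value - masters[lo]) / (masters[hi] - masters[lo])
    return [a + t * (b - a) for a, b in zip(table[lo][1:], table[hi][1:])]


# Hypothetical dependency table: master angle followed by two dependents.
table = [
    (80.0, 30.0, -20.0),
    (100.0, 40.0, -30.0),
    (120.0, 20.0, -10.0),
]
print(interpolate_dependents(table, 90.0))
```

The sketch does not handle wrap-around at 180/-180 degrees; a table that crosses that boundary would need additional care.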
The final entries in the dictionary files are used to create the built-in sets which
relate to biopolymers and are listed in the Expression dialogs (see the SYBYL
Basics Manual).
File Format:
This section describes the format of the .dic biopolymer dictionary files.
Indentation in the format description indicates that the indented section is repeated
the number of times specified on the preceding line. The only restriction on the
format of the files is that items must be separated by white space (one or more
spaces, tabs, or new lines). Refer to the Tripos-supplied dictionary files for
guidance in understanding the formats.
DICTTYPE MOLECULE_TYPE_CODE
NCONF_ANGLES
    ANGLE_NAME ANGLE_TYPE_CODE NANGLE_ATOMS
        ATOM_NAME MONOMER_OFFSET
NCONF_STATES
    STATE_NAME NCONTIG NANGLES
        ANGLE_NAME ANGLE_INTERP ANG_MONOMER_OFFSET
        ANGLE_VALUE
        DISCREPANCY
NCONNECT_ATOMS
    ATOM_NAME MAX_BRANCHES NCAP_ATOMS NCAP_BONDS
NCONNECTIONS
    ORIGIN_ATOM_NAME TARGET_ATOM_NAME BOND_TYPE
    NENFORCE_STATES
        STATE_NAME
NMONOMER_PROPS
    PROP_NAME PROP_TYPE
NCHARGESETS
    CHARGESET_NAME
NALT_TYPE_SETS
    TYPE_SET_NAME ASSOC_CHARGESET
NMONOMERS
    MON_FILE_NAME
NDEPENDENCE_STRUCTURES
    MASTER_ANGLE_NAME
    BREAK_ORIGIN_NAME MONOMER_OFFSET
    BREAK_TARGET_NAME MONOMER_OFFSET
    NDEPENDENT_ANGLES
        ANGLE_NAME
    NINTERP_VALUES
        MASTER_ANGLE_VALUE DEP_ANGLE1_VALUE DEP_ANGLE2_VALUE …
NGLOBAL_SETS
    SET_NAME OBJECT_CLASS SET_DEFINITION
Legend:
DICTTYPE Type of dictionary: macromol, protein, RNA, DNA,
sugar. This information is case-sensitive.
MOLECULE_TYPE_CODE Integer code for the type of biopolymer. SYBYL uses
this code for functions that are particular to a specific
kind of biopolymer.
• 1 for proteins,
• 2 for nucleic acids,
• 4 for a general biopolymer of any type (e.g. one
using the macromol dictionary)
• 8 for sugars
• 128 for anything else.
NCONF_ANGLES Number of conformational angle definitions. These def-
initions can actually refer to bond lengths, valence or
torsion angles.
ANGLE_NAME Name of conformational angle.
ANGLE_TYPE_CODE 1 = torsion angle, 2 = valence angle, 3 = bond length.
NANGLE_ATOMS Number of atoms defining the angle: 4 for torsion
angles, 3 for valence angle, 2 for bond length.
ATOM_NAME Name of atom. This can be either an exact atom name
(e.g. CA), an atom nickname beginning with an underscore
(_) (e.g. _beta), or a positional identifier beginning
with a plus (+) or a minus (-) (e.g. +1).
MONOMER_OFFSET Residue offset of this atom in the angle definition: 0 =
the previous residue, 1 = this residue, 2 = next residue.
NCONF_STATES Number of conformational state definitions.
STATE_NAME Name of conformational state.
NCONTIG Number of contiguous residues that must be found with
these angle values in order to recognize a sequence as
having this conformational state. This value is used by
Biopolymer > Conformation > Find Secondary
Structure and by the {FINDCONF} built-in set. A special
value of 0 indicates not to look for this state.
NANGLES Number of conformational angles defining this state.
ANGLE_INTERP Letter indicating interpretation of this angle. “N” means
a normal angle, and ANGLE_VALUE and DISCREPANCY
take on their normal meanings. “R” means a random
angle, and ANGLE_VALUE and DISCREPANCY indicate
the minimum and maximum allowable values; when
this conformational state is invoked, a random angle
value between the minimum and maximum values is
generated. For random angles, NCONTIG should be 0.
ANG_MONOMER_OFFSET Which residue this angle applies to within the residues
comprising this conformational state. This number will
normally be 0, but may be a positive integer if the state
spans more than one residue. For example, a value of 1
would refer to the angle in the second residue in a
sequence.
ANGLE_VALUE For normal angles, the angle value. For random angles,
the minimum angle value.
DISCREPANCY For normal angles, the discrepancy to be used in
searching for residues in this conformational state; a
residue’s angle value may not vary by more than this
number from ANGLE_VALUE in order to be considered
part of this conformational state. For random angles,
the maximum angle value.
NCONNECT_ATOMS Number of atom names that can connect to adjacent
residues to form normal inter-residue backbone bonds.
ATOM_NAME Name of atom.
MAX_BRANCHES Maximum number of inter-residue connections that can
be made from this atom.
NCAP_ATOMS Should always be 0.
NCAP_BONDS Should always be 0.
NCONNECTIONS Number of allowed inter-residue connections.
ORIGIN_ATOM_NAME Name of origin atom for inter-residue connection.
TARGET_ATOM_NAME Name of target atom for inter-residue connection.
BOND_TYPE Mnemonic bond type for inter-residue connection.
NENFORCE_STATES Number of conformational states to enforce upon making
this connection.
STATE_NAME Name of conformational state.
NMONOMER_PROPS Number of residue properties.
PROP_NAME Name of residue property.
PROP_TYPE Type of data for this residue property.
• I = Integer
• F = Floating point
• S = String
NCHARGESETS Number of charge sets given for this dictionary.
CHARGESET_NAME Name of charge set.
NALT_TYPE_SETS Number of alternate sets of atom types.
TYPE_SET_NAME Name of alternate set of atom types.
ASSOC_CHARGESET Name of charge set to be used with this set of atom
types.
NMONOMERS Number of residues and blocking groups in this dictionary.
MON_FILE_NAME Name (without file extension) of the file defining this
residue or blocking group. The full file name should be
mon_file_name.res.
NDEPENDENCE_STRUCTURES Number of angle dependency structures. Angle
dependencies are used when defining conformational angles
within a ring (e.g. the angle DELTA in nucleic acids).
The dependency entry allows for one of the bonds in
the ring to be temporarily broken. Then the given master
angle is set to the specified value, and any defined
“dependent” angles are set to values computed by
interpolation from the pre-defined values given in the
dependency table.
MASTER_ANGLE_NAME Name of conformational angle that triggers dependencies.
BREAK_ORIGIN_NAME Name of atom at one end of bond to break.
MONOMER_OFFSET Residue offset of this atom.
BREAK_TARGET_NAME Name of atom at other end of bond to break.
MONOMER_OFFSET Residue offset of this atom.
NDEPENDENT_ANGLES Number of conformational angles dependent on the
master angle.
ANGLE_NAME Name of conformational angle.
NINTERP_VALUES Number of pre-known values to be used for interpolation.
MASTER_ANGLE_VALUE Pre-known value of master angle.
DEP_ANGLEn_VALUE Pre-known value for each dependent angle.
NGLOBAL_SETS Number of global sets defined by this dictionary.
SET_NAME Name of global set.
OBJECT_CLASS A for atom, B for bond, or S for substructure.
SET_DEFINITION Expression defining the global set.
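Because items in a .dic file are separated only by white space, the file can be read as a single stream of tokens in which each count tells the reader how many records follow. The Python sketch below reads just the first two sections (the dictionary type line and the conformational angle definitions). The function name and the sample text are hypothetical and were written for illustration; they are not copied from a Tripos-supplied dictionary.

```python
from typing import List, Tuple

AngleDef = Tuple[str, int, List[Tuple[str, int]]]


def read_conf_angles(text: str) -> Tuple[str, int, List[AngleDef]]:
    """Read DICTTYPE, MOLECULE_TYPE_CODE and the NCONF_ANGLES section."""
    tokens = iter(text.split())
    dict_type = next(tokens)
    molecule_type_code = int(next(tokens))
    angles: List[AngleDef] = []
    for _ in range(int(next(tokens))):          # NCONF_ANGLES
        name = next(tokens)                     # ANGLE_NAME
        type_code = int(next(tokens))           # ANGLE_TYPE_CODE
        n_atoms = int(next(tokens))             # NANGLE_ATOMS
        atoms = []
        for _ in range(n_atoms):
            atom_name = next(tokens)            # ATOM_NAME
            offset = int(next(tokens))          # MONOMER_OFFSET
            atoms.append((atom_name, offset))
        angles.append((name, type_code, atoms))
    return dict_type, molecule_type_code, angles


# Hypothetical fragment of a dictionary file.
SAMPLE = """protein 1
2
phi 1 4
  C 0  N 1  CA 1  C 1
psi 1 4
  N 1  CA 1  C 1  N 2
"""
dict_type, code, angles = read_conf_angles(SAMPLE)
print(dict_type, code, [a[0] for a in angles])
```

The remaining sections could be read the same way, continuing to consume tokens from the same iterator.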
How to Update your Custom Dictionary for Compatibility:

See the note in the Residue section above.

16.1.3 User Creation of Dictionaries and Residues

You can create your own dictionary and residue files for use by SYBYL. This
can be done in one of two ways: by using the utilities within SYBYL or by
copying and editing the appropriate related files. Dictionaries reside in the
directory specified by Tailor variable BIOPOLYMER DIRECTORY. By default this is
the directory named by the environment variable TA_DICT. Before modifying
any of the standard dictionaries, copy the originals, together with
$TA_DICT/AMB_PARMS/, into one of your own directories and set Tailor
variable BIOPOLYMER DIRECTORY to that directory. This avoids
incompatibility with future standard SYBYL releases and keeps your
modifications from affecting other users of the program. You can also create
new residues and blocking groups for biopolymer dictionaries. Doing so
creates a new .res file containing the new residue in the directory specified by
Tailor variable BIOPOLYMER DIRECTORY. Tripos will be glad to consider any
new residue or dictionary files you create for possible inclusion in future
SYBYL releases.
If you receive error messages when opening your own dictionary or residue file,
you will usually be given sufficient information to find and correct offending
lines. Residues are read in the order in which they are listed in dictionary files.
This means that the first occurrence of a particular three or one letter code read
by the program will take precedence over subsequent entries with duplicate
codes. This is significant when defining multi-class dictionaries which
recognize more than one type of biopolymer. For example, if you want to create
a combined DNA-protein dictionary, the entries for the DNA residues should
occur first in the list of residues. The sequence expression A=C=G=T will then
be interpreted as a DNA sequence; a protein sequence must be written as
ala=cys=gly=thr instead. Since there are only 26 possible one-letter codes,
you are restricted to three-letter names when creating large dictionaries of
residues.
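The precedence rule for duplicate codes can be modeled as a lookup table in which the first entry read for a given code wins. In the Python sketch below, the residue list, its ordering, and the one-letter codes assigned to the DNA residues are assumptions chosen to mirror the combined DNA-protein example; they are not read from an actual dictionary.

```python
from typing import Dict, Iterable, Optional, Tuple


def build_code_lookup(residues: Iterable[Tuple[str, Optional[str]]]) -> Dict[str, str]:
    """Map one-letter codes to residue names; the first occurrence wins."""
    by_code: Dict[str, str] = {}
    for name, code in residues:
        if code is not None:
            by_code.setdefault(code, name)  # later duplicates are ignored
    return by_code


# Hypothetical combined dictionary: DNA residues listed before amino acids.
residues = [
    ("dA", "A"), ("dC", "C"), ("dG", "G"), ("dT", "T"),
    ("ALA", "A"), ("CYS", "C"), ("GLY", "G"), ("THR", "T"), ("VAL", "V"),
]
lookup = build_code_lookup(residues)
sequence = [lookup[c] for c in "A=C=G=T".split("=")]
print(sequence)
```

Codes that occur only once, such as V for VAL in this list, remain available regardless of ordering.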

16.1.4 Databases of Ligands, Cofactors, and Chemical Groups

Cofactors
The macromol dictionary includes residue files for many common cofactors.
These are used by SYBYL’s PDB reader to interpret HETATM records.
SLN definitions for the cofactor templates are in $TA_DICT/cofactors.def.

This file is used by Biopolymer > Prepare Structure > Fix SYBYL Atom
Types in Cofactor.
Ligand Database
The macromol dictionary includes a ligand database that is based on information
retrieved from the Ligand Depot site, a service associated with RCSB.
This database helps the SYBYL PDB reader assign correct atom and bond types
to most ligands. The atom set {LIGDB} contains all atoms that are typed using
the ligand database.
SLN definitions for the ligand templates are in $TA_DICT/ligand_db.def.
Database of Chemical Groups

A database of chemical groups increases the PDB reader’s ability to assign
SYBYL atom and bond types to HETATM records that cannot otherwise be
interpreted.
SLN definitions for common chemical groups are in $TA_DICT/group_db.def.

16.2 Protein Homology Matrix Files

This document describes the format of the homology matrix files used in
Biopolymer > Protein Loops > Analyze Search Results (BIOPOLYMER
LOOP ANALYZE) to compute sequence homologies. The files contain a simple
similarity matrix that specifies the score for each amino acid changing to each
other amino acid.
First line:
Number of amino acids (n)
Second line:
1-letter amino acid codes, with no spaces. Use the letter X for an amino acid
whose type is unknown.
Subsequent lines:
n x n matrix of floating point or integer scores. The entry in row i, column j
gives the score of changing from the ith to the jth amino acid, as listed in the
second line of the file.
Sample file:
This is the file used for the ALA_PRO_GLY homology method, in which
changes to or from a PRO or GLY have a score of 0, other changes, 1.
21
ACDEFHIKLMNQRSTVWYPGX
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
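A homology matrix file in this layout can be read and queried with a short Python sketch. The function names are illustrative, and the three-residue matrix used here is a reduced, hypothetical example that follows the same scoring idea as the sample file; it is not a file shipped with SYBYL.

```python
from typing import List, Tuple


def parse_homology_matrix(text: str) -> Tuple[str, List[List[float]]]:
    """Return the one-letter codes and the n x n score matrix."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    n = int(lines[0])
    codes = lines[1].strip()
    if len(codes) != n:
        raise ValueError("number of codes does not match the first line")
    values = [float(v) for ln in lines[2:] for v in ln.split()]
    if len(values) != n * n:
        raise ValueError("matrix must contain n x n scores")
    rows = [values[i * n:(i + 1) * n] for i in range(n)]
    return codes, rows


def change_score(codes: str, rows: List[List[float]],
                 from_code: str, to_code: str) -> float:
    """Score of changing from one amino acid (row) to another (column)."""
    return rows[codes.index(from_code)][codes.index(to_code)]


# Hypothetical reduced matrix: A scores 1 against itself, P only against P.
SAMPLE = """3
APX
1 0 0
0 1 0
0 0 0
"""
codes, rows = parse_homology_matrix(SAMPLE)
print(change_score(codes, rows, "A", "P"))
```

The size checks catch a file whose matrix has fewer rows than the count on its first line.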

16.3 Biopolymer Loop Files

This section describes the format of the files recognized by Biopolymer >
Protein Loops > Analyze Search Results (BIOPOLYMER LOOP ANALYZE).
In principle, the Analyze Search Results tools can be used to analyze protein
conformations generated by any technique, as long as they conform to the
following guidelines:
• Each conformation has the same molecular composition (i.e. the only
thing that changes is the atomic coordinates)
• The portion of the molecule whose coordinates change must consist only
of residues existing in a SYBYL dictionary
• The portion of the protein being modeled must be one contiguous
sequence
• The first and last residue in the sequence being modeled must already
exist in the target SYBYL molecule
The file format has the flavor of Tripos mol2 files, with relevant information
introduced by keywords. White space in the file is generally treated all the
same, with the following restrictions: keywords must be the first non-blank
word on a line, and there must be a single value following the keyword on the
same line.
The loop search results are introduced by the following line:
@<TRIPOS>LOOPSEARCH
16.3.1 Keywords
Anything in the file prior to this line will be ignored. Following this line are
various pieces of data introduced by keywords. Here is a complete list of the
possible keywords:
Keyword Description
#total_len# Number of residues in each fragment
#na1# Number of residues in the N anchor region (*)
#na2# Number of residues in the C anchor region (*)
#end_thresh# End-to-end distance threshold used in the search (*)
#dist_thresh# Inter-CA distance threshold used in the search (*)
#fit_thresh# RMS deviation threshold used in the search (*)
#min_nloops# Minimum # of loops specified to save from search (*)
#max_nloops# Maximum # of loops specified to save from search (*)
#low_loop_wt# Minimum weighting of database coordinates used (*)
#high_loop_wt# Maximum weighting of database coordinates used (*)
#nfit_anames# Number of atoms in each residue used in fitting to the anchor region (*)
#fit_anames# Names of atoms used in fitting anchor regions (*)
#first_subid# Substructure ID of the first residue in the data set (*)
#first_subname# Name of the first residue in the data set
#last_subid# Substructure ID of the last residue in the data set (*)
#last_subname# Name of the last residue in the data set
#nloop_atoms# Total number of atoms in each fragment
#nloops# Number of fragments in this file
#restypes# Residue sequence in target protein
#natoms_per_res# Number of atoms provided for each residue in the fragment
#loop_anames# Names of atoms provided in each residue
#subnames# Names to give the substructures created in target protein
#newloop# Introduces data on a single fragment
#loop.loopnum# Serial number of this fragment (a)
#loop.source# Text indicating where this fragment came from
#loop.rms# RMS deviation of fit to anchor regions
#loop.homol_source# Homology score of this fragment
#loop.restypes# Residue sequence of this fragment
#loop.coords# Actual x,y,z coordinates of each atom in this fragment
#endloopsearch# End of data
a. Indicates optional data, which may be omitted from the file
Note: Several of the keywords require more than one piece of data. In these
cases, the keyword is followed by a count of how many data items follow, and
then the individual data items, separated by white space. See the sample file
below for an illustration. The keywords that require more than one piece of data
are:
#restypes#, #natoms_per_res#, #subnames#, #fit_anames#, #loop_anames#,
#loop.restypes#, #loop.coords#
16.3.2 Sample Loop Search File

The following is an example of a complete loop search file:
@<TRIPOS>LOOPSEARCH
#total_len# 7
#na1# 2
#na2# 1
#nfit_anames# 3
#end_thresh# 1.500000
#dist_thresh# 1.000000
#fit_thresh# 2.000000
#min_nloops# 1
#max_nloops# 2
#low_loop_wt# 0.000000
#high_loop_wt# 1.000000
#first_subid# 2
#first_subname# ALA2
#last_subid# 8
#last_subname# ALA8
#nloop_atoms# 28
#nloops# 2
#restypes# 7
ALA GLY ARG ALA ALA SER VAL
#natoms_per_res# 7
4 4 4 4 4 4 4
#subnames# 7
ALA2 GLY3 ARG4 ALA5 ALA6 SER7 VAL8
#fit_anames# 3
N CA C
#loop_anames# 28
N CA C O
N CA C O
N CA C O
N CA C O
N CA C O
N CA C O
N CA C O
#newloop#
#loop.loopnum# 1
#loop.source# 1ECD:ARG101
#loop.rms# 0.055766
#loop.restypes# 7
ARG ALA GLY PHE VAL SER TYR
#loop.coords# 28
-0.594445 0.227032 2.406230
-1.102960 -0.355941 3.641205
-2.588687 -0.680418 3.499087
-3.027344 -1.777239 3.855146
-3.349523 0.284838 2.980202
-4.768897 0.097947 2.785235
-5.049965 -1.090481 1.864952
-5.943996 -1.898743 2.135735
-4.273731 -1.178697 0.794079
-4.446576 -2.276323 -0.175697
-4.116119 -3.637508 0.432503
-4.833518 -4.623113 0.212178
-3.036101 -3.672635 1.184831
-2.638254 -4.908762 1.873256
-3.729743 -5.390729 2.825635
-4.072447 -6.581861 2.851103
-4.259396 -4.459794 3.596405
-5.227653 -4.809894 4.642437
-6.539927 -5.240013 3.986479
-7.215721 -6.162398 4.464305
-6.880241 -4.565977 2.898039
-8.017278 -4.987381 2.074087
-7.861074 -6.418379 1.561628
-8.810762 -7.215007 1.599689
-6.663944 -6.717199 1.077911
-6.325955 -8.076962 0.665923
-6.529846 -9.057429 1.821012
-7.136806 -10.119126 4.646964
#newloop#
#loop.loopnum# 2
#loop.source# 1CTS:HIS211
#loop.rms# 0.060439
#loop.restypes# 7
HIS ASN PHE THR ASN MET LEU
#loop.coords# 28
-0.603728 0.207158 2.409742
-1.109346 -0.372968 3.636718
-2.586426 -0.709240 3.472759
-2.998666 -1.838515 3.748773
-3.357229 0.248591 3.021435
-4.786656 0.103779 2.762928
-5.090608 -1.088705 1.861036
-6.013541 -1.883909 2.125013
-4.218325 -1.269534 0.920330
-4.293247 -2.336245 -0.046748
-4.140272 -3.690443 0.640868
-4.985909 -4.612694 0.477493
-3.113847 -3.788085 1.438660
-2.738399 -5.033155 2.135940
-3.771883 -5.395658 3.190427
-4.306411 -6.527510 3.242102
-4.332093 -4.364601 3.703964
-5.446328 -4.387273 4.640953
-6.788879 -4.802040 4.023455
-7.675807 -5.393176 4.701812
-6.919182 -4.544548 2.722026
-8.123690 -4.955043 1.950414
-7.938790 -6.349020 1.360276
-8.937385 -7.097849 1.177479
-6.665569 -6.689100 1.077456
-6.278524 -8.044886 0.673736
-6.505959 -8.997541 1.843350
-7.074253 -10.085127 1.655012
#endloopsearch#
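The keyword layout illustrated by the sample file lends itself to a simple token-based reader. The Python sketch below is one possible approach, not the parser used by SYBYL: it ignores everything before the @<TRIPOS>LOOPSEARCH line, treats any token of the form #name# as a keyword, drops the leading count from multi-valued keywords, and groups coordinates into x,y,z triples. The function name and the shortened sample text are assumptions made for illustration.

```python
import re
from typing import Dict, List, Tuple

KEYWORD = re.compile(r"^#[A-Za-z0-9_.]+#$")
# Keywords that are followed by a count and then the data items.
COUNTED = {"#restypes#", "#natoms_per_res#", "#subnames#", "#fit_anames#",
           "#loop_anames#", "#loop.restypes#", "#loop.coords#"}


def parse_loop_search(text: str) -> Tuple[Dict, List[Dict]]:
    """Return (header fields, list of per-fragment fields)."""
    marker = "@<TRIPOS>LOOPSEARCH"
    tokens = text[text.index(marker) + len(marker):].split()
    header: Dict = {}
    loops: List[Dict] = []
    current = header
    i = 0
    while i < len(tokens):
        key = tokens[i]
        if not KEYWORD.match(key):
            raise ValueError(f"expected a keyword, found {key!r}")
        i += 1
        values = []
        while i < len(tokens) and not KEYWORD.match(tokens[i]):
            values.append(tokens[i])
            i += 1
        if key == "#endloopsearch#":
            break
        if key == "#newloop#":
            current = {}
            loops.append(current)
        elif key == "#loop.coords#":
            xyz = [float(v) for v in values[1:]]  # skip the atom count
            current["loop.coords"] = [tuple(xyz[j:j + 3])
                                      for j in range(0, len(xyz), 3)]
        elif key in COUNTED:
            current[key.strip("#")] = values[1:]  # skip the item count
        else:
            current[key.strip("#")] = values[0] if values else None
    return header, loops


# Hypothetical, shortened file used only to exercise the sketch.
SAMPLE = """text before the marker is ignored
@<TRIPOS>LOOPSEARCH
#total_len# 2
#nloops# 1
#restypes# 2
ALA GLY
#newloop#
#loop.loopnum# 1
#loop.source# 1ECD:ARG101
#loop.coords# 2
0.0 1.0 2.0
3.0 4.0 5.0
#endloopsearch#
"""
header, loops = parse_loop_search(SAMPLE)
print(header["restypes"], loops[0]["loop.coords"])
```

Values are kept as strings except for coordinates, so numeric fields such as total_len would need an explicit conversion by the caller.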

16.4 Secondary Structure Prediction Files

There are many secondary structure prediction methods. You may also use your
own method from SYBYL by specifying a shell file to execute your program.
See Also:
• Secondary Structure Prediction on page 319 for a list of methods
provided in SYBYL
• Predict Secondary Structure on page 183 for a description of the
command syntax
16.4.1 Example File

The following is an example of a shell script myprog.sh, which would start the
program myprog.exe.
#!/bin/sh
# myprog.sh
#
# This shell will run the program "myprog.exe".
# In this example, myprog.exe will read a table and then must
# be able to read the standard primary sequence file and write
# the two output files. PREDICT_SECONDARY will pass this shell
# the following 3 arguments:
# $1: infile (input amino acid sequence file)
# $2: file.pred (output prediction file)
# $3: file.prob (output probability file)
#
if [ $# -ne 3 ]; then
    echo "usage: $0 requires 3 arguments."
    exit 1
fi
# Feed the four file names to the program on standard input.
"$HOME/myprog.exe" <<!
$HOME/myprog.table
$1
$2
$3
!
exit 0
Here is an example of part of a Fortran program that could be executed
by the above shell:
c*****************************************************************
c     myprog.for
c     This program opens the 4 files passed to it by the shell
c     script "myprog.sh".
c*****************************************************************
      program myprog
c     maxres is an assumed upper limit on the sequence length;
c     adjust it to suit your own data.
      integer maxres
      parameter (maxres = 5000)
      character*1 iseq(maxres)
      character*132 file1, file2, file3, file4
      integer numres, j
c     Myprog.table (input data table)
      read (5, '(a132)') file1
c     Input sequence file
      read (5, '(a132)') file2
c     File.pred (output prediction file)
      read (5, '(a132)') file3
c     File.prob (output probability file)
      read (5, '(a132)') file4
      open (7, file = file1, status = 'OLD')
      open (8, file = file2, status = 'OLD')
      open (9, file = file3, status = 'UNKNOWN')
      open (10, file = file4, status = 'UNKNOWN')
c     (Read data table)
c     Read the sequence file (opened above on unit 8)
      read (8,*) numres
      read (8,'(80a1)') (iseq(j),j=1,numres)
c     (secondary prediction algorithm goes here)
c     (write the two output files here)
      stop
      end
Two files are written to your directory after each prediction. A prediction file
contains the conformation assignments and will be listed in SYBYL. A probability
file contains the conformational probabilities for each residue.
16.4.2 Input and Output File Formats

The formats for the files used to predict secondary structure are as follows:
Input Amino Acid Sequence File

Line 1: number of residues
Other lines: 80 chars/line, each character represents an amino acid.
A = ala C = cys D = asp
E = glu F = phe G = gly
H = his I = ile K = lys
L = leu M = met N = asn
P = pro Q = gln R = arg
S = ser T = thr V = val
W = trp Y = tyr
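An input sequence file in this layout can be produced with a few lines of Python. This is a minimal sketch: the function name is illustrative, and rejecting any character outside the 20 codes listed above is an assumption about how strictly the file should be validated.

```python
VALID_CODES = set("ACDEFGHIKLMNPQRSTVWY")


def format_sequence_file(sequence: str, width: int = 80) -> str:
    """Line 1: number of residues; then the codes, 80 characters per line."""
    seq = "".join(sequence.split()).upper()
    unknown = sorted(set(seq) - VALID_CODES)
    if unknown:
        raise ValueError(f"unsupported residue codes: {unknown}")
    lines = [str(len(seq))]
    lines += [seq[i:i + width] for i in range(0, len(seq), width)]
    return "\n".join(lines) + "\n"


# An 85-residue poly-alanine sequence wraps onto a second line.
print(format_sequence_file("A" * 85))
```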
Prediction File (name.pred)
Fields must be separated by blanks.

Line 1: <number of residues> <prediction method>
Line 2: <first 80 residue codes>
Line 3: <conformation of each of the above residues>
Following lines: repeat lines 2 & 3.

Probability File (name.prob)
Fields must be separated by blanks.

Line 1: header
Other lines: residue_code residue_number %alpha %beta %coil conformation

Common Biopolymer File Formats
16.5 Common Biopolymer File Formats

16.5.1 PIR File Format
For a description of the file format, see:
http://www.bioinformatics.nl/tools/crab_pir.html
16.5.2 FASTA File Format

For a description of the FASTA format visit the Oxford University Bioinformatics
Centre’s web site:
http://www.compbio.ox.ac.uk/bioinformatics_faq/format_examples.shtml#fasta
Pearson, W. R. (1999) Flexible sequence similarity searching with the FASTA3
program package. Methods in Molecular Biology

17. Biopolymer Commands
ADDH                    Biopolymer > Prepare Structure > Add Hydrogens…
ADD_SIDECHAINS          Biopolymer > Prepare Structure > Add Sidechains…
ALIGN_SEQUENCES         Biopolymer > Compare Sequences > Align and Write MSA…
ALIGN_STRUCTURES        Biopolymer > Compare Structures > Align Structures By Homology…
ASSIGN_SEC_STR          Biopolymer > Conformation > Find Secondary Structure...
BFACTORS                View > Color by Scheme > B-Factors
BLOCK                   Biopolymer > Prepare Structure > Edit Termini…
BREAK                   Biopolymer > Build > Break Chain…
BUILD                   Biopolymer > Build
CAP                     Biopolymer > Prepare Structure > Edit Termini…
CHANGE                  Biopolymer > Composition > Mutate Monomers…
CHECK_GEOMETRY          MDE: Protein > Check Local Geometry…
COLOR                   View > Color by Scheme
COMMAND                 Pick an item in another menu.
CONNECT_CA
CONSTRUCT_BACKBONE      Biopolymer > Build > C Alpha to Backbone...
CONVERT                 Biopolymer > Prepare Structure > Convert PDB Atom Names…
COPY_CONFORMATION       Biopolymer > Conformation > Copy…
CYCLE                   Biopolymer > Build > Create Cycle…
DICTIONARY              Biopolymer > Dictionary & Database Admin
  ADD MONOMER           Biopolymer > Dictionary & Database Admin > Manage Custom Dictionary…
  CLOSE                 Biopolymer > Dictionary & Database Admin > Close…
  CREATE BLOCK
  CREATE DICTIONARY     Biopolymer > Dictionary & Database Admin > Manage
  CREATE MONOMER        Biopolymer > Dictionary & Database Admin > Manage
  LIST                  Biopolymer > Dictionary & Database Admin > List Dictionary…
  OPEN                  Biopolymer > Dictionary & Database Admin > Open Dictionary…
DISPLAY
DISULFIDE               Biopolymer > Build > Create Disulfide…
DNAHELIX                Biopolymer > Build > DNA Double Helix...
ENDMODE                 Pick an item in another menu.
EXCISE                  Biopolymer > Composition > Excise Monomers…
FASTA                   File > Import File > Sequence
FIND
  CONFORMATION
  SEC_STR               Biopolymer > Conformation > Find Secondary Structure...
FIT                     Biopolymer > Compare Structures > Fit Monomers…
FIX_ASN_GLN             Biopolymer > Prepare Structure > Fix Sidechain Amides…
FIX_END_GROUPS          Biopolymer > Prepare Structure > Fix End Groups…
FIX_MOLECULE
FIX_PROLINE             Biopolymer > Prepare Structure > Fix Prolines…
FIX_SIDECHAINS          Biopolymer > Conformation > Scan Sidechains Torsions…
FTP                     File > Retrieve PDB...
INSERT                  Biopolymer > Composition > Insert Monomers…
JOIN                    Biopolymer > Build > Join Chains…
LABEL_ATOMS
LOAD
  CHARGES               Biopolymer > Prepare Structure > Load Charges…
  DEFINE_UNKSET
  DEFINE_ZEROCHARGESET
  DICT_CHARGES
  DICT_TO_USER
  DICT_TYPES
  MINIMAL_USER_SET
  OTHER_ATOM_TYPES      Biopolymer > Prepare Structure > Assign AMBER Atom Types
  SLN_AUTO_CHARGES
  SLN_AUTO_TYPES
LOOP                    Biopolymer > Protein Loops
  ANALYZE               Biopolymer > Protein Loops > Analyze Results…
  SETUP                 Biopolymer > Protein Loops > Search PRODAT Database…
MEASURE                 Biopolymer > Conformation > Measure…
MULT_ALIGN_SEQ          Biopolymer > Compare Sequences > Align and Write MSA…
OLD_RAMACHANDRAN        Not accessible from the menubar.
PHOSPHORYLATE           Biopolymer > Build > Phosphorylate…
PIR                     File > Import File > Sequence
POLY_BLOCK              Biopolymer > Prepare Structure > Edit Termini.
PREDICT_SECONDARY       Biopolymer > Conformation > Predict Secondary…
PROTONATE               Biopolymer > Prepare Structure > Set Protonation Type
RAMACHANDRAN            MDE: Protein > Ramachandran Plot
REMOVE                  Biopolymer > Composition > Delete Monomers…
RENUMBER                Biopolymer > Prepare Structure > Renumber Sequence…
REPLACE                 Biopolymer > Composition > Replace Sequence…
RESIDUE_FIT             Biopolymer > Compare Structures > Local RMS Fit of Conformers…
                        Biopolymer > Compare Structures > Find and Fit Fixed Regions…
RIBBON                  View > Surfaces and Ribbons > Quick Ribbons > Line Ribbon (without a MOLCAD license)
RNAHELIX                Biopolymer > Build > RNA Double Helix...
SEQUENCE                Biopolymer > Compare Sequences > List Sequence…
SET CHAINNAME           Biopolymer > Prepare Structure > Set Chain Names…
SET CONFORMATION        Biopolymer > Conformation > Set Backbone Conformation
                        Biopolymer > Conformation > Set Sidechain Conformation
SET TERMINI             Biopolymer > Prepare Structure > Chain Termini Sets...
SMALL_TO_POLY           Not accessible from the menubar.
TRACE                   View > Protein View > C Alpha Trace
TWEAK                   Biopolymer > Protein Loops > Tweak Conformational Search…
TYPE_COFACTOR           Biopolymer > Prepare Structure > Fix SYBYL Atom Types in Cofactor
When typing commands in the Command Console you can access the
biopolymer functions in either of two ways:
• Start each command with BIOPOLYMER
• Enter the BIOPOLYMER mode by issuing the command MODE
BIOPOLYMER. To exit this mode, type ENDMODE at the Biopolymer
prompt. While in BIOPOLYMER mode, you may execute a single top-level
SYBYL command by preceding it with the word COMMAND.

Note: As with all SYBYL commands, the BIOPOLYMER command can be abbreviated
to a unique initial string, such as bio. However, we strongly recommend
that you always spell out command names in SPL scripts.
17.1 Associated Tailor Variables

Tailor variables can be used to parameterize Biopolymer dialogs and
commands. The following subjects are documented in the Tailor Manual.
• BIOPOLYMER
• CONSTRUCT_BACKBONE
• FORCE_FIELD
• HBONDS
• PDB
• PROTEIN_LOOP
• PROTEIN_SEARCH
• RAMACHANDRAN
• RENDER
• RIBBON
• SCAN
• TWEAK

18. Biopolymer Theory
This chapter describes the theoretical background of some of the more scientifically
complex commands within SYBYL. Discussions about the biopolymer
dictionaries, force field methods, and an overview of specific modeling applications
are included.
• Introduction
• Protein Modeling
  • Binary Protein Database
  • Sequence Alignment
    • Needleman-Wunsch
    • Homology Matrices
    • Gap Penalty
    • Alignment Evaluation
• Protein Completion
  • Backbone Construction
  • Sidechain Addition
  • Biopolymer End Group Modeling
• Protein Folding And Model Generation
  • Secondary Structure Prediction
  • Protein Loop Searching
  • Random Tweak Loop Generation
  • Protein Loop Analysis
  • Small Peptide Methodology
• Nucleic Acid Modeling
  • Single Strand Nucleic Acids
  • Nucleic Acid Double Helices
• Polysaccharide Modeling

Introduction
18.1 Introduction
SYBYL provides a flexible environment for the display and manipulation of
large and small biomolecules. A ready-to-use capability for modeling of
polypeptides, polynucleotides (RNA and DNA) and polysaccharides is built-in.
SYBYL allows small molecule and biopolymer modeling. Thus, processes like
substrate or inhibitor binding to an enzyme or hormone-receptor interactions
can be studied.
In addition to treating biopolymer structures atom by atom, SYBYL builds and
modifies structures on a residue by residue basis. A dictionary contains the
definitions of available residues as well as the types of connections and conformational
states possible. The composition and characteristics of available
residues are maintained in residue files. You can create your own residues
within a biopolymer class or add additional classes of biopolymers.
Much of our knowledge of the structure of proteins and nucleic acids comes
from X-ray diffraction studies. The repository of this information has always
been the Protein Data Bank [Ref. 1]. SYBYL reads and writes files in standard
PDB format. Sequence data for proteins and polynucleotides are maintained by
a number of groups. SYBYL reads and writes the PIR format of the National
Biomedical Research Foundation [Ref. 2]. Examples of both these file formats
may be found in the TA_DEMO directory. Comprehensive reviews of
biopolymer structure can be found in the books by Schulz and Schirmer for
proteins [Ref. 3], Saenger for nucleic acids [Ref. 4], and Aspinall for polysaccharides
[Ref. 5].
Biomolecular systems tend to be complex and large in size. To study and understand
many aspects of these structures, it is helpful to have tools capable of
highlighting selected features of a biopolymer’s three-dimensional representation.
The utility of color computer graphics in this regard has been widely
acknowledged [Ref. 6]. Visual enhancement by the use of ribbon displays and
by the capacity to formulate general and flexible coloring schemes contributes
significantly to the goal of understanding biopolymer structure.
For the modeler trying to probe structure-function relationships in biopolymers
it becomes imperative to calculate at least semi-quantitative energies for intra-
and intermolecular interactions. In these situations conformational energy and
molecular mechanics [Ref. 7-Ref. 10] calculations, through the use of geometry
optimization or molecular dynamics (MD) techniques, represent additional
modeling tools.

Protein Modeling
18.2 Protein Modeling

An important problem in this area is that of predicting the structure of a protein.
SYBYL provides a number of tools for simple or local modeling of protein
structure. FUGUE and ORCHESTRAR are advanced protein modeling tools for
more complex homology modeling. Protein modeling in general can be divided
into two functional approaches: energy-based or knowledge-based. In many
cases these approaches are complementary and are often used in combination.
Energy-based modeling involves the use of geometry optimization or molecular
dynamics techniques to achieve relaxation of strain in limited regions of protein
structures. Applications of this methodology include optimization of the active
site geometry of structures from the Protein Data Bank (which in general is not
strain-free), site-directed mutagenesis (where the strain introduced by sidechain
modification is relaxed using energy-based techniques like SCAN, MAXIMIN2,
ANNEAL, and/or DYNAMICS), optimization of short chain insertions or deletions
(which introduce large amounts of strain into existing structures), and
constrained minimization of structures to meet experimental criteria like
distances and dihedral angles from NMR. In a recent study [Ref. 16], where
single amino acid substitutions were considered, very limited perturbations of
the parent structure were expected; comparison to experimental data seemed to
confirm this expectation. An interesting alternative approach makes a less
extensive use of the known 3D structure and tries to build a model of the target
protein following a more global reconstruction strategy from carefully chosen
interatomic distances [Ref. 17]. Clearly, this latter method requires heavier use
of computer resources but it may be used, when possible, to test and perhaps
challenge the models produced by the minimum perturbation technique
described above.
Many of the methods mentioned above may fail when confronted with large
insertions or deletions; these often occur in loop regions of proteins. Energy-based
methods are usually capable of making a single educated guess of what a
loop conformation may be like. However, the available data is often consistent
with a variety of conformations for these loop regions. Thus, there arise the
issues of (1) sampling the conformational space available for the loops and (2)
choosing from the produced sample one (or a few) that may be considered best.
For those purposes a variety of CPU intensive techniques have been proposed
[Ref. 18-Ref. 20]. These techniques (especially those described in Ref. 19 and
Ref. 20) put a lot of trust in the potential energy function or force field used.
However, they usually provide a good idea of the extent of geometric variability
expected in loop conformations, and for small enough loops, may actually
produce a systematic and exhaustive enumeration of attainable backbone
conformations.

An alternative to these energy-based techniques can be found in a knowledge-based
approach. Knowledge-based modeling attempts to use what is known
about protein structures in general to predict or fit specific cases. This approach
includes (1) statistical techniques like sequence alignment and secondary
structure prediction, and (2) searching techniques which try to meet selected
criteria by retrieving protein fragments from a structural database. Loop Search
is an example of the latter in which proposed loop conformations for the protein
being modeled are obtained from loops in actual proteins (which, in general, are
not evolutionarily related to the target protein). Modeling of complex structures
like proteins is never an all-or-none process. For this reason, candidate structures
retrieved using knowledge-based methods can be analyzed for suitability
using the Molecular Data Explorer.
18.2.1 Binary Protein Database

SYBYL includes a binary processed form of the Protein Data Bank for its
knowledge base. The structures included in the database were selected by Tom
Blundell’s group. High resolution structures representative of the various
protein classes were carefully chosen from the PDB database. You can rebuild
this database as additional structures become available or if your own local set
of structures is preferred to the SYBYL standard set. Read the instructions on
how to rebuild the database on page 258.
A suite of SPL expression generators (%PDBINFO(), %PDBFILTER(), and
%PDBRETRIEVE()) is available for retrieving information and structures from
the database, based on flexible sequence or structural criteria. This provides a
valuable tool for researchers interested in probing the PDB for trends in the
structural data. These expression generators are documented in the SPL Manual.

18.2.2 Sequence Alignment

The following SYBYL features perform sequence alignments:
• Biopolymer > Compare Sequences > Align and Write MSA
(BIOPOLYMER MULT_ALIGN_SEQ)
• BIOPOLYMER ALIGN_SEQUENCES
Each of these features attempts to determine an optimal alignment between two
or more sequences of amino acids, based on a particular gap penalty and
homology matrix.
The goal of sequence alignment is to determine whether two or more sequences
are related. The optimal method would automatically compare sequences and
determine whether they are related and if so, exactly how they are related structurally
and functionally. Current alignment techniques are a step in that
direction but it is important to remember that no matter how mathematically
rigorous, they are only capable of suggesting biological relationships and can
never be used as proof of biological similarity.
Sequence alignment is an active area of research with a large body of associated
literature. For excellent reviews of protein and nucleic acid sequence alignment
see [Ref. 24-Ref. 26]. Many algorithms for automatically aligning sequences
have been developed. Some are very fast but sacrifice sensitivity to achieve this
speed (such as the k-tuple search of FASTp [Ref. 27]). Others are slower, but
very sensitive, such as the Needleman-Wunsch method [Ref. 28]. SYBYL’s
alignment commands are based on this latter method.
Needleman-Wunsch
For sequences of length n, there are on the order of $2^{2n}$ possible alignments. To
search all these possibilities for the best alignment would be computationally
prohibitive. Needleman and Wunsch [Ref. 28] provided the classic solution to
this problem by developing a dynamic programming algorithm that aligns two
sequences of length n and m in O(nm) time. The original Needleman-Wunsch
method used a penalty for each gap. Our implementation of Needleman-Wunsch
is that of Fredman [Ref. 29], in which the gap penalty is independent of the size
of the gap. The beauty of the Needleman-Wunsch method is that it is guaranteed
to find an optimal alignment for the given homology matrix and gap penalty.
There may be several optimal alignments (alignments with the highest score). In
this case SYBYL reports only one of them.
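The dynamic programming idea can be illustrated with a short Python sketch. It is not SYBYL's implementation or Fredman's code: it is a generic global alignment scorer in which each gap costs the same penalty regardless of its length, it returns only the optimal score (no traceback), and the names global_alignment_score and identity_score are hypothetical. The default scoring function mimics an identity-style matrix (1 for identical residues, 0 otherwise).

```python
NEG_INF = float("-inf")


def identity_score(x, y):
    """Identity-style scoring: 1 for a residue compared with itself, else 0."""
    return 1 if x == y else 0


def global_alignment_score(a, b, gap_penalty=1, score=identity_score):
    """Best global alignment score; each gap costs gap_penalty whatever its length."""
    n, m = len(a), len(b)
    # match[i][j]: best score for a[:i], b[:j] ending with a[i-1] paired to b[j-1]
    # gap_b[i][j]: best score ending with a[i-1] opposite a gap in b
    # gap_a[i][j]: best score ending with b[j-1] opposite a gap in a
    match = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    gap_b = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    gap_a = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    match[0][0] = 0
    for i in range(1, n + 1):
        gap_b[i][0] = -gap_penalty  # one leading gap of length i
    for j in range(1, m + 1):
        gap_a[0][j] = -gap_penalty  # one leading gap of length j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            match[i][j] = score(a[i - 1], b[j - 1]) + max(
                match[i - 1][j - 1], gap_b[i - 1][j - 1], gap_a[i - 1][j - 1])
            # Opening a gap costs gap_penalty; extending an open gap is free.
            gap_b[i][j] = max(match[i - 1][j] - gap_penalty,
                              gap_a[i - 1][j] - gap_penalty,
                              gap_b[i - 1][j])
            gap_a[i][j] = max(match[i][j - 1] - gap_penalty,
                              gap_b[i][j - 1] - gap_penalty,
                              gap_a[i][j - 1])
    return max(match[n][m], gap_b[n][m], gap_a[n][m])


if __name__ == "__main__":
    print(global_alignment_score("ACCCD", "AD", gap_penalty=1))
```

The three tables are filled in O(nm) time; keeping a separate state for "currently inside a gap" is what makes the penalty independent of gap length.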

Homology Matrices
The protein homology matrices use the following standard single letter amino
acid codes:
Table 1 Amino Acid Codes

A ala G gly N asn V val
B asx H his P pro W trp
C cys I ile Q gln X xxx
D asp K lys R arg Y tyr
E glu L leu S ser Z glx
F phe M met T thr
SYBYL includes several protein homology similarity matrices that are
selectable via Tailor variable BIOPOLYMER SIMILARITY_MATRIX. These
include:
• APG: This is the ala-pro-gly matrix in which a score of 0 is given when
changing from or to a PRO or GLY, and 1 is given for any other change.
• GREER: This matrix was constructed from the data in [Ref. 32] and is
based on the following table of equivalences:
Table 2 Greer Matrix
SET   RESIDUES
1     DEKR
2     GAV
3     AVLI
4     VLIM
5     FYW
6     ST
7     QN
8     GP
The amino acids are divided into sets based on properties such as size,
charge, polarity and hydrophobicity [Ref. 33]. Values in the table were
assigned 1.0 for identity, 0.5 for residues that share one of the above
sets, and zero otherwise. The values obtained from this table are quite
similar to those of the PHYSPROP table.
• IDENTITY: In this simple matrix, a score of 1 results from a comparison
of a residue with itself, otherwise 0 is assigned.
• MUTATION: This is the classic Dayhoff [Ref. 34] similarity matrix of
250 PAMs (amino acid mutations per 100 residues). It is the best matrix
for comparing distantly related proteins.
• PMUTATION: This is the MUTATION data matrix with 8 added to each value
to ensure all numbers in the matrix are positive. This is the default
matrix.
• PHYSPROP: This matrix has scores based on similarity of physical
properties of amino acids, such as hydrophobicity, bulk, and a helical
tendency.
• SWISS: This matrix, developed at the Swiss Federal Institute of
Technology, was derived from an exhaustive alignment of the entire
sequence database using the Needleman-Wunsch alignment algorithm on
sequences arranged in a patricia tree. This matrix is therefore better
than the Dayhoff matrix, which was derived from alignments of a small set
of proteins that were very similar to each other. Therefore, the SWISS
matrix is valid not only for sequences with a high degree of similarity,
but also for those that are borderline.
Gaston H. Gonnet, Mark A. Cohen, Steven A. Benner, “Exhaustive Matching
of the Entire Protein Sequence Database” Science 1992, 256, 1443-1445.
Min = -5.2, Max = 14.2, Ave = -0.4, Recommended gap penalty = 4.
• SWISS2: This is the SWISS matrix with 5.2 added to each value to
eliminate negative numbers.
Min = 0, Max = 19.4, Ave = 4.8, Recommended gap penalty = 6.
You can create your own homology matrix file with the format specified on
page 293. SYBYL will search for this file first in your current directory and
then, if it is not there, in the directory specified by Tailor variable BIOPOLYMER
SIMILARITY_MATRIX.
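As an illustration of how a set-based matrix such as GREER can be derived, the following Python sketch applies the stated rule (1.0 for identity, 0.5 for residues sharing an equivalence set, zero otherwise) to the sets in Table 2. The function name greer_score is hypothetical, and the sketch does not reproduce the homology matrix file format or claim to match the values shipped with SYBYL.

```python
# Equivalence sets copied from Table 2 (Greer Matrix).
GREER_SETS = ["DEKR", "GAV", "AVLI", "VLIM", "FYW", "ST", "QN", "GP"]


def greer_score(x, y):
    """Score two one-letter residue codes using the Table 2 equivalence sets."""
    if x == y:
        return 1.0  # identity
    if any(x in group and y in group for group in GREER_SETS):
        return 0.5  # the residues share at least one equivalence set
    return 0.0


if __name__ == "__main__":
    for pair in ("AA", "DE", "GP", "GL", "CS"):
        print(pair, greer_score(pair[0], pair[1]))
```

Note that some residues belong to more than one set (for example V appears in sets 2, 3, and 4), so the relation is not transitive: G and A share a set, A and L share a set, but G and L do not.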

Gap Penalty
The quality of the alignment is heavily dependent on the gap penalty. The
higher the gap penalty, the greater the resistance to insertion of new gaps into
the alignment. It is important to select a gap penalty appropriate for the
particular homology matrix in use. The best penalty is typically the average of
all the values in the current homology matrix and must be a positive integer.
Alignment Evaluation
Doolittle [Ref. 26] defines some rules of thumb to determine if two sequences
are similar enough to be considered related. If they are longer than 100 residues
in length and are greater than 25% identical (with appropriate gaps) then they
are very likely related. If they are 15 to 25% identical, then they may still be
related and jumbling (see below) should be performed to determine the statistical
significance of the alignment. If they are less than 15% identical, they are
probably not related.
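These rules of thumb can be written as a small decision function. The Python sketch below is only an illustration: the function name and the returned labels are hypothetical, and it assumes that the 100-residue length condition applies to all three identity ranges, which the text does not state explicitly.

```python
def doolittle_assessment(length, percent_identity):
    """Classify a pairwise alignment using the rules of thumb quoted above."""
    if length <= 100:
        # The quoted rules are stated for sequences longer than 100 residues.
        return "inconclusive"
    if percent_identity > 25.0:
        return "very likely related"
    if percent_identity >= 15.0:
        return "possibly related; run jumbling"
    return "probably not related"


if __name__ == "__main__":
    for length, identity in [(150, 30.0), (150, 20.0), (150, 10.0), (50, 30.0)]:
        print(length, identity, doolittle_assessment(length, identity))
```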
For each alignment, SYBYL reports the following information:
Identity Score
The identity score (% identity) reported by BIOPOLYMER ALIGN_SEQUENCES
is the number of identical residues in the two sequences divided by the length of
the shortest sequence (without gaps).
For example:
ACD         2/2 (100% identity)
A-D

AC-EFGHI    2/3 (66.6% identity)
-CD-F---
Biopolymer > Compare Sequences > Align and Write MSA and
BIOPOLYMER MULT_ALIGN_SEQ compute the identity score by dividing the
number of identical residues by the length of the first sequence in the list.
Therefore, the first sequence always has an identity score of 100%.
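The pairwise identity score described for BIOPOLYMER ALIGN_SEQUENCES can be reproduced with the following Python sketch. The function name pairwise_identity is hypothetical, and "-" is assumed to be the gap character, as in the examples above.

```python
def pairwise_identity(aligned_a, aligned_b, gap="-"):
    """Return (identical, shortest_length, percent_identity) for two aligned rows."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Count columns where both rows hold the same residue (gaps never count).
    identical = sum(1 for x, y in zip(aligned_a, aligned_b)
                    if x == y and x != gap)
    # Length of the shorter sequence once gap characters are removed.
    shortest = min(len(aligned_a.replace(gap, "")),
                   len(aligned_b.replace(gap, "")))
    if shortest == 0:
        raise ValueError("a sequence contains no residues")
    return identical, shortest, 100.0 * identical / shortest


if __name__ == "__main__":
    print(pairwise_identity("ACD", "A-D"))
    print(pairwise_identity("AC-EFGHI", "-CD-F---"))
```

For the multiple alignment variant, the denominator would instead be the ungapped length of the first sequence in the list.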
Alignment Score
The alignment score is a measure of the similarity of the aligned sequences. The
higher the score per given homology matrix and gap penalty, the better the
alignment.

Normalized Alignment Score

This is the alignment score normalized for 100 residues.
$$\mathit{nas} = \frac{\mathit{score} \times 100.0}{\mathit{slen} - 1} \qquad \text{[EQ 1]}$$
where
• nas is the normalized alignment score
• slen is the length of the smallest sequence in the alignment
• score is the global alignment score (see above)
Jumbling and Significance

If two sequences are between 15 and 25% identical, it is important to determine
whether the alignment is statistically meaningful. One way to do this is by
comparing the actual alignment score of the sequences with the average
alignment score of random sequences derived from the original two sequences
(jumbling).
The number of jumbles is set by Tailor variable BIOPOLYMER
NUMBER_JUMBLES. The default value of five is the minimum recommended
value. This will generate five randomizations of sequence 1 and five randomizations
of sequence 2. Each derivative of sequence 1 is then aligned against each
derivative of sequence 2, resulting in a total of 25 alignments. Both the mean
and the standard deviation of the jumbled alignments are reported.
The sequences are aligned using the method of Needleman & Wunsch [Ref. 28]
as implemented by Fredman [Ref. 29]. Gaps may be inserted into either
sequence to find an optimal alignment, based on the current length-independent
gap penalty (customizable via Tailor variable BIOPOLYMER GAP_PENALTY) and
the homology matrix (customizable via Tailor variable BIOPOLYMER
SIMILARITY_MATRIX). For each aligned pair, the percentage of
residue positions having the same amino acid in both sequences is calculated.
Some proteins may be unusually rich in certain amino acids, and this can lead to
their appearing to be more similar than they really are. As a result, two such
proteins will exhibit a spurious homology (or false positive). In order to better
discriminate homologies, we apply a jumbling strategy to correct for such
spuriousness (see for example Ref. 25). This is implemented as follows. After
sequence alignment, the two sequences being compared are repeatedly
randomized a given number of times (customizable via Tailor variable
BIOPOLYMER NUMBER_JUMBLES), until several jumbled sequences of each
are available. Then, each of the jumbled versions of one sequence is subjected
to the alignment procedure with each of the jumbled versions of the other. For
example, if each sequence is jumbled 5 times, then altogether 25 alignments of
jumbled pairs are made. The mean ($S$) and the standard deviation ($D$) of their
alignment scores are calculated. The score of the original alignment ($S_0$) is
compared with the mean of the randomized sets (thus separating signal from
noise), and the difference is expressed in units of the standard deviation:
$$X = \frac{S_0 - S}{D} \qquad \text{[EQ 2]}$$
where
• $S$ = the mean of the jumbled alignment scores
• $S_0$ = the score of the original alignment
• $D$ = the standard deviation of the jumbled scores
• $X$ = the significance score, a second filter (after the identity cutoff)
which is applied to filter false positives.
X provides a quantitative (and more reliable than sequence identity alone)
measure of the significance of the alignment, i.e., the higher the value of X, the
more significant the alignment. Based on the experience of workers in the field
(e.g. Ref. 30, Ref. 31) as well as our own, we propose that a significance score
(X) of 4.0 or higher is indicative of a sequence alignment that is viable for
purposes of model building by homology.
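The jumbling procedure can be sketched in Python as follows. This is an illustration under stated assumptions, not SYBYL's code: the function name jumble_significance is hypothetical, the alignment scorer is passed in as a parameter, the sample standard deviation is used (the text does not say which estimator applies), and X is computed as the original score minus the jumbled mean, divided by the standard deviation, so that a higher X means a more significant alignment. The toy_score function is a stand-in used only to make the example runnable; it is not an alignment algorithm.

```python
import random
import statistics


def jumble_significance(seq1, seq2, align_score, n_jumbles=5, seed=None):
    """Return (x, s0, mean, sd, scores) for a jumbling significance test.

    Requires n_jumbles >= 2 so that a standard deviation can be computed.
    """
    rng = random.Random(seed)

    def jumbled(seq):
        residues = list(seq)
        rng.shuffle(residues)  # same composition, random order
        return "".join(residues)

    set1 = [jumbled(seq1) for _ in range(n_jumbles)]
    set2 = [jumbled(seq2) for _ in range(n_jumbles)]
    # Every jumbled copy of seq1 against every jumbled copy of seq2.
    scores = [align_score(a, b) for a in set1 for b in set2]
    s0 = align_score(seq1, seq2)
    mean = statistics.mean(scores)
    sd = statistics.stdev(scores)  # sample standard deviation (assumption)
    x = (s0 - mean) / sd if sd > 0 else float("nan")
    return x, s0, mean, sd, scores


def toy_score(a, b):
    """Stand-in scorer (NOT an alignment): identical residues at equal positions."""
    return sum(1 for x, y in zip(a, b) if x == y)


if __name__ == "__main__":
    x, s0, mean, sd, scores = jumble_significance(
        "ACDEFGHIKLMNPQRS", "ACDEFGHIKLMNPQRS", toy_score, seed=0)
    print(len(scores), s0, mean, sd, x)
```

In practice align_score would be a real alignment scorer using the chosen homology matrix and gap penalty; with n_jumbles=5 the function performs 25 jumbled alignments plus the original one.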
18.2.3 Protein Completion

SYBYL has the capability of constructing complete proteins from the Cα
coordinates.
Backbone Construction
This functionality enables you to generate plausible models for the backbone of
a protein or a polypeptide given only the coordinates for the α carbons [Ref. 35,
Ref. 36]. This command is useful in studying proteins for which only α carbon
coordinates have been deposited with the Protein Data Bank.
The method for backbone construction uses a 3-pass screen for finding each
fragment. First it measures the end-to-end distance of all fragments in the
database, saving those fragments whose distance is within a specified tolerance
from the reference fragment. Next, the retained fragments are screened by
comparing all inter-Cα distances within the fragment with the corresponding
distances in the reference fragment, and saving the M best fragments. Finally, it
performs a least-squares fit of each retained fragment onto the reference
fragment, and chooses the one with the lowest RMS deviation. If the RMS
deviation is below a threshold value and the fragment length is less than a
specified maximum, the fragment length is incremented by 1 and the procedure
is repeated.
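The first two passes of this screen can be sketched in Python as shown below. This is an illustration only, not the SYBYL implementation: the function names are hypothetical, fragments are represented as lists of (x, y, z) Cα coordinates, and the second pass ranks fragments by the RMS difference between corresponding inter-Cα distances, which is an assumed comparison measure. The third pass (least-squares superposition) is omitted.

```python
import math  # math.dist requires Python 3.8 or later


def end_to_end(fragment):
    """Distance between the first and last C-alpha of a fragment."""
    return math.dist(fragment[0], fragment[-1])


def distance_rms(fragment, reference):
    """RMS difference over all corresponding inter-C-alpha distances."""
    n = len(reference)
    squares = [(math.dist(fragment[i], fragment[j])
                - math.dist(reference[i], reference[j])) ** 2
               for i in range(n) for j in range(i + 1, n)]
    return math.sqrt(sum(squares) / len(squares))


def screen_fragments(reference, fragments, tolerance, m_best):
    """Return indices of the m_best fragments surviving passes 1 and 2."""
    n = len(reference)
    ref_length = end_to_end(reference)
    # Pass 1: keep fragments whose end-to-end distance is within tolerance.
    kept = [k for k, frag in enumerate(fragments)
            if len(frag) == n and abs(end_to_end(frag) - ref_length) <= tolerance]
    # Pass 2: rank the survivors by agreement of all inter-C-alpha distances.
    kept.sort(key=lambda k: distance_rms(fragments[k], reference))
    return kept[:m_best]


if __name__ == "__main__":
    reference = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
    bent = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0), (7.6, 3.8, 0.0)]
    zigzag = [(0.0, 0.0, 0.0), (3.8, 0.5, 0.0), (7.6, 0.0, 0.0), (11.4, 0.5, 0.0)]
    shifted = [(x + 1.0, y + 1.0, z + 1.0) for x, y, z in reference]
    print(screen_fragments(reference, [bent, zigzag, shifted], 1.0, 2))
```

Because both passes use only internal distances, they are independent of fragment position and orientation, which is why the superposition can be deferred to the final pass on a small number of candidates.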
To construct an entire chain, the procedure starts at the beginning of the chain
looking for a fragment of a given minimum length (e.g. 4 residues, the default
value). The length N of the actual fragment found may be anywhere between
this minimum length (4 in this example) and a specified maximum length. The
procedure in principle could then move down the chain by N residues and look
for the next fragment in the database. However, to avoid discontinuities at the
junction of two fragments, the method actually allows successive fragments to
overlap instead of advancing a full N residues down the chain. The number of
overlapping residues is determined by Tailor variable CONSTRUCT_BACKBONE
TRIM_C, which dictates the number of residues to trim off the end of one
fragment, and TRIM_N, which specifies how many residues to trim off the end
of the next fragment. Note that the sum of TRIM_C and TRIM_N must be less
than the minimum sequence length; otherwise the entire fragment could be
trimmed, and the procedure would never be able to advance down the
polypeptide chain.
The error expected in the backbone construction (estimated by test runs on
proteins of known full three-dimensional structure) is on the order of 0.3 to 0.6 Å
for the RMS deviation [Ref. 35, Ref. 36].
Sidechain Addition
A complementary operation can be used to model sidechains (see Add
Sidechains on page 125). However, while the backbone-generating procedure is
knowledge-based (see above), the sidechain-generating procedure uses a less
sophisticated approach in which only information on the most commonly
occurring sidechain rotamer (in the database of crystallized proteins) for each
residue type is actually used. Thus, while the errors expected in the backbone
construction are on the order of 0.3 to 0.6 Å RMS deviation [Ref. 35, Ref. 36], the
errors in sidechains (when the procedure is followed by a SCAN of all the
sidechain torsional angles) amount to about 2.45 Å (slightly greater than the
results obtained in [Ref. 3] using a knowledge-based sidechain-building
method).
Biopolymer End Group Modeling
SYBYL proteins are built from residue files which contain atoms appropriate
for chain continuation in either direction. For this reason the N terminal
nitrogen atom is given an amide atom type suitable for amide functional groups
in the interior of proteins. Real proteins, however, have either blocking groups
or charged N terminal residues.

In order to model chain ends correctly, SYBYL’s macromol dictionary contains
several blocking groups.
For proteins:
• The N terminus is capped by AMN (charged), AMI (neutral) or one of
the following blocking groups (see the Force Field Manual for partial
charges on N-terminal groups):
  • ACE: N-acetyl
  • PYR: N-pyroglutamyl
  • FOR: N-formyl
  • NMT: N-methyl
  • BOC: N-t-butyloxycarbonyl
• The C terminus is capped by CXL (charged), CXC (neutral) or one of
the following blocking groups (see the Force Field Manual for partial
charges on C-terminal groups):
  • NME: N-methyl amide
  • AMD: amide
  • NMM: N,N-dimethyl amide
  • CME: methyl
  • MES: methyl ester
  • EES: ethyl ester
For DNA and RNA:
• The O5’ terminus is blocked by HB.
• The O3’ terminus is blocked by HE.

18.2.4 Protein Folding And Model Generation
Secondary Structure Prediction
The best way to build a protein whose structure is not known is to base it on a
homologous protein whose structure is known; i.e. by homology modeling (see
the FUGUE Manual and the ORCHESTRAR Manual). However, if there is no
known homolog, one method to determine the structure is by predicting the
regions of regular secondary structure and then adjusting the intervening loop
regions. One should be aware that the reliability of this procedure is much lower
than that of homology modeling; it should therefore be used with extreme caution
and only as a last resort [Ref. 37-Ref. 41]. SYBYL provides a way to predict
the secondary structure of a protein. This command makes it possible to read a
file containing only the primary amino acid sequence (see page 298 for the
format of this file). The sequence can also be read from a molecule area. The
command then lists the primary sequence and the predicted conformation for
each residue (α-helix, β-sheet, coil).
SYBYL provides three methods for assigning conformation. All three methods
work by first studying a database of proteins of known structure, from which a
set of parameters is derived. These parameters are then used in a formalism
(series of equations) that enables approximation of probability or probability-like
estimates of the tendencies of given amino acid sequences to attain
particular secondary structures. These methods differ from each other in the
techniques used to extract the information present in the database.
SYBYL provides the following procedures:
MAXFIELD_SCHERAGA (Bayes Statistics) [Ref. 37]
The original methodology of Maxfield and Scheraga was extended to
consider interactions between residues separated by up to 8 amino acids, and
the parameters were rederived using a much larger (and recent) database of
crystallized proteins. The definitions of secondary structure were also
changed from dihedral angle-based to the definitions of Kabsch and Sander
[Ref. 42].
GARNIER_OSGUTHORPE_ROBSON (Information Theory) [Ref. 39]
A straightforward implementation of the procedure developed by the
original authors.
QIAN_SEJNOWSKI (Neural Networks) [Ref. 40]
An implementation of the zero-hidden-layer network described in the
Appendix of Ref. 42 (see Tables 13, 14, and 15).
See also:
• Ref. 41 for a review of secondary structure prediction methods
• Secondary Structure Prediction Files on page 298 for the format of the
input and output files
• Predict Secondary Structure on page 183 for a description of the
command syntax
Protein Loop Searching
The loop search facility of SYBYL (Protein Loop Search on page 193) enables
the use of fragments of proteins of known three-dimensional structure during
building of models of unknown protein structures. This functionality looks for
fragments of specified geometry in a protein fragment database constructed
from the Protein Data Bank. The specified geometry is given by distances and
coordinates involving the end residues of the loop or fragment.
Loop search is useful whenever insertions or deletions along the polypeptide
chain become necessary during applications of protein modeling. It can also be
used to your advantage when changes or mutations to a proline residue are
called for: if the original residue does not have a value of φ in the neighborhood
of -60° (the approximate value imposed by the ring constraint in proline), grossly
distorted geometry localized to the newly introduced proline residue will result.
LOOP SEARCH can eliminate this problem by adjusting the conformation of
neighboring residues to compensate for the proline substitution. For example,
consider the generic sequence A=B=C=D=E, where the goal is to replace
residue C with proline. This can be done by deleting the trimer B=C=D and
running a loop search for B=pro=D. If more extensive adjustments are
necessary the entire pentamer could be replaced with A=B=pro=D=E.
In any loop search the size of the fragment to model depends in part on the
secondary structure of the surrounding regions; in general it is wise to leave
elements of regular secondary structure as unperturbed as possible.
When applying loop search, several regions are implicitly considered: a
framework region and an anchor region in the target protein, and an anchor region
and a window region in each of the fragments retrieved from the database. The
framework region is not affected by application of the command. The anchor
regions are used to guide the fitting of the window region. The window region
corresponds to the group of residues actually being inserted. The anchor regions
of the target protein may have their geometry altered by application of LOOP
SEARCH (see below to find out when and how this is implemented).
For example, consider the target protein ABCDEFLMNOPQ. There may be any
number of additional residues between F and L, but as soon as EF and L are
chosen as the anchor region, those additional residues disappear from the final
protein model. When inserting the pentamer GHIJK between F and L, using EF
and L as the anchor region, the database is searched for fragments of the form
EFGHIJKL. Residues ABCD and MNOPQ represent the framework region, EF
and L the anchor region of the target protein, EF and L the anchor region of the
loop, and GHIJK the window region.
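The region bookkeeping in this example can be sketched in a few lines of Python. This is only an illustration, not SYBYL code: the function name loop_regions and the use of one-letter strings for residues are assumptions made for the sketch.

```python
def loop_regions(target, n_anchor, c_anchor, window):
    """Split a target sequence into framework, anchor, and window regions.

    Residues are single characters here purely for illustration.
    """
    # Locate the N anchor, then the C anchor somewhere after it.
    n_start = target.index(n_anchor)
    n_end = n_start + len(n_anchor)
    c_start = target.index(c_anchor, n_end)
    return {
        "n_framework": target[:n_start],
        "n_anchor": n_anchor,
        "window": window,
        "c_anchor": c_anchor,
        "c_framework": target[c_start + len(c_anchor):],
        # Residues originally between the anchors do not survive.
        "dropped": target[n_end:c_start],
        # Form of the fragments sought in the database.
        "search_pattern": n_anchor + window + c_anchor,
    }


regions = loop_regions("ABCDEFLMNOPQ", "EF", "L", "GHIJK")
print(regions["search_pattern"])
```

If the target carried extra residues between F and L, they would appear under "dropped", mirroring the statement that they disappear from the final model.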
The approach used by SYBYL’s loop search is to find fragments in the database
of the proper residue length whose anchor regions have a good geometric fit to
the anchor regions of the modeled protein. Application of this procedure usually
generates several candidate loops that satisfy the geometrical requirements; i.e.
protein fragments of the specified length that close the gap in the polypeptide
chain while preserving nearly ideal covalent geometry. However, there are other
criteria, not explicitly used during the actual loop search, that can guide the
choice of one particular candidate loop over another. In order to invoke these
criteria, a LOOP ANALYSIS facility is available. This facility provides graphical
tools for analyzing and selecting from the retrieved fragments on the basis of
quality of fit to the anchor regions, sequence homology, steric interactions, and
other criteria.
To efficiently find the fragments with the best matching anchor regions,
SYBYL uses a 3-pass screen. First, it measures the end-to-end distance of all
fragments in the database of the proper number of residues and retains all whose
distance is within a given threshold of the corresponding distance in the target
protein. Second, it computes all distances between an α carbon in the N anchor
region and an α carbon in the C anchor region for each retained fragment. The
root mean square deviation of these inter-Cα distances is computed from the
corresponding distances in the reference protein, and those fragments whose
RMS distance deviation is less than a second threshold are retained. Third, it
performs a rigid body least-squares fit of the anchor regions in each retained
database fragment to the anchor regions of the target protein, and retains those
fragments whose least-squares fit RMS is below a third threshold. The values of
the various thresholds and other parameters of the search (see below) are
controlled by Tailor subject PROTEIN_LOOP.
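The first two passes of the screen can be sketched as follows. This is a minimal illustration under stated assumptions, not the SYBYL implementation: the function names, the argument layout (lists of Cα coordinates), and the threshold names end_tol and rms_tol are all hypothetical, and the third pass (the rigid-body least-squares fit) is omitted.

```python
import math


def anchor_ca_distances(n_anchor, c_anchor):
    """All distances between an N-anchor C-alpha and a C-anchor C-alpha."""
    return [math.dist(p, q) for p in n_anchor for q in c_anchor]


def screen_fragments(fragments, target_n, target_c, end_tol, rms_tol):
    """Apply passes 1 and 2; return (name, rms) for surviving fragments.

    fragments: iterable of (name, ca), where ca lists the C-alpha
    coordinates of a database fragment (anchors plus window) that already
    has the proper number of residues.
    target_n, target_c: C-alpha coordinates of the target's anchor regions.
    """
    n_len, c_len = len(target_n), len(target_c)
    target_end = math.dist(target_n[0], target_c[-1])
    target_d = anchor_ca_distances(target_n, target_c)
    kept = []
    for name, ca in fragments:
        # Pass 1: end-to-end distance within end_tol of the target's.
        if abs(math.dist(ca[0], ca[-1]) - target_end) > end_tol:
            continue
        # Pass 2: RMS deviation of the inter-anchor distances.
        frag_d = anchor_ca_distances(ca[:n_len], ca[len(ca) - c_len:])
        rms = math.sqrt(
            sum((f - t) ** 2 for f, t in zip(frag_d, target_d)) / len(target_d)
        )
        if rms < rms_tol:
            kept.append((name, rms))
    return kept
```

Ordering the passes from cheapest to most expensive means the costly least-squares fit only has to be run on the small set of fragments that survive the distance tests.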
When a retrieved fragment is inserted into the target molecule, the coordinates
of the fragment’s atoms are transformed according to the fragment’s
least-squares fit to the anchor regions. While this gives unambiguous coordinates for
the window region of the loop, a question remains: how should the coordinates
in the anchor regions of the loop be assigned to avoid bad discontinuities? SYBYL
offers two choices for adjusting the coordinates, MELD_ANCHOR and
TWEAK_LOOP, controlled by Tailor variable PROTEIN_LOOP ADJUST_COORDS.
The TWEAK_LOOP option leaves the anchor residues of the reference protein
unperturbed, and makes small adjustments to the torsion angles of the window
residues to achieve an exact overlap of the anchor coordinates. This option uses
the TWEAK algorithm discussed in the next section.

With the MELD_ANCHOR option, SYBYL uses a flexible weighting procedure to
blend the coordinates in the anchor regions; this procedure is controlled by two
Tailor variables (LOW_LOOP_WEIGHT and HIGH_LOOP_WEIGHT) set by Tailor
subject PROTEIN_LOOP. An atom’s coordinates are a weighted average of the
coordinates in the target molecule and the coordinates in the database fragment.
When LOW_LOOP_WEIGHT is less than HIGH_LOOP_WEIGHT, the closer an anchor
residue lies to the window region, the more heavily its atomic coordinates are
weighted by the database-fragment coordinates. For example, if LOW_LOOP_WEIGHT and
HIGH_LOOP_WEIGHT have values of 0.0 and 1.0, respectively, the coordinates
of the first residue of a 2-residue N-anchor region are calculated using
weighting coefficients of 1/3 for the database-fragment coordinates and 2/3 for
the target-molecule coordinates (see Equation 3 below). In this example, the
coordinates of the second residue are calculated with the reversed weighting
coefficients (2/3 for the database-fragment coordinates and 1/3 for the
target-molecule coordinates). To preserve the exact original coordinates in the anchor
regions of the target protein, set both LOW_LOOP_WEIGHT and HIGH_LOOP_WEIGHT
equal to 0.
The equation to assign the coordinates is:
xyz = (xyz_frag * frag_wt) + (xyz_target * target_wt) [EQ 3]
where:
• xyz_frag are coordinates from the database fragment
• frag_wt equals LOW_LOOP_WEIGHT + (fraction * weight_diff)
• xyz_target are coordinates from the original target protein
• target_wt equals 1 - frag_wt
• fraction equals res_dist / (na + 1)
• weight_diff equals HIGH_LOOP_WEIGHT - LOW_LOOP_WEIGHT
• res_dist is how far removed a given anchor residue is from r1 (must be
between 1 and na)
• r1 is the residue preceding the anchor region of the loop
• na is the number of anchor residues in this anchor region
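As a worked check of Equation 3, the following Python sketch computes the weights and the melded coordinates. The function names are hypothetical and are not part of SYBYL; only the arithmetic is taken from the definitions above.

```python
def fragment_weight(res_dist, na, low_loop_weight, high_loop_weight):
    """frag_wt from Equation 3 for one anchor residue."""
    if not 1 <= res_dist <= na:
        raise ValueError("res_dist must be between 1 and na")
    fraction = res_dist / (na + 1)
    weight_diff = high_loop_weight - low_loop_weight
    return low_loop_weight + fraction * weight_diff


def meld(xyz_frag, xyz_target, frag_wt):
    """Weighted average of fragment and target coordinates (Equation 3)."""
    target_wt = 1.0 - frag_wt
    return tuple(f * frag_wt + t * target_wt
                 for f, t in zip(xyz_frag, xyz_target))


# 2-residue N-anchor region with weights 0.0 and 1.0, as in the text.
for res_dist in (1, 2):
    print(res_dist, fragment_weight(res_dist, 2, 0.0, 1.0))
```

With both weights set to 0, frag_wt is 0 for every anchor residue and the target coordinates are returned unchanged.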
Random Tweak Loop Generation
Tweak Conformational Loop Search is currently implemented to generate a
number of polypeptide backbones which will close gaps or rings. The command
is composed of two parts: a random function, which generates random φ and ψ
dihedral angles for the polypeptide fragment (whose length you specify), and a
tweak function, which minimally modifies the set of dihedral angles to meet
distance constraints. The distance constraints are the requirement that the
fragment cross a gap in a larger structure to join at the two ends or close a
macrocyclic ring. Finally, the fragment (loop) generated by Tweak is subjected
to a series of filters which test for suitability in the model structure. The method
is based on that of Shenkin et al. [Ref. 43] and is discussed further in Fine
et al. [Ref. 20].
When using Tweak, several regions are implicitly considered: a framework
region and an anchor region in the target protein, and an anchor region and a
window region in each of the fragments generated by the random tweak process.
The framework region will not be affected by the application of the command.
Unlike in LOOP SEARCH, the anchor region consists of only a single residue at
each end of the gap. A second difference from LOOP SEARCH is that
no melding of the anchor regions occurs, because the tweak portion of the
command can fit the anchors to any precision desired. The total number of
residues returned by Tweak is the number in the window region plus two (one
for each anchor region).
Tweak initially defines four distance constraints and their target values between
the Cα and N atoms of the N terminal anchor residue and the Cα and C atoms
of the C terminal anchor residue (see Figure 1).
Figure 1. Base geometry of a loop or the closure geometry of a ring. The
base geometry is defined by the four atoms, Cα, N, Cα and C, and the six
edges connecting them, d1 through d6. The last two are fixed by virtue of
being bond lengths. The remaining four are calculated from the gap in the
protein that the loop will fit.

Tweak draws successive random numbers starting from a user-definable seed
value contained in the SPL variable MNDL_TWEAK_SEED. A protein fragment with
the number of residues you specify is constructed with random φ/ψ
angles taken from a uniform distribution (proline φ angles are not modified by
the command).
The set of distances for the anchors in the generated loop is measured and a
difference vector between the actual and target distance constraints is computed.
A matrix [D] containing the derivatives of each distance with respect to each
torsion angle is computed. A set of optimal corrections to the torsion angles is
calculated from a 4x4 linear system defined by the difference vector and the
derivative matrix:
$$\Delta\Theta_i = [D]\,\bigl([D]^{T}[D]\bigr)^{-1}\,\Delta d_j \qquad \text{[EQ 4]}$$
where ∆Θi is the change in torsion angle i, ∆dj is the difference between the
actual and target values of distance dj (refer to Figure 1), [D] is the matrix of
derivatives ∂dj/∂Θi which relate the Θs to the distances dj, and []
indicate matrices or vectors. The optimal corrections are limited in magnitude
by Tailor variable TWEAK MAX_TORSIONAL_CHANGE. The final torsional
corrections are applied to the fragment to give a new set of atomic coordinates.
This process repeats until either the number of iterations is exceeded
(MAX_ITERATIONS) and the fragment is rejected, or the magnitude of the
difference vector is less than the value set by Tailor variable TWEAK
TARGET_DISTANCE_TOLERANCE.
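One correction step of this kind can be sketched in plain Python. The sketch makes several assumptions that the manual does not state: D[i][j] is taken to hold the derivative of distance j with respect to torsion i, delta_d is taken as target minus actual, and the magnitude limit is applied to each torsion separately. The function names are hypothetical.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def tweak_step(D, delta_d, max_change):
    """Smallest torsion corrections reproducing delta_d to first order.

    D: one row per torsion, one column per distance constraint (assumed).
    Returns the corrections clamped to +/- max_change per torsion.
    """
    n, k = len(D), len(delta_d)
    # Small square system (4x4 for the four Tweak constraints): D^T D.
    g = [[sum(D[i][a] * D[i][b] for i in range(n)) for b in range(k)]
         for a in range(k)]
    y = solve(g, delta_d)
    dtheta = [sum(D[i][j] * y[j] for j in range(k)) for i in range(n)]
    return [max(-max_change, min(max_change, t)) for t in dtheta]
```

Because the loop usually has more torsions than constraints, many sets of corrections satisfy the constraints; this form picks the one with the smallest overall change, which matches the description of Tweak as minimally modifying the dihedral angles.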
The loops generated by Tweak are now subjected to a series of tests to check
their suitability for inclusion in the protein model.
The distance constraints defined by Tweak measure some of the distances
between the vertices of a pseudo-tetrahedron (refer to Figure 1). Tetrahedra are
chiral in nature but the distance constraints so defined are not. This necessitates
a chirality check on the loop anchors. This is done by comparing triple
products of vectors in the tweaked loop with those of the original anchors in the model
structure. Loops failing this test are rejected. Note that this particular test yields
approximately a 50% rejection rate.
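A chirality test of this kind can be sketched with a scalar triple product. This is an illustration only: the function names are hypothetical, and the choice of which three edge vectors to use is an assumption, since the manual does not specify it.

```python
def triple_product(p0, p1, p2, p3):
    """Scalar triple product a . (b x c) of the edges leaving p0."""
    a = [p1[i] - p0[i] for i in range(3)]
    b = [p2[i] - p0[i] for i in range(3)]
    c = [p3[i] - p0[i] for i in range(3)]
    cross = (
        b[1] * c[2] - b[2] * c[1],
        b[2] * c[0] - b[0] * c[2],
        b[0] * c[1] - b[1] * c[0],
    )
    return a[0] * cross[0] + a[1] * cross[1] + a[2] * cross[2]


def same_chirality(loop_atoms, model_atoms):
    """True if the two four-atom sets have the same handedness."""
    return triple_product(*loop_atoms) * triple_product(*model_atoms) > 0


model = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)]
mirror = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, -1)]
print(same_chirality(model, model), same_chirality(mirror, model))
```

The mirror-image set has exactly the same six inter-atomic distances as the model set, which is why distance constraints alone cannot distinguish the two and roughly half of the generated loops fail the test.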
If Tailor variable TWEAK DO_BUMP_CHECK is set to YES, a van der Waals bump
check is performed on the backbone atoms of the loop (see Tailor variables
GENERAL BUMPS_CONTACT_DISTANCE and BUMPS_NEIGHBOR_DISTANCE to
control the bump checking algorithm). This is an internal check for van der
Waals contacts between atoms within the loop generated by TWEAK and does not
include, for example, van der Waals checks between the loop and the original
model structure. Loops failing the optional bump check are rejected.

If a loop fragment has passed all screening tests it is accepted, fitted to the
original anchor region, and written to a .loop file for subsequent analysis (see
Loop Search Results in a Spreadsheet on page 201).
This entire method is repeated until the number of loop fragments generated
equals that specified by Tailor variable TWEAK NLOOPS. The resulting loop
fragments are passed into BIOPOLYMER LOOP ANALYZE and you are
prompted for commands to analyze and select from the set of tweaked loops.
When the loop search is complete, SYBYL writes a file containing the parameters
of the loop search and the loop fragments. For each loop fragment, the
following information is stored:
• the source of the fragment,
• amino acid sequence,
• RMS fit to anchor regions,
• coordinates of the backbone atoms in the loop, transformed to fit
the reference molecule.
Note that the source of the fragment is always TWEAK_XX (where XX is the
loop number), the amino acid sequence for all tweak generated loops is constant,
and the RMS fit to the anchor regions will always be very close to zero. Note
also that only coordinates of backbone atoms are produced. You can later add
sidechains (see Add Sidechains on page 125).
Tailor subject TWEAK affects the behavior of biopolymer tweak loop searches:
• MAX_ITERATIONS—The maximum number of iterations performed on a
set of initially random torsion angles. Loops that have not met the
distance constraints within this number of iterations are rejected and the
letter d is written to the terminal.
• MAX_TORSIONAL_CHANGE—The maximum change (in °) allowed per
torsion angle per iteration. A torsion only changes by ± this value per
iteration.
• TARGET_DISTANCE_TOLERANCE—The difference in length considered
significant between actual and target distance vectors. Actual and target
vectors whose lengths differ by less than this value are considered equal
in length.
• NLOOPS—The number of loops to generate per run
• DO_BUMP_CHECK—Whether loops with bad internal van der Waals
contacts are accepted in the final set of generated loops. See Tailor
variable GENERAL BUMPS_CONTACT_DISTANCE and
BUMPS_NEIGHBOR_DISTANCE to control the bump checking algorithm.

Protein Loop Analysis
The Protein Loop Analysis functionality provides you with the ability to choose
protein fragments for inclusion in a particular model structure based on a
number of criteria. In general no single measure of candidate loop suitability
will ever suffice for an automated selection of loops in a model structure. For
this reason you are offered a number of tools for display and analysis of the
candidate loops using SYBYL’s graphics capabilities.
The candidate loops found by any of the methods discussed above are written
into a .loop file. The Protein Loop Analysis functionality reads this file and
enters the candidate loops as rows in a spreadsheet. The column entries which
are automatically created for this table include source of the fragment, sequence
of the fragment, and RMS deviation of the least-squares fit to the anchor region.
You may introduce additional columns which (1) measure distances, angles, or
torsions within each of the loops, (2) check bumps within loops or between the
loop and the surrounding protein, (3) calculate RMS deviations from a reference
loop, or (4) score the loops according to a variety of mutation criteria.
Interesting loops can be saved to a SYBYL database for further treatment using the
energy-based techniques discussed above.
18.2.5 Small Peptide Methodology

Many of the techniques used for studying protein structure can be used
successfully for studying small polypeptides. Here, small refers to polypeptides of up to
about 20 residues, although most of the discussion that follows applies to any
size polypeptide that can be synthesized by the peptide organic chemist.
However, one aspect of small peptide modeling (and especially design) that is
absent in the protein case is the ability to introduce amino acids other than the
20 naturally occurring ones into the sequence (see [Ref. 44] for a typical study
of small peptide structure and design).
For the purpose of incorporating new exotic residues, it is important to have a
convenient way to define new residues (see Create or Modify a Monomer on
page 241). Using this functionality, it is possible to include virtually any
sidechain in an α amino acid residue, to consider residues with D
stereochemistry (which can be readily obtained from L-residues
by inverting the α carbon), or even to make more radical changes in the
backbone, such as use of N-methylated forms or introduction of beta or
gamma amino acids. Another feature of synthetic peptides is the variety of end
or blocking groups possible on the N and C termini. Conformational studies
carried out on ribonuclease C-peptide analogs [Ref. 45] show the effect of end
groups on the structural behavior of small peptides. With SYBYL you can
create new blocks which can be added to a polypeptide chain (see Define a New
Blocking Group on page 252). SYBYL’s default dictionary, macromol, contains
a variety of these non-standard residues and blocking groups.

Although these model-building tools go a long way towards the goal of more
realistic peptide modeling, the need for more quantitative results may require
the use of energy minimization or MD calculations on these systems. Thus,
force field parameters for the newly created residues are needed. This, in
general, represents no special problem if you are using the Tripos force field.
However, if you wish to use the Kollman potential energy function [Ref. 12,
Ref. 13] additional parameter determinations have to be carried out. To assist
with this, we have included a few non-protein amino acids in the macromol
dictionary, as well as some blocking groups whose electrostatic parameters have
been calculated by following the original work as closely as possible [Ref. 12,
Ref. 13, Ref. 46]. A detailed description of these calculations, which includes a
prescription of how such parameters could be derived by using tools available
within or from SYBYL, is included in the Force Field Manual.

Nucleic Acid Modeling
18.3 Nucleic Acid Modeling

Due to the complexity of biomacromolecular systems and the uncertainty of
many energy-based modeling tools available, it is crucial that optimal use be
made of the available database of structural knowledge (see [Ref. 14] for an
application of this philosophy to protein modeling). This can be approached by
a formulation of the modeling problem that makes use of crystallographic
information from the start. For example, modeling of double-helical structures of an
arbitrary DNA sequence may involve a perturbation or mutation strategy (see
Mutate Monomers on page 168 and Copy Conformation on page 192) on
structures, which are known from X-ray or NMR studies, or from previously
obtained models. A relatively thorough strategy is thus possible in the DNA
case due to the relatively small number of nearly distinct double-helical
structural families; more subtle sequence-dependent effects may be studied by
conducting MD or energy minimization calculations (see [Ref. 47] for a recent
survey of oligonucleotide structures).
18.3.1 Single Strand Nucleic Acids

Single stranded DNA and RNA may be built by using the functionality
described in Build Protein, DNA Strand, RNA Strand, Carbohydrate on page
142. Conformations may be set by defining standard states or by explicitly
selecting values for the conformational angles of these biopolymers. Files in the
standard PDB format may be read directly with the PDB command. This single-
stranded capability allows you to model ribosomal or transfer RNAs as well as
non-canonical DNA structures. Hairpin loop structures can be constructed by
combining the double-stranded building methods (see below) with the Build a
Biopolymer via the Menubar and Join Chains operations at the hairpin.
18.3.2 Nucleic Acid Double Helices

Two related features in SYBYL address the most basic aspects of nucleic acid
modeling.
1. Build a DNA Double Helix on page 151
Generates idealized, symmetric three-dimensional models of canonical
double-helical forms for arbitrary sequences of standard DNA bases, namely
A, T, G, and C. Currently supported are the A form [Ref. 48], B form [Ref.
48], C form [Ref. 49], A89 form, a recent revised version [Ref. 50] of the
classical A form, and the Z1 form [Ref. 51].
Usage of the command for the A, B and C forms is straightforward. For Z
helices one should be aware of the fact that the program always generates an
anti-syn (for rotation about the angle χ) structural sequence (i.e. all odd
numbered nucleotides have an anti structure, whereas the even numbered
nucleotides are syn). This occurs whether the base is a purine or pyrimidine.
Thus, in order to get correct Z-DNA for a (dC-dG) oligomer, one should
enter the sequence as c=g=c=g= etc.; entering g=c=g=c gives guanine and
cytosine the reverse of the conformations observed experimentally for Z-DNA
[Ref. 45]. This feature, although demanding some extra care on your
part in simple situations, permits generation of Z-DNA double helices for
more exotic sequences (the structures can be energy-refined later). It is
possible to study distorted models of this kind of helix (see [Ref. 52] for an
application along these lines).
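The position-based anti/syn assignment for Z helices can be illustrated with a short Python sketch. The function name is hypothetical and the parsing of the = separated sequence is an assumption made for the illustration; only the odd/even rule comes from the text above.

```python
def z_helix_glycosidic_states(sequence):
    """Assign anti to odd-numbered and syn to even-numbered nucleotides.

    The assignment depends only on position, never on base identity.
    """
    bases = [b for b in sequence.split("=") if b]
    return [
        (number, base, "anti" if number % 2 == 1 else "syn")
        for number, base in enumerate(bases, start=1)
    ]


for entry in z_helix_glycosidic_states("c=g=c=g"):
    print(entry)
```

Entering c=g=c=g puts every cytosine in anti and every guanine in syn; entering g=c=g=c swaps the two, which is why the order in which the sequence is typed matters.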
2. Build an RNA Double Helix on page 152
Generates double helical models [Ref. 53] of A and A’ (denoted AP in
SYBYL) of RNA molecules of arbitrary sequence.

Polysaccharide Modeling
18.4 Polysaccharide Modeling

SYBYL’s Biopolymer functionality includes a sugar dictionary for the purpose
of building polysaccharides. It contains only the framework of the most
common sugar residues (glucose, fructose, galactose, mannose, and ribose) in
both the α and β hemiacetal forms. From these residues, it is possible to build all
the epimers of glucose by using the INVERT command. In addition, residues
such as N-acetyl-glucosamine can easily be created by building the base residue
and performing the appropriate modifications. These modified residues can in
turn be added to your local copy of the sugar dictionary to make repeated
operations more efficient. For a description of dictionary and residue files see
Biopolymer Dictionaries on page 278.
The manipulation of polysaccharides involves one critical difference in
comparison to other biopolymers: the possibility of branched structures. This
adds the need to specify the type of connection when building a carbohydrate
chain. Currently, the options are O1 to C4, O1 to C6, O1 to C2, and O1 to C3,
but additional connections can be conveniently added to the dictionary to give
2-3 or 1-1 connections. Tailor variable BIOPOLYMER ASSIGN_ATTACH_MODE
determines the frequency of prompting for a connection type: once per building
operation or once per residue, at your discretion.
Conformational φ and ψ torsion angles for polysaccharides are defined in terms
of the hydrogen attached to C1. For this reason it is necessary to set Tailor
variable BIOPOLYMER BUILD_HYDROGENS to ALL in order for the appropriate
conformational states to be defined. In addition, a unique name for the φ and ψ
angle exists for each type of connection to the residue (i.e., phi_14, phi_16, etc.)
and commands like BIOPOLYMER MEASURE may give anomalous output for
residues involved in branching unless you are careful in the selection of the
correct torsion angles.
Command mode is less useful for polysaccharides because of difficulty in
generating substructure numbers for multi-branched polysaccharides. In some
cases, two different substructures may have the same numbers, making access
to these substructures difficult. The best work-around at present is to select the
substructures directly from the screen.
One additional statement relates to connecting polysaccharide chains together.
O1 from one residue is always connected to a carbon on the next rather than the
oxygen atom of the corresponding hydroxyl group. That is, the O1 is preserved
while the hydroxyl group of the target residue is removed in Building and Join
Chains operations.
You can build glycoproteins and proteoglycans by creating the protein and the
polysaccharide pieces in separate molecule areas then adding a bond to connect
the appropriate atoms in the two structures. This allows minimization of the
combined structures with the Tripos force field. Other BIOPOLYMER commands
work only on the section of the molecule corresponding to the open dictionary.
If you have interest in using the Kollman force fields, contact Tripos for
instructions on creating combined dictionaries.

Biopolymer Recommended Reading
18.5 Biopolymer Recommended Reading

[1] F. C. Bernstein, T. F. Koetzle, G. J. B. Williams, E. F. Meyer, M. D.
Brice, J. R. Rodgers, O. Kennard, T. Shimanouchi and M. Tasumi,
J. Mol. Biol. 1977, 112, 535.
[2] Gilbert, W. A., Users Guide to the Software System of the Protein
Identification Resource, National Biomedical Research Foundation,
Washington, D.C. (1986)
[3] G. E. Schulz and R. H. Schirmer, Principles of Protein Structure,
Springer-Verlag New York, NY (1979)
[4] W. Saenger, Principles of Nucleic Acid Structure, Springer-Verlag New
York, NY (1984)
[5] G. O. Aspinall, The Polysaccharides, Academic, New York, NY (1982)
[6] T. A. Jones, Methods Enzymol. 1985, 115, 157.
[7] G. Nemethy and H.A. Scheraga, Q. Rev. Biophys. 1977, 10, 239.
[8] A. Hagler, P. Stern, R. Sharon, J. Becker and F. Naider, J. Am. Chem.
Soc. 1979, 101, 6842.
[9] P. K. Weiner and P. A. Kollman, J. Comp. Chem. 1981, 2, 287.
[10] B. R. Brooks, R. E. Bruccoleri, B. D. Olafson, D. J. States, S.
Swaminathan and M. Karplus, J. Comp. Chem. 1983, 4, 187.
[11] IUPAC-IUB Commission on Biochemical Nomenclature, Biochemistry
1970, 9, 3471.
[12] S. J. Weiner, P. A. Kollman, D. A. Case, U. C. Singh, C. Ghio, G.
Alagona, S. Profeta and P. K. Weiner, J. Am. Chem. Soc. 1984, 106, 765.
[13] S. J. Weiner, P. A. Kollman, D. T. Nguyen and D. A. Case, J. Comp.
Chem. 1986, 7, 230.
[14] D. Hall and N. Pavitt, J. Comp. Chem. 1984, 5, 411.
[15] J. A. McCammon and S. C. Harvey, Dynamics of Proteins and Nucleic
Acids, Cambridge University Press. Cambridge, UK (1987)
[16] H. H. L. Shih, J. Brady and M. Karplus, Proc. Natl. Acad. Sci. USA
1985, 82, 1697.
[17] K. A. Palmer, H. A. Scheraga, J. F. Riordan and B. L. Vallee, Proc.
Natl. Acad. Sci. USA 1986, 83, 1965.
[18] J. Moult and M. N. G. James, Proteins 1986, 1, 146.
[19] R. E. Bruccoleri and M. Karplus, Biopolymers 1987, 26, 137.
[20] R. M. Fine, H. Wang, P. S. Shenkin, D. L. Yarmush and C. Levinthal,
Proteins 1986, 1, 342.
[21] T. Blundell, D. Carney, S. Gardner, F. Hayes, B. Howlin, T. Hubbard, J.
Overington, D. A. Singh, B. L. Sibanda and M. Sutcliffe, Eur. J.
Biochem. 1988, 172, 513.
[22] C. Chothia, A. M. Lesk, M. Levitt, A. G. Amit, R. A. Mariuzza, S.E.V.
Phillips and R.J. Poljak, Science 1986, 233, 755.
[23] T.A. Jones and T. Thirup, EMBO J. 1986, 5, 819.
[24] George, D., Hunt, L., & Barker, W. (1988). Current Methods in
Sequence Comparison and Analysis. In Macromolecular Sequencing and
Synthesis, Schlesinger, D. (editor), 12, 127-149.
[25] Doolittle, R. (1986). Of URFS and ORFS, A Primer on How to Analyze
Derived Amino Acid Sequences.
[26] Doolittle, R. (1990). Methods in Enzymology, 183. Molecular
Evolution: Computer Analysis of Protein and Nucleic Acid Sequences.
[27] Lipman, D. J. & Pearson, W. R., Science 1985, 227, 1435.
[28] Needleman, S. & Wunsch, C., J. Mol. Biol. 1970, 48, 443.
[29] Fredman, M., Bull. of Mathematical Biology 1984, 46, 553.
[30] Geoffrey J. Barton, Methods in Enzymology 1990, 183, 403-428,
Academic Press
[31] Steven Henikoff, Biocomputing in “Informatics and Genome Projects”,
Douglas W. Smith (ed.), Ch. 4, pp 87-117, Academic Press (1994)
[32] Greer, J., J. Mol. Biol. 1981, 153, 1027.
[33] McLachlan, A. D., J. Mol. Biol. 1972, 64, 417.
[34] Dayhoff, M. O., Schwartz, R. M., & Orcutt, B. C., Atlas of Protein
Sequence and Structure, 5, Suppl. 3, 345.
[35] Claessens, van Cutsem, Lasters & Wodak, Protein Engineering 1989, 2,
335.
[36] Reid & Thornton, Proteins 1989, 5, 170.
[37] F. R. Maxfield & H.A. Scheraga, Biochemistry 1976, 15, 5138.
[38] J. F. Gibrat, J. Garnier & B. Robson, J. Mol. Biol. 1987, 198, 425.
[39] J. Garnier, D. Osguthorpe, B. Robson, J.Mol.Biol. 1978, 120, 97.
[40] N. Qian and T. Sejnowski, J.Mol.Biol. 1988, 202, 865.
[41] G. Fasman, TIBS 1989, 14, 295.
[42] W. Kabsch & C. Sander, Biopolymers 1983, 22, 2577-2637.
[43] P. S. Shenkin, D. L. Yarmush, R. M. Fine, H. Wang, and C. Levinthal,
Biopolymers 1987, 26, 2053.
[44] P. W. Schiller, Biophys. Chem. 1988, 31, 63.
[45] (a) K.R. Shoemaker, P.S. Kim, D.N. Brems, S. Marqusee, E.J. York,
I.M. Chaiken, J.M. Stewart and R.L. Baldwin, Proc. Natl. Acad. Sci.
USA 1985, 82, 2349.
(b) K.R. Shoemaker, P.S. Kim, E.J. York, J.M. Stewart and R.L.
Baldwin, Nature 1987, 326, 563.
(c) M. Rico, J. Santoro, F.J. Bermejo, J. Herranz, J.L. Nieto, E. Gallego
and M.A. Jimenez, Biopolymers 1986, 25, 1031.
(d) M. Vasquez and H.A. Scheraga, Biopolymers 1988, 27, 41.
[46] U. C. Singh and P. A. Kollman, J. Comp. Chem. 1984, 5, 129.
[47] R. E. Dickerson, J. Biomol. Struct. Dynam. 1987, 5, 557.
[48] S. Arnott and D. W. L. Hukins, Biochem Biophys. Res. Comm. 1972, 47,
1504-1509.
[49] S. Arnott and E. Selsing, J. Mol. Biol. 1975, 98, 265-269.
[50] R. Chandrasekaran, M. Wang, R. G. He, L. C. Puigjaner, M. A. Byler, R.
P. Millane and S. Arnott, J. Biomol. Struct. Dyn. 1989, 6, 1189-1202.
[51] A. H. J. Wang, G. J. Quigley, F. J. Kolpak, G. van der Marel, J. H. van
Boom, and A. Rich, Science 1981, 211, 171-176.
[52] S. Arnott, D. W. L. Hukins and S. D. Dover, Biochem Biophys. Res.
Comm. 1972, 48, 1392-1399.
[53] B. Hartmann, B. Malfoy and R. Lavery, J. Mol. Biol. 1989, 207, 433-
444.
[54] Wang J., Cieplak P., Kollman P. A., “How Well Does a Restrained
Electrostatic Potential (RESP) Model Perform in Calculating
Conformational Energies of Organic and Biological Molecules?”
J. Comp. Chem. 2000, 21, 1049-1074.
[55] Cieplak P., Caldwell J., Kollman P., “Molecular Mechanical Models for
Organic and Biological Systems Going Beyond the Atom Centered Two
Body Additive Approximation: Aqueous Solution Free Energies of
Methanol and N-Methyl Acetamide, Nucleic Acid Base and Amide
Hydrogen Bonding and Chloroform/Water Partition Coefficient of the
Nucleic Acid Bases.” J. Comp. Chem., 2001, 22, 1048-1057.

Biopolymer Index
A end group modeling 317

EXCISE 171
Alignment FASTA 67
biopolymer sequences 208 FIND CONFORMATION 177
structures by homology 218 FIT 216
Alpha-carbon trace 81 FIX_ASN_GLN 128
FIX_END_GROUPS 112
AMBER
FIX_MOLECULE 137
assign atom types 117
FIX_PROLINE 130
load atomic charges 104
FIX_SIDECHAINS 191
types and charges in cofactors 155
INSERT 169
Amino acid JOIN 157
non standard 326 LABEL_ATOMS 89
Angle LOAD 105
column type 204 LOOP 194
ANALYZE
Atom types
ADD_COLUMNS 204
cofactors 113
COLOR_LOOP 203
DISPLAY_LOOPS 203
B EXIT 206
LIST 205
Backbone SAVE 205
construction 149 SELECT_LOOP 204
display 81 SETUP 196
Bibliography MEASURE 174
biopolymers 225, 332 MULT_ALIGN_SEQ 209
Protein Data Bank 228 PDB 58
rendering 85 PHOSPHORYLATE 159
rotamer libraries 187 PIR 67
Binary protein database 255 POLY_BLOCK 110
PREDICT_SECONDARY 183
BIOPOLYMER 303 protein search 227
ADD MONOMER 248 RAMACHANDRAN 91
ADD_SIDECHAINS 125 RANDBUILD 160
ADDH 100 REMOVE 172
ALIGN_SEQUENCES 210 RENUMBER 133
ALIGN_STRUCTURES 220 REPLACE 167
ASSIGN_SEC_STR 181 RESIDUE_FIT 223
BLOCK 111 RIBBON 88
BREAK 156 RNAHELIX 152
BUILD 142 SEQUENCE 213
CAP 111 SET CHAINNAME 132
CHANGE 168 SET CONFORMATION 175
CHECK_GEOMETRY 135 SMALL_TO_POLY 138
CONSTRUCT_BACKBONE 149 TWEAK 198
COPY_CONFORMATION 192 TYPE_COFACTOR 113
CREATE BLOCK 252
CREATE DICTIONARY 239, 248 Biopolymer
CREATE MONOMER 241 adding hydrogens 99
CYCLE 158 aligning structures by homology 218
DICTIONARY 236 analyze loops 201
DISULFIDE 148 color 202
DNAHELIX 151 display all loops 202
backbone construction

    theory 316
  bibliography 332
  binary protein database 256
  build dialog box 141
  chemical groups 292
  cofactor database 292
  cofactors 113, 292
  commands 303
  deleting residues 171, 172
  dictionary
    close 237
    file format 278
    list 237
    open 236
  dictionary, monomer file format 278
  display 71
    backbone 81
    protein view 72
    trace 81
  find and fit fixed regions 223
  fixing amide chains 128
  hydrogen, essential 99
  inserting residues 169
  ligand
    database 292
  loading charges 104
  loop search setup 196
  loop search theory 320
  menu in a spreadsheet 201
  mkprodat standalone utility 258
  mutating residues 168
  phosphate caps 159
  PRODAT protein database 255
  protein preparation tool 94
  replacing residues 167
  RMSD local fitting dialog box 221
  rotamer libraries 185
  secondary structure
    finding and rendering 177
    prediction 319
  sequence alignment 208
  spreadsheet of loop results 201
  structure alignment 218
  terminal residues 108
  theory 307
Blocking group 326
  adding to terminal residues 108
  creating 252
  removing 172
  saving in dictionary 239, 248
Building
  biopolymers 20, 141, 142

C
Cap atoms 111
  adding to terminal residues 108
Carbohydrate
  building 142
Chain
  renaming in biopolymers 132
  trace in biopolymers 81
Charge
  loading in biopolymers 104
Cofactor
  adding to biopolymer 153
  database 292
  SYBYL atom types 113
Color
  protein loop 203
Command
  standalone utilities
    protein database 256
Conformation
  changing in biopolymers 175, 185
  copying in biopolymers 192
  finding in biopolymers 177
  rotamer libraries 185
Conformational angle 286
Conformational states 286
Creating
  biopolymers 20, 141, 142
Cyclic peptide 158

D
Database
  local fitting of conformers 221
Dictionary
  add monomer 248
  biopolymer 236
  close 237
  create blocking group 252
  create dictionary 239, 248
  create monomer 241
  defining a new monomer 241
    tutorial 41
  files 278, 286
  list 237
  manage custom dictionary 238
  open 236
Distance

  column type 204
Disulfide bridge 148
DNA
  building 142
  modeling
    building a double helix 151, 328
    building single strands 328
  phosphorylation 159
Double helix 328
  DNA building 151
  RNA building 152
Dunbrack rotamer library 185, 189
Dynamics 139

E
End group modeling 317
Energy
  Biopolymers 139
Essential hydrogens 99

F
FASTA file
  format 301
  reading and writing 67
File formats
  .loop file 294
  .pir file 301
  Biopolymer files 294
  FASTA file 301
  loop modeling 294
  protein homology matrix 293
Files created
  BIOPOLYMER
    LOOP 194
    LOOP ANALYZE 205
    PREDICT 184
  biopolymer trace 81
Fitting
  biopolymers 216
  conformers to the average 221
Fixing
  amide chains 128
  prolines 130
  sidechains 191
  terminal residues 112

G
Gap penalty 315

H
Helix
  DNA building 151
  RNA building 152
Homologous proteins 309
Homology
  matrix 315
  sequence alignment 208
  structure alignment 218
Hydrogens
  adding in biopolymers 99
  essential in biopolymers 99

I
Inhibitor-substrate docking 7, 308
INTER_LOOP_RMS
  column type 204

J
Join
  biopolymer chains 157
JOY
  color key 274

L
License requirements
  Biopolymer 8
Ligand Depot 292
List
  protein loop analysis 205
Loop modeling 309
  file format 294
  theory 320
Loop searching in biopolymers 194, 196
Lovell rotamer library 185

M
Measure
  in biopolymers 174
Minimize

  biopolymer structures 139
mkprodat standalone utility 256, 258
Mode
  BIOPOLYMER 303
Monomer
  adding to a dictionary 248
  definition of new monomer in dictionary 241
    tutorial 41
  inserting in a chain 169
  saving in a dictionary 239, 248
Mutation
  in biopolymers 168

N
National Biomedical Research Foundation
  address 67
Nucleic acid
  double helices 328
Nucleotides 7, 308

O
OMEGA plot, column in table 204
ORCHESTRAR
  sequence viewer 262

P
PDB 58
  activities upon reading a file 59
  activities upon writing a file 63
  atom name conversion 134
  binary protein file 255
  handling of water molecules 61
  reader 58
  retrieve from PRODAT 64
  retrieve from RCSB site 64
  see Protein Data Bank
  writer 63
pdbfname 260
Peptides 7, 308
  building 141, 142
  minimize 139
  small 326
  tutorials
    building 20
    monomer definition 41
    prepare protein 10
    protein loop search 32
PHI plot, column in table 204
Phosphate caps 159
PIR file
  format 301
  reading and writing 67
Polysaccharide modeling 7, 308, 330
Prediction
  protein secondary structure 183
  theory 319
  user-specified method 298
PRODAT
  loop search 196
  protein database 255
Protein Data Bank 332
  binary file 255
  reading and writing files 58
Protein database 256
  binary 255
    adding structure 255
Protein modeling 309
  adding
    hydrogens 99
    sidechains 125
  alignment
    by homology 218
  atom information 89
  backbone construction 149
  bibliography 332
  blocking group 252
  building 142
    dialog box 141
  checking geometry 135
  cyclic peptide 158
  database search 227
  deleting residues 171, 172
  deletion 309
  display 71, 81
    view setup 72
  disulfide bridge 148
  editing
    database 255
    your own database 255
  energy calculation 139
  energy-based modeling 309
  fixing
    amide chains 128
    end groups 112
    proline geometry 130
    sidechains 191
  homologous sequences 309
  homology matrix files 293

  hydrogens 99
  inserting residues 169
  loading charges 104
  local RMS fitting
    dialog box 221
  loop
    region 309
    search 194, 196
    theory 320
    tweak 198
  minimize 139
  mkprodat utility 258
  modifying the sequence 167, 168
  protein preparation tool 94
  protein search 227
  rendering 84
  ribbons 84
  rotamer libraries 185
  secondary structure
    assigning 181
    finding and rendering 177
    prediction 298, 319
  sequence
    alignment 208
  solvents and cofactors 153
  terminal residues 108
  tutorials 9
  view setup 72
PSI plot, column in table 204

R
Ramachandran graph 91
Random sequence
  in biopolymers 160
RCSB
  retrieving file from site 64
References
  biopolymers 225, 332
  Protein Data Bank 228
  rendering 85
  rotamer libraries 187
Renumber
  in biopolymers 133
Residue
  adding to a dictionary 248
  definition of new monomer in dictionary 241
  inserting in a chain 169
  saving in a dictionary 239, 248
Residue, see Monomer
Ribbon
  biopolymer 88
Ring closure 286
RMSD
  local fitting of conformers 221
RNA
  building 142
  modeling
    building a double helix 152, 328
    building single strands 328
Rotamer libraries in biopolymers 185
  Dunbrack 189
  file format 188

S
Saving
  molecules
    protein loops 205
Search
  biopolymers 227
Secondary structure
  assigning in proteins 181
  finding in biopolymers 177
  prediction 319
    user-specified method 298
Sequence
  alignment 208
  in biopolymers 213
  renumbering 133
  sequence homology column type 204
Sequence viewer 261
  colors for text background 270
  description 262
  JOY color key 274
  mouse and keyboard interactions 275
  preferences 269
  tailor 269
Sidechain
  adding 125
Single strand nucleic acids 328
Solvent
  biopolymer 153
Standalone utilities
  mkprodat 256, 258
Substrate-inhibitor docking 7, 308

T
Tailor

  PDB 63
  sequence viewer 269
Terminal residues
  editing 108
Torsion
  column type 204
Trace, biopolymer chain 81
Tutorials
  biopolymer building 20
  monomer definition 41
  peptide building 20
  prepare protein 10
  protein loop search 32

V
van der Waals
  contact column type 205
