
MOLECULAR MODELING AND DRUG DESIGN

Learning objectives

• Introduction to molecular modeling
• Molecular graphics
• Basics of computational quantum mechanics
• Density functional theory (DFT)
• Empirical Force field model
• Exploring the energy surface
• Simulation methods using computers
• In silico molecular dynamics simulation methods
• QSAR

27.1. Introduction
Molecular modeling deals with ways to mimic the behavior of molecules and molecular systems. It is invariably associated with computer modeling and has been revolutionized by computational techniques, to the extent that most of its calculations could not be performed without a computer; computers have thus extended both the range of models that can be considered and the systems to which they can be applied. The CPK (Corey, Pauling, Koltun) model is the space-filling model used by most chemists to construct three-dimensional representations of molecular structures; such models are useful both in teaching and in research. Molecular modeling is also concerned with quantum mechanics, whose foundations were laid many years before the advent of computers. Computational chemistry is closely involved in molecular modeling: quantum mechanics, molecular mechanics, energy minimization, simulation, conformational analysis and other computer-based methods for understanding and predicting the behavior of molecular systems all have their roles. Computer graphics is also closely associated with molecular modeling. Bioinformatics draws on molecular informatics and chemoinformatics, which deal with the manipulation of information about molecules, including the retrieval of information stored in computers. Chemists are now largely relieved of the burden of synthesizing thousands of compounds to obtain lead molecules: they can use computational screening tools to rule out thousands of compounds that do not satisfy the criteria of modeling and docking, and then synthesize only the potential leads. Molecular informatics has also gained a substantial amount of information from the automated sequencing machines used to determine the human genome. Molecular modeling relies on the Born-Oppenheimer approximation, which allows electronic and nuclear motions to be separated. Changes in the energy of a system can then be considered as movements on a multidimensional surface called the energy surface.

27.2. Molecular Graphics

Molecular graphics, the application of computer graphics to molecules, works hand in hand with the underlying theoretical methods: it has made molecular modeling far more accessible and assists in the analysis and interpretation of the calculations. Raster devices have superseded vector devices in molecular modeling. CPK (space-filling) models are commonly used in molecular graphics. Using depth cueing, the impression of a three-dimensional structure can be conveyed on a two-dimensional computer screen. Molecular graphics programs provide facilities such as translation, rotation and zooming the model towards or away from the viewer. Proteins are commonly represented using ribbon models. Non-covalent interactions between two or more molecules are often analyzed in terms of the van der Waals surface of each molecule, and a solvent-accessible surface can also be represented. The formulae of Connolly (1983a, b) and Richmond (1984) have been used to calculate exact or approximate values of the surface area.

Molecular modellers use many software programs, ranging from single-task programs to highly complex packages. Ab initio quantum mechanical calculations can be carried out using the Gaussian programs, semi-empirical quantum mechanical calculations using the MOPAC/AMPAC programs, and molecular mechanics calculations using the MM2 programs. Reviews in Computational Chemistry, edited by Lipkowitz and Boyd (1990), the Encyclopedia of Computational Chemistry by Schleyer et al. (1998) and Molecular Modelling by Leach (1996) are very good source books on molecular modeling. A basic knowledge of mathematical concepts such as vectors, matrices, differential equations and complex numbers is required for a quick understanding of molecular modeling techniques.
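The Connolly and Richmond formulae are analytical. A simpler numerical way to estimate the solvent-accessible surface area, shown here instead, is the Shrake-Rupley point-counting method: points are spread over a sphere of radius (van der Waals radius + probe radius) around each atom, and the fraction of points that do not lie inside any neighboring atom's expanded sphere gives that atom's exposed area. This is a minimal sketch, not the method of Connolly or Richmond; the 1.4 Å probe radius (a common choice for water), the 1.7 Å atomic radius, the number of points and the coordinates are illustrative assumptions, and the all-pairs loop would be too slow for a large protein.

```python
import math

def sphere_points(n):
    """n roughly uniform points on a unit sphere (golden-section spiral)."""
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    points = []
    for i in range(n):
        z = 1.0 - 2.0 * (i + 0.5) / n          # evenly spaced heights in (-1, 1)
        radius = math.sqrt(1.0 - z * z)
        theta = golden_angle * i
        points.append((radius * math.cos(theta), radius * math.sin(theta), z))
    return points

def sasa(coords, radii, probe=1.4, n_points=960):
    """Per-atom solvent-accessible surface area by Shrake-Rupley point counting."""
    unit_points = sphere_points(n_points)
    expanded = [r + probe for r in radii]      # van der Waals radius + probe radius
    areas = []
    for i, (xi, yi, zi) in enumerate(coords):
        ri = expanded[i]
        exposed = 0
        for ux, uy, uz in unit_points:
            px, py, pz = xi + ri * ux, yi + ri * uy, zi + ri * uz
            # A surface point is buried if it lies inside any other atom's expanded sphere
            buried = any(
                (px - xj) ** 2 + (py - yj) ** 2 + (pz - zj) ** 2 < expanded[j] ** 2
                for j, (xj, yj, zj) in enumerate(coords)
                if j != i
            )
            if not buried:
                exposed += 1
        areas.append(4.0 * math.pi * ri ** 2 * exposed / n_points)
    return areas

# Two hypothetical carbon-like atoms (radius 1.7 Angstrom) 1.54 Angstrom apart
areas = sasa([(0.0, 0.0, 0.0), (1.54, 0.0, 0.0)], [1.7, 1.7])
print([round(a, 1) for a in areas], round(sum(areas), 1))
```

An isolated atom keeps its full area, $4\pi(r + r_{probe})^2$; each of the two bonded atoms loses the part of its sphere that is buried by its neighbor.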

27.3. Basics of computational quantum mechanics

Quantum mechanics explicitly represents the electrons in a calculation, so it can be used to derive properties that depend upon the electronic distribution and to study chemical reactions. As these methods were implemented on computers, computational quantum mechanics became a standard tool for chemists. Hückel theory, valence bond theory and density functional theory are all part of computational quantum mechanics, and computers do much of the work in these areas. The Hartree-Fock equations can now be applied to molecular systems using computational methods, and molecular properties can be calculated using ab initio quantum mechanics. Properties such as the partitioning of electron density, bond orders, electrostatic potentials and thermodynamic quantities can be studied computationally. By incorporating parameters derived from experimental data, some approximate methods can calculate certain properties more accurately than the highest levels of ab initio methods.
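Hückel theory, mentioned above, is simple enough to show in a few lines. In simple Hückel theory each π-orbital energy has the form $\alpha + x\beta$, where $\alpha$ (the Coulomb integral) and $\beta$ (the resonance integral, which is negative) are parameters and the $x$ values are the eigenvalues of the molecule's carbon connectivity (adjacency) matrix. This minimal sketch assumes $\alpha = 0$ and $\beta = -1$, so the energies are in units of $|\beta|$ relative to $\alpha$ and more negative means more stable; the molecule, 1,3-butadiene, is chosen only as an illustration.

```python
import numpy as np

def huckel_energies(adjacency, alpha=0.0, beta=-1.0):
    """Pi-orbital energies in simple Hückel theory, lowest first."""
    A = np.asarray(adjacency, dtype=float)      # 1 where two carbons are bonded, else 0
    H = alpha * np.eye(len(A)) + beta * A       # Hückel Hamiltonian matrix
    return np.linalg.eigvalsh(H)                # eigvalsh returns ascending eigenvalues

# 1,3-butadiene: a chain of four sp2 carbons, C1-C2-C3-C4
butadiene = [[0, 1, 0, 0],
             [1, 0, 1, 0],
             [0, 1, 0, 1],
             [0, 0, 1, 0]]
energies = huckel_energies(butadiene)
pi_energy = 2.0 * energies[:2].sum()   # four pi electrons fill the two lowest orbitals
print(energies, pi_energy)
```

The four orbitals come out at $\alpha \pm 1.618\beta$ and $\alpha \pm 0.618\beta$, giving a total π energy of $4\alpha + 4.472\beta$ (printed as -4.472 with these units).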

27.4. Density functional theory (DFT)

DFT is an approach to the electronic structure of atoms and molecules. Whereas Hartree-Fock theory calculates the full electronic wave function, DFT calculates only the total electronic energy and the overall electron density distribution, because there is a relationship between the total electronic energy and the electron density. DFT methods using gradient corrections can give results for a wide variety of properties that are in some cases superior to those of ab initio calculations. DFT methods are also applied to the optimization of molecular geometries. Many other kinds of problems can be tackled using computational methods.
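To make the idea of an energy that depends only on the electron density concrete, the sketch below evaluates one simple density functional: the exchange energy in the spin-unpolarized local density approximation (LDA), $E_x[\rho] = -\tfrac{3}{4}\left(\tfrac{3}{\pi}\right)^{1/3}\int \rho^{4/3}\,d^3r$, in atomic units. It is applied to the exact hydrogen 1s density $\rho(r) = e^{-2r}/\pi$ on a radial grid. This is only one term of an energy functional, not a full DFT calculation (there is no self-consistent Kohn-Sham step), and the grid extent and spacing are arbitrary choices.

```python
import numpy as np

# Exchange constant of the (spin-unpolarized) local density approximation, atomic units
C_X = 0.75 * (3.0 / np.pi) ** (1.0 / 3.0)

def lda_exchange_energy(rho, r):
    """E_x = -C_x * integral of rho^(4/3) d^3r for a spherical density on a uniform radial grid."""
    integrand = rho ** (4.0 / 3.0) * 4.0 * np.pi * r ** 2     # spherical shell volume element
    h = r[1] - r[0]
    integral = h * (integrand.sum() - 0.5 * (integrand[0] + integrand[-1]))  # trapezoidal rule
    return -C_X * integral

r = np.linspace(0.0, 30.0, 30001)        # radial grid in bohr
rho_h = np.exp(-2.0 * r) / np.pi         # exact hydrogen 1s electron density
print(lda_exchange_energy(rho_h, r))     # about -0.213 hartree
```

For this density the integral can also be done analytically, which gives a check on the numerical result.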

27.5. Empirical Force field model: Molecular mechanics (MM)

Quantum mechanical methods cannot tackle problems that are too large, whereas molecular mechanics (MM) can be used for such problems. The force field used in molecular mechanics can in some cases provide answers that are as accurate as even the highest-level quantum mechanical calculations, in a fraction of the computer time. The main limitation of molecular mechanics is that it cannot provide properties that depend upon the electron distribution of a molecule, because it ignores electronic motions and calculates the energy of a system as a function of the nuclear positions only. MM calculations work, even on systems containing a significant number of atoms, because of the validity of several assumptions such as the Born-Oppenheimer approximation. Force field parameters developed from data on small molecules can also be used to study much larger molecules such as polymers.
MM force fields include contributions from bond stretching, angle bending and torsional terms. Improper torsions and out-of-plane bending motions have also been incorporated, as have cross terms that reflect coupling between the internal coordinates. The non-bonded terms, comprising electrostatic and van der Waals interactions, are also included in force fields. Efficient, rapid methods are used for calculating atomic charges, and distributed multipoles are used to represent the anisotropy of a molecular charge distribution. Aromatic interactions, polarization effects and solvent dielectric models can also be incorporated. The van der Waals term includes both the attractive dispersion interactions and the short-range repulsive (exchange) forces, and it is commonly modeled with the Lennard-Jones 12-6 function. Many-body effects in the potential are usually handled approximately by using effective pair potentials. The GRID program (Goodford, 1985), used for finding energetically favorable regions in protein binding sites, uses a direction-dependent 6-4 function for hydrogen bonding. MM force fields have also been developed for the simulation of liquid water. To overcome some drawbacks of the united-atom force fields used in calculations on proteins, the anisotropic potential of Toxvaerd (1990) is used in present-day MM.
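The energy terms listed above can be sketched as simple functions. This minimal sketch uses common textbook functional forms (harmonic bond stretching and angle bending, one cosine term of a torsional series, the Lennard-Jones 12-6 function and Coulomb's law) with hypothetical parameters in kcal/mol and Å. It does not reproduce any particular force field: real force fields such as AMBER differ in their exact functional forms, conventions (for example, whether a factor of 1/2 appears in the harmonic terms) and parameter values.

```python
import math

COULOMB_K = 332.06  # approx. kcal*Angstrom/(mol*e^2); the exact value varies slightly by program

def bond_stretch(r, r0, k):
    """Harmonic bond term: 0.5 * k * (r - r0)^2."""
    return 0.5 * k * (r - r0) ** 2

def angle_bend(theta, theta0, k):
    """Harmonic angle term (angles in radians)."""
    return 0.5 * k * (theta - theta0) ** 2

def torsion(phi, v_n, n, gamma):
    """One term of a torsional cosine series: (V_n / 2) * (1 + cos(n*phi - gamma))."""
    return 0.5 * v_n * (1.0 + math.cos(n * phi - gamma))

def lennard_jones(r, epsilon, sigma):
    """Lennard-Jones 12-6: repulsive r^-12 term plus attractive (dispersion) r^-6 term."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)

def coulomb(r, q_i, q_j, dielectric=1.0):
    """Electrostatic energy of two point charges (charges in units of e)."""
    return COULOMB_K * q_i * q_j / (dielectric * r)

# Hypothetical parameters, for illustration only (kcal/mol, Angstrom, radians)
energy = (
    bond_stretch(1.55, r0=1.53, k=600.0)
    + angle_bend(math.radians(112.0), math.radians(109.5), k=100.0)
    + torsion(math.radians(60.0), v_n=1.4, n=3, gamma=0.0)
    + lennard_jones(3.8, epsilon=0.1, sigma=3.4)
    + coulomb(3.8, q_i=0.2, q_j=-0.2)
)
print(f"total energy: {energy:.3f} kcal/mol")
```

Summing such terms over every bond, angle, torsion and non-bonded pair of a molecule gives the steric energy that an MM program evaluates and minimizes. The Lennard-Jones term has its minimum, of depth $\epsilon$, at $r = 2^{1/6}\sigma$.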
MM can be used to calculate thermodynamic properties such as heats of formation, which are comparable with experimental results. Strain energy, as well as steric energy, can also be calculated using MM programs. Because MM force fields may be used to determine a variety of structure-related properties, the parameterization data should be chosen accordingly. The OPLS (Optimized Potentials for Liquid Simulations; Jorgensen and Tirado-Rives) parameters were obtained by reproducing thermodynamic properties using computer simulation techniques. Quantum mechanical calculations are increasingly used for the parameterization of MM force fields. Some researchers use the AMBER (Assisted Model Building with Energy Refinement) force field, which is designed for calculations on proteins and nucleic acids. A good general force field can often outperform a poor specific force field. There are also special force fields for inorganic molecules, solid-state systems, and metals and semiconductors.

27.6. Exploring the energy surface - energy minimization

The way in which the energy varies with the coordinates is usually called the potential energy surface or, when there are many coordinates, the hypersurface. For a system of N atoms, the energy is a function of 3N-6 internal or 3N Cartesian coordinates. In molecular modeling one is chiefly interested in the minimum points on the energy surface. There are often very many minima; the minimum with the very lowest energy is known as the global energy minimum. Minimization algorithms are used to identify the geometries of the system that correspond to minimum points. The highest point on the lowest-energy pathway between two minima is a saddle point, and the arrangement of the atoms at this point is the transition structure. Both minima and saddle points are stationary points on the energy surface, where the first derivative of the energy with respect to every coordinate is zero. Minimization algorithms can be classified into two groups:

i) those that use derivatives of the energy with respect to the coordinates, and
ii) those that do not.
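Whether a stationary point is a minimum or a saddle point is decided by the second derivatives (the Hessian matrix): at a minimum all of its eigenvalues are positive, while at a first-order saddle point (a transition structure) exactly one is negative. The sketch below checks this on a hypothetical two-dimensional double-well "energy surface", $E(x, y) = (x^2 - 1)^2 + y^2$, chosen only because its stationary points are known analytically: minima at $(\pm 1, 0)$ and a saddle point at $(0, 0)$.

```python
import numpy as np

def energy(x, y):
    """Toy double-well surface (hypothetical, not a real molecule)."""
    return (x * x - 1.0) ** 2 + y * y

def gradient(x, y):
    """First derivatives of the toy surface."""
    return np.array([4.0 * x * (x * x - 1.0), 2.0 * y])

def hessian(x, y):
    """Second derivatives of the toy surface."""
    return np.array([[12.0 * x * x - 4.0, 0.0],
                     [0.0, 2.0]])

def classify(x, y, tol=1e-8):
    """Classify a point using its gradient and the eigenvalues of its Hessian."""
    if np.linalg.norm(gradient(x, y)) > tol:
        return "not stationary"
    n_negative = int(np.sum(np.linalg.eigvalsh(hessian(x, y)) < 0.0))
    if n_negative == 0:
        return "minimum"
    if n_negative == 1:
        return "first-order saddle point"
    return "higher-order stationary point"

for point in [(-1.0, 0.0), (0.0, 0.0), (1.0, 0.0), (0.5, 0.0)]:
    print(point, energy(*point), classify(*point))
```

The two minima have equal energy here, so either could be the global minimum; on a real molecular surface the Hessian would be computed from the force field or quantum mechanical energy.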

Derivative-based methods are generally the more useful. Many factors must be taken into account when choosing the most appropriate algorithm, or combination of algorithms, for a given problem; the ideal minimization algorithm provides the answer as quickly as possible using the least amount of memory. Most minimization algorithms can only go downhill on the energy surface, and so they can only locate the minimum nearest to the starting point. The input to a minimization program is a set of initial coordinates for the system, which may come from a variety of sources. For example, if the coordinates of a protein come from an X-ray crystallographic study, its behavior in water can be studied by immersing it in a “solvent bath” whose solvent coordinates are obtained from a Monte Carlo or molecular dynamics simulation. The simplest minimization methods are most useful when the initial configuration of the system is very high in energy. Because they take a lot of computer time, they are often used in combination with a different minimization algorithm: a few steps of a simple method first refine the initial structure, and then a more sophisticated method takes over.

Two first-order minimization algorithms are frequently used in molecular modeling: the method of steepest descent and the conjugate gradient method. Both gradually change the coordinates of the atoms, moving the system closer and closer to the minimum point. The conjugate gradient method produces a set of directions that does not show the oscillatory behavior of steepest descent in narrow valleys. In steepest descent, both the gradients and the directions of successive steps are orthogonal; in the conjugate gradient method, the gradients at successive points are orthogonal but the directions are conjugate. A set of conjugate directions has the property that, for a quadratic function of M variables, M steps (with exact line searches) are sufficient to reach the minimum. The Newton-Raphson method is a second-derivative method for locating the minimum. The choice of minimization algorithm depends on many factors, such as the storage and computational requirements, the relative speeds with which the calculations can be performed, the availability of analytical derivatives and the robustness of the method. For MM calculations on a small molecule, the Newton-Raphson method may be used, provided a more robust method such as simplex or steepest descent is first applied for a few steps of minimization. Docking results show that the steepest descent method can actually be superior to conjugate gradients when the starting structure is some way from the minimum; however, conjugate gradients is much better once the initial strain has been removed.

Some criterion is needed to decide when the minimization is close enough to the minimum to be terminated. The simplest strategy is to monitor the energy from one iteration to the next and to stop when the difference in energy between successive steps falls below a threshold. An alternative is to monitor the change in coordinates and to stop when the difference between successive configurations is sufficiently small. A third method is to calculate the root-mean-square (RMS) gradient, obtained by adding the squares of the gradients of the energy with respect to the coordinates, dividing by the number of coordinates and taking the square root, and to stop when it falls below a threshold.
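The contrast between steepest descent and conjugate gradients in a narrow valley, together with the RMS-gradient stopping rule, can be seen on a simple quadratic energy $E(\mathbf{x}) = \tfrac{1}{2}\mathbf{x}^{T}A\mathbf{x} - \mathbf{b}^{T}\mathbf{x}$. This is a minimal sketch under stated assumptions: the quadratic, its 50-fold difference in curvature, the starting point and the $10^{-6}$ threshold are hypothetical, and exact line searches are possible only because the function is quadratic; a real MM minimizer would use an approximate line search on a non-quadratic force field. The conjugate directions are built with the Fletcher-Reeves formula, one of several standard choices.

```python
import numpy as np

def rms_gradient(g):
    """Square root of the mean of the squared gradient components."""
    return np.sqrt(np.mean(g ** 2))

def steepest_descent(A, b, x0, tol=1e-6, max_steps=10_000):
    """Minimize 0.5 x.A.x - b.x by steepest descent with exact line searches."""
    x = np.array(x0, dtype=float)
    for step in range(max_steps):
        g = A @ x - b                      # gradient of the quadratic energy
        if rms_gradient(g) < tol:
            return x, step
        alpha = (g @ g) / (g @ A @ g)      # exact line search along -g
        x = x - alpha * g
    return x, max_steps

def conjugate_gradient(A, b, x0, tol=1e-6, max_steps=10_000):
    """Same quadratic, using Fletcher-Reeves conjugate directions and exact line searches."""
    x = np.array(x0, dtype=float)
    g = A @ x - b
    d = -g
    for step in range(max_steps):
        if rms_gradient(g) < tol:
            return x, step
        alpha = -(g @ d) / (d @ A @ d)     # exact line search along d
        x = x + alpha * d
        g_new = A @ x - b
        beta = (g_new @ g_new) / (g @ g)   # Fletcher-Reeves coefficient
        d = -g_new + beta * d
        g = g_new
    return x, max_steps

A = np.array([[1.0, 0.0], [0.0, 50.0]])   # narrow valley: curvatures differ 50-fold
b = np.zeros(2)                            # minimum at the origin
x0 = [10.0, 1.0]
x_sd, n_sd = steepest_descent(A, b, x0)
x_cg, n_cg = conjugate_gradient(A, b, x0)
print("steepest descent steps:", n_sd, " conjugate gradient steps:", n_cg)
```

Conjugate gradients reaches the minimum of this two-variable quadratic in two steps, as the M-step property predicts, while steepest descent needs many more steps because it zigzags across the narrow valley.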

27.7. Applications of energy minimizations

Energy minimization is an integral part of techniques such as conformational search procedures and is very widely used in molecular modeling. It is also used to prepare a system for other types of calculation. Techniques specifically associated with energy minimization methods include normal mode analysis and the study of intermolecular processes.

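Normal mode analysis is carried out at an energy minimum: the eigenvalues of the mass-weighted Hessian (second-derivative) matrix give the squares of the vibrational angular frequencies. The sketch below applies this to a hypothetical one-dimensional model of a linear symmetric triatomic molecule (such as O=C=O) with two identical springs, considering only motion along the molecular axis. The force constant and masses are illustrative values in arbitrary consistent units; for this model the results can be checked against the analytical frequencies $\omega^2 = k/m_O$ (symmetric stretch) and $\omega^2 = (k/m_O)(1 + 2m_O/m_C)$ (asymmetric stretch), plus a zero frequency for overall translation.

```python
import numpy as np

def normal_mode_frequencies(hessian, masses):
    """Angular frequencies from the mass-weighted Hessian at an energy minimum."""
    inv_sqrt_m = 1.0 / np.sqrt(np.asarray(masses, dtype=float))
    mw_hessian = np.asarray(hessian, dtype=float) * np.outer(inv_sqrt_m, inv_sqrt_m)
    eigvals = np.linalg.eigvalsh(mw_hessian)            # ascending order
    if eigvals.min() < -1e-8 * np.abs(eigvals).max():
        raise ValueError("negative curvature: the structure is not at a minimum")
    return np.sqrt(np.clip(eigvals, 0.0, None))         # clip tiny round-off negatives

k = 1.0                      # hypothetical spring constant
m_o, m_c = 16.0, 12.0        # illustrative masses of the end and central atoms
# Hessian of V = k/2 (x2 - x1)^2 + k/2 (x3 - x2)^2
hessian = k * np.array([[1.0, -1.0, 0.0],
                        [-1.0, 2.0, -1.0],
                        [0.0, -1.0, 1.0]])
freqs = normal_mode_frequencies(hessian, [m_o, m_c, m_o])
print(freqs)   # translation (~0), symmetric stretch, asymmetric stretch
```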
27.8. Simulation methods using computers

Computer simulation methods enable us to study the behavior of macromolecules, which have many closely spaced minima, and to predict the properties of complex systems by studying small replicas of the macroscopic system that contain a manageable number of atoms or molecules. Simulation techniques also allow the time-dependent behavior of atomic and molecular systems to be determined, providing a detailed picture of the way in which a system changes from one conformation or configuration to another. Simulation techniques are also widely used in the determination of protein structures by X-ray crystallography.

An experimentally measured value is a time average: it is an average over the duration of the measurement. In a simulation, the corresponding quantity can be estimated as a time average over a molecular dynamics trajectory or, assuming the ergodic hypothesis, as an ensemble average over configurations generated by Monte Carlo.

27.9. In silico molecular dynamics simulation methods


In the Monte Carlo (MC) method, random configurations are generated and a special set of criteria is used to decide whether to accept each new configuration. The criteria are based on the probability of obtaining a given configuration, which is proportional to its Boltzmann factor, $\exp(-U/k_BT)$, and so depends on the potential energy function U. Low-energy configurations are therefore generated with higher probability than high-energy ones. New configurations may also be obtained by moving several atoms or molecules, or by rotating about one or more bonds. If the energy of the new configuration is lower than that of its predecessor, the new configuration is always accepted; if it is higher, it is accepted with probability $\exp(-\Delta U/k_BT)$ (the Metropolis criterion). The molecular dynamics (MD) and MC simulation methods differ in many ways. The main difference is that MD provides information about the time dependence of the properties of the system, whereas there is no temporal relationship between successive MC configurations. The outcome of each trial move in an MC simulation depends only on its immediate predecessor, whereas in MD it is possible to predict the configuration of the system at any time in the future or in the past. In MD the total energy includes a kinetic energy contribution, whereas in an MC simulation the total energy is determined directly from the potential energy function. Traditional MD samples the constant-NVE (microcanonical) ensemble, where N is the number of particles, V the volume and E the total energy, whereas traditional MC samples the canonical (NVT) ensemble, where T is the temperature. Simulation methods can predict the thermodynamic properties of systems for which there are no experimental data, or for which experimental data are difficult or impossible to obtain. Simulations can also provide structural information about conformational changes in molecules and about the distribution of molecules in a system. In the canonical ensemble (MC simulation) the temperature is constant, and properties such as the energy, heat capacity and pressure can be calculated; in the microcanonical ensemble (MD simulation) the temperature fluctuates.
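The Metropolis acceptance rule fits in a few lines. This minimal sketch samples a single coordinate in a hypothetical one-dimensional harmonic potential $U(x) = \tfrac{1}{2}kx^2$ at $k_BT = 1$ (reduced units); the step size, number of steps and random seed are arbitrary choices. For this potential, classical equipartition predicts $\langle U \rangle = k_BT/2$, which gives a check on the sampling.

```python
import math
import random

def metropolis(potential, x0, kT, step_size, n_steps, seed=0):
    """Metropolis Monte Carlo for one coordinate; returns the sequence of configurations."""
    rng = random.Random(seed)
    x, u = x0, potential(x0)
    samples = []
    for _ in range(n_steps):
        x_trial = x + rng.uniform(-step_size, step_size)   # random trial move
        u_trial = potential(x_trial)
        du = u_trial - u
        # Downhill moves are always accepted; uphill moves with probability exp(-dU/kT)
        if du <= 0.0 or rng.random() < math.exp(-du / kT):
            x, u = x_trial, u_trial
        samples.append(x)   # after a rejection the old configuration is counted again
    return samples

def harmonic(x, k=1.0):
    """Hypothetical 1-D harmonic potential U(x) = k x^2 / 2."""
    return 0.5 * k * x * x

samples = metropolis(harmonic, x0=0.0, kT=1.0, step_size=1.0, n_steps=200_000)
mean_u = sum(harmonic(x) for x in samples) / len(samples)
print(f"<U> = {mean_u:.3f}  (equipartition predicts 0.5)")
```

Counting the old configuration again after a rejected move is essential; dropping it would bias the averages toward high-energy configurations.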
Radial distribution functions are used to describe the structure of a liquid, and thermodynamic properties can be calculated from them. Computer simulation makes use of phase space: for a system containing N atoms, the 3N positions and 3N momenta define a point in a 6N-dimensional phase space, and an ensemble is a collection of points in this phase space. Mechanical properties such as the internal energy, the pressure and the heat capacity can be routinely obtained from MC or MD simulations, which preferentially sample the low-energy regions of phase space. In practice, the initial configuration should be chosen close to the state that is to be simulated. Boundaries and boundary effects are crucial in simulation methods. In MD simulation, the Verlet algorithm uses the positions and accelerations at time t and the positions from the previous step, $r(t-\delta t)$, to calculate the new positions, $r(t+\delta t)$; velocities do not appear explicitly but can be calculated from the positions. MD is traditionally performed in the constant-NVE or NVEP ensemble (in NVEP, P denotes the total linear momentum). Solvent effects can also be incorporated in MD simulations by means of a potential of mean force (PMF). Conformational changes can usually be visualized from MD simulations. Because of their biological importance, lipid bilayers are also commonly simulated by MD. Once a protein structure has been predicted from its sequence, for example using the Modeller program, the model should be subjected to MD simulation, in water or in a lipid environment depending on the problem; only then should docking be carried out with the model. The time needed for a system to stabilize during the simulation varies from system to system; the majority of systems stabilize within 20 to 50 ns.
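The Verlet update described above is $r(t+\delta t) = 2r(t) - r(t-\delta t) + \delta t^{2}a(t)$, with velocities estimated afterwards as $v(t) = [r(t+\delta t) - r(t-\delta t)]/(2\delta t)$. The sketch below integrates a single particle in a hypothetical one-dimensional harmonic potential (k = m = 1 in reduced units, so the period is $2\pi$). The time step and the Taylor-expansion start-up step used to obtain $r(\delta t)$ are illustrative choices, not taken from any particular MD package.

```python
import numpy as np

def verlet(force, mass, r0, v0, dt, n_steps):
    """Basic Verlet integration of one particle in 1-D; returns positions and velocities."""
    positions = np.empty(n_steps + 1)
    positions[0] = r0
    # Start-up: r(-dt) is unknown, so r(dt) comes from a second-order Taylor expansion
    positions[1] = r0 + v0 * dt + 0.5 * (force(r0) / mass) * dt ** 2
    for i in range(1, n_steps):
        a = force(positions[i]) / mass
        # r(t + dt) = 2 r(t) - r(t - dt) + a(t) dt^2
        positions[i + 1] = 2.0 * positions[i] - positions[i - 1] + a * dt ** 2
    # Velocities are not propagated; estimate v(t) = [r(t + dt) - r(t - dt)] / (2 dt)
    velocities = (positions[2:] - positions[:-2]) / (2.0 * dt)   # for positions[1:-1]
    return positions, velocities

k, m = 1.0, 1.0                       # hypothetical harmonic oscillator, reduced units
dt = 2.0 * np.pi / 1000               # 1000 steps per period (the period is 2*pi here)
pos, vel = verlet(lambda r: -k * r, m, r0=1.0, v0=0.0, dt=dt, n_steps=1000)
energy = 0.5 * m * vel ** 2 + 0.5 * k * pos[1:-1] ** 2
print(pos[-1], energy.min(), energy.max())
```

After one full period the particle returns very close to its starting position, and the total energy stays close to its initial value of 0.5 instead of drifting, which is one reason the Verlet family of integrators is widely used in MD.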

27.10. Molecular Docking


Free software such as AutoDock is available for docking calculations, as are commercial packages such as GOLD (CCDC) and GLIDE (Schrödinger). Hydrogen atoms are generally not located when macromolecular structures are determined by X-ray crystallography, yet a protein contains hundreds of them; docking software therefore adds these hydrogen atoms in a step known as protein preparation. Ligands are prepared in a similar way.

Protein and ligand preparation modules also remove steric clashes between atoms. If the active site of a macromolecule is already known, docking can be carried out directly at that site; if not, several modules are available to predict the active site of the target protein. Before the docking studies are carried out, the target and the ligand molecules should be energy minimized separately. When a ligand binds at the active site, conformational changes may occur at the binding site, so the amino acid residues in the active-site pocket should ideally be flexible. Taking this flexibility into account, however, means that energy calculations must be carried out for many rotatable bonds as the ligand binds, and many poses of the ligand must be considered, which involves a lot of computation time. Hence, initially, only the ligand is treated as flexible and the receptor is considered rigid. Using the docking scores and energy values as criteria, the set of ligands can then be narrowed from many to a few. Thereafter the receptor can also be treated as flexible along with the ligand; this is known as induced fit docking. This approach has worked well in docking studies, with the theoretical results agreeing about 90% with the experimental results. Depending on the size of the macromolecular target, commercial software takes a few hours to perform induced fit docking, compared with nearly an hour for rigid docking, when screening hundreds of compounds. The best ligands, those likely to show good inhibition, can be selected by analyzing the score, the energy, and the hydrogen bonds and hydrophobic interactions at the active site. For the initial filtering of ligands, Lipinski's rule of five can be used:
1) The molecular weight of the ligand should not be greater than 500 daltons.
2) There should be no more than 5 hydrogen bond donors.
3) There should be no more than 10 hydrogen bond acceptors.
4) Log P should not be greater than 5 or less than -2 (P is the partition coefficient between 1-octanol and water).
Most existing drugs obey Lipinski's rule of five.
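A first-pass filter based on these four criteria is easy to automate. The sketch below assumes the descriptors (molecular weight, hydrogen bond donor and acceptor counts, log P) have already been computed elsewhere, for example with a cheminformatics toolkit; the `Ligand` class, the function names and the three compounds and their values are hypothetical. The log P window of -2 to 5 follows the list above. The `max_violations` parameter is a design choice: many practitioners accept compounds with at most one violation, so the threshold is left adjustable rather than fixed at zero.

```python
from dataclasses import dataclass

@dataclass
class Ligand:
    name: str
    mol_weight: float   # daltons
    h_donors: int
    h_acceptors: int
    log_p: float        # 1-octanol/water partition coefficient (log10)

def rule_of_five_violations(lig: Ligand) -> list[str]:
    """List the rule-of-five criteria (as stated in the text) that a ligand violates."""
    violations = []
    if lig.mol_weight > 500:
        violations.append("molecular weight > 500")
    if lig.h_donors > 5:
        violations.append("more than 5 H-bond donors")
    if lig.h_acceptors > 10:
        violations.append("more than 10 H-bond acceptors")
    if not -2.0 <= lig.log_p <= 5.0:
        violations.append("log P outside [-2, 5]")
    return violations

def filter_ligands(ligands, max_violations=0):
    """Keep ligands with at most max_violations rule-of-five violations."""
    return [lig for lig in ligands if len(rule_of_five_violations(lig)) <= max_violations]

# Hypothetical descriptor values, for illustration only
candidates = [
    Ligand("hypothetical_1", 342.4, 2, 5, 2.8),
    Ligand("hypothetical_2", 612.7, 4, 9, 4.1),    # too heavy
    Ligand("hypothetical_3", 705.9, 7, 12, 6.3),   # violates all four criteria
]
print([lig.name for lig in filter_ligands(candidates)])
print([lig.name for lig in filter_ligands(candidates, max_violations=1)])
```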

27.11. QSAR
QSAR (quantitative structure-activity relationship) relates numerical properties of a molecular structure to its activity by a mathematical model. The term QSPR (quantitative structure-property relationship) is sometimes also used. The in vivo activity of a molecule depends on many factors, and a structure-activity study can help to decide which features of a molecule give rise to its overall activity; this may help in making modified compounds with enhanced activity. Hansch (1969) developed equations that relate biological activity to the electronic characteristics and hydrophobicity of a molecule. Let C denote the concentration of the compound required to produce a standard response in a given time, P the partition coefficient of the compound between 1-octanol and water (chosen by Hansch as a suitable measure of relative hydrophobicity), $\sigma$ the Hammett substituent parameter, and $k_1$ to $k_4$ constants. The QSAR equation is then

$$\log(1/C) = k_1 \log P - k_2 (\log P)^2 + k_3 \sigma + k_4 \qquad (27.1)$$
The properties chosen for inclusion in a QSAR equation should be as uncorrelated with each other as possible. Many programs are now available for QSAR, and 3D QSAR is also now possible. QSAR equations are often interpreted in terms of specific interactions with the macromolecular target. In a number of cases, the crystal structure of the receptor-ligand complex was subsequently determined, and it has therefore been possible to use molecular graphics to discover whether the parameters in the QSAR equation have any real meaning.
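The constants $k_1$ to $k_4$ in Equation 27.1 are normally obtained by multiple linear regression on a set of compounds with measured activities. The sketch below does this with NumPy least squares. The data are made up: they are generated from chosen "true" constants so that the fit can be checked, whereas real data would include experimental scatter. Because Equation 27.1 is parabolic in log P, the activity peaks at an optimum $\log P_0 = k_1/(2k_2)$, obtained by setting the derivative of $\log(1/C)$ with respect to $\log P$ to zero.

```python
import numpy as np

def fit_hansch(log_p, sigma, log_inv_c):
    """Least-squares fit of log(1/C) = k1 logP - k2 (logP)^2 + k3 sigma + k4."""
    log_p = np.asarray(log_p, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    # One column per term; the minus sign makes the fitted k2 match Equation 27.1
    X = np.column_stack([log_p, -log_p ** 2, sigma, np.ones_like(log_p)])
    k, *_ = np.linalg.lstsq(X, np.asarray(log_inv_c, dtype=float), rcond=None)
    return k   # array [k1, k2, k3, k4]

# Synthetic (made-up) data generated from known constants, to check the fit
true_k = np.array([1.2, 0.25, 0.8, 2.0])
log_p = np.array([-0.5, 0.3, 1.1, 1.8, 2.4, 3.0, 3.7, 4.2])
sigma = np.array([0.23, -0.17, 0.06, 0.45, -0.27, 0.78, 0.0, 0.37])
log_inv_c = true_k[0] * log_p - true_k[1] * log_p ** 2 + true_k[2] * sigma + true_k[3]

k1, k2, k3, k4 = fit_hansch(log_p, sigma, log_inv_c)
print(k1, k2, k3, k4)
print("optimum log P:", k1 / (2 * k2))   # where d log(1/C) / d log P = 0
```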
Summary
• The basics of molecular modeling are explained in detail in this module.
• Advances in computer graphics, which are important for visualizing space-filling models as three-dimensional images, are highlighted in this module.
• DFT, used to calculate the total electronic energy, is explained and compared with other energy approximation methods.
• MM, QMM, OPLS and other methods are described together with their uses and applications in different molecular modeling methods.
• Induced fit docking and the commercial and academically available packages are explained, along with QSAR; both play an important role in drug design.
