Molecular Similarity & Molecular Descriptors For Drug Design
Molecular Descriptors
for Drug Design
N. Sukumar
Center for Biotechnology & Interdisciplinary Studies
(x.4235; nagams@rpi.edu)
January 26, 2006
Structure-Activity Relationships
[Figure: diagram linking MOLECULAR STRUCTURE, MOLECULAR DESCRIPTOR REPRESENTATION and CHEMICAL/BIOLOGICAL ACTIVITY; connecting labels: Computational Chemistry; Statistical or Pattern Recognition Methods]
Molecular Similarity
"Similarity" can have quite different meanings in chemical approaches.
Molecular Similarity does not just mean similarity of structural features.
Similarity in a chemical context must include additional properties.
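A minimal sketch of why structural similarity alone can mislead. It assumes a Tanimoto coefficient over sets of structural feature keys and a simple distance-based property similarity; the molecules, feature keys and property values are hypothetical and chosen only for illustration.

```python
import math

def tanimoto(features_a, features_b):
    """Tanimoto coefficient between two sets of structural feature keys."""
    if not features_a and not features_b:
        return 1.0
    return len(features_a & features_b) / len(features_a | features_b)

def property_similarity(props_a, props_b):
    """Map the Euclidean distance between property vectors onto (0, 1]."""
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(props_a, props_b)))
    return 1.0 / (1.0 + dist)

# Hypothetical molecules: feature keys and made-up property values.
mol_a = {"features": {"ring", "C=O", "CH3"}, "props": (2.0, 0.5)}
mol_b = {"features": {"ring", "C=O", "OH"}, "props": (2.0, 0.5)}
mol_c = {"features": {"ring", "C=O", "CH3"}, "props": (5.0, 3.5)}

# a and b differ structurally but share properties;
# a and c share all structural features but differ in properties.
struct_ab = tanimoto(mol_a["features"], mol_b["features"])
struct_ac = tanimoto(mol_a["features"], mol_c["features"])
prop_ab = property_similarity(mol_a["props"], mol_b["props"])
prop_ac = property_similarity(mol_a["props"], mol_c["props"])

print(f"structure: a-b {struct_ab:.2f}, a-c {struct_ac:.2f}")
print(f"property:  a-b {prop_ab:.2f}, a-c {prop_ac:.2f}")
```

In this toy case the two measures rank the pairs in opposite order, which is the point made above: a purely structural comparison does not capture similarity in properties.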
An example of Classification:
Macrocycles: musky odor or not?
(C. Davidson and B. Lavine)
139 compounds: 103 musks, 36 non-musks.
[Figure: principal component score plot, PC1 vs. PC2. Legend: 1 = Macro Non-Musk, 2 = Macro Musk; 1 = Nitro Non-Musk, 2 = Nitro Musk]
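A minimal sketch of how a PC1/PC2 score plot like this one can be produced. It is not the analysis of Davidson and Lavine: the descriptor matrix is random synthetic data, and only the class sizes (103 and 36) are taken from the slide. The function name `pca_scores` is an assumption made for this illustration.

```python
import numpy as np

def pca_scores(X, n_components=2):
    """Project the rows of X onto the leading principal components via SVD."""
    Xc = X - X.mean(axis=0)                      # mean-center each descriptor
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]
    explained = (S ** 2) / np.sum(S ** 2)        # fraction of variance per PC
    return scores, explained[:n_components]

rng = np.random.default_rng(0)
# Synthetic stand-in data: 5 made-up descriptors per compound.
musk = rng.normal(loc=1.0, scale=0.5, size=(103, 5))
non_musk = rng.normal(loc=-1.0, scale=0.5, size=(36, 5))
X = np.vstack([musk, non_musk])
labels = np.array([2] * 103 + [1] * 36)          # 2 = musk, 1 = non-musk

scores, explained = pca_scores(X)

# Distance between the class means along PC1 (the sign of a PC is arbitrary).
gap = abs(scores[labels == 2, 0].mean() - scores[labels == 1, 0].mean())
print("scores shape:", scores.shape)
print("explained variance (PC1, PC2):", explained)
print("class separation along PC1:", gap)
```

Plotting `scores[:, 0]` against `scores[:, 1]` with the class label as the marker gives the kind of display shown on the slide.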
Multiple-parameter optimization of lead structures
Different barriers
To be absorbed, drugs must cross a series of separate barriers (the epithelial layer is the most dominant barrier):
Mucus gel layer
Intestinal epithelial cells
Lamina propria
Endothelium of capillaries
Motivation
Introduction of a new drug into the market is often the
culmination of a long and arduous process of laboratory
experimentation, lead compound discovery, animal testing and
pre-clinical and clinical trials.
This process, from hit to lead to marketable drug, typically takes 10-15 years.
In silico drug discovery:
Find a correlation between molecular structure and biological activity.
Any number of compounds, including those not yet synthesized, can then be virtually screened on the computer to select structures with the desired properties.
[Figure: from lead to drug; properties to optimize: potency, absorption, distribution, metabolism, excretion, toxicity]
Solubility
Absorption
Mutagenicity
Bioavailability
Metabolic stability
Blood-brain barrier permeability
Cardiac toxicity (hERG)
Plasma protein binding
The figure depicts a cartoon representation of the relationship between the continuum of
chemical space (light blue) and the discrete areas of chemical space that are occupied
by compounds with specific affinity for biological molecules. Examples of such
molecules are those from major gene families (shown in brown, with specific gene
families colour-coded as proteases (purple), lipophilic GPCRs (blue) and kinases (red)).
The independent intersection of compounds with drug-like properties, that is, those in a
region of chemical space defined by the possession of absorption, distribution,
metabolism and excretion properties consistent with orally administered drugs
(ADME space), is shown in green.
Christopher Lipinski & Andrew Hopkins, Nature, Vol. 432, 16 December 2004, pp. 855-861
Molecular Representations
[Figure: structural diagram of an example molecule (atom labels H3C, N, CH3)]
Quantitative Structure-Activity
Relationships (QSAR)
QSAR was a natural extension of the LFER approach, with a
biological activity correlated against a series of parameters that
described the structure of a molecule.
The best-known and most widely used descriptor in QSAR has
been the logarithm of the octanol/water partition coefficient (usually
referred to as LOG P or LOG P[o/w]). LOG P has been very
useful in correlating a wide range of activities because it
models transport across the blood/brain barrier well.
Unfortunately, many regressions on LOG P alone do not work well,
usually because other effects, such as steric and
electronic effects, are important.
Therefore, many other descriptors have been used in QSAR in
addition to LOG P to incorporate these additional effects.
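A minimal sketch of this point, using ordinary least squares on synthetic data. The descriptor values, the coefficients used to generate the activity, and the helper name `fit_r2` are all assumptions made for illustration; no real measurements are involved.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 40
log_p = rng.uniform(-1.0, 5.0, n)       # made-up LOG P values
steric = rng.normal(0.0, 1.0, n)        # made-up steric descriptor
electronic = rng.normal(0.0, 1.0, n)    # made-up electronic descriptor

# Synthetic activity that depends on all three descriptors plus noise.
activity = (0.8 * log_p - 1.2 * steric + 0.6 * electronic
            + rng.normal(0.0, 0.1, n))

def fit_r2(columns, y):
    """Least-squares fit with an intercept; returns coefficients and R^2."""
    A = np.column_stack([np.ones(len(y))] + list(columns))
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return coef, r2

_, r2_logp = fit_r2([log_p], activity)
coef_full, r2_full = fit_r2([log_p, steric, electronic], activity)

print(f"R^2 with LOG P only:        {r2_logp:.3f}")
print(f"R^2 with three descriptors: {r2_full:.3f}")
print("fitted coefficients (intercept, LOG P, steric, electronic):", coef_full)
```

Because the synthetic activity was built with steric and electronic contributions, the LOG P-only fit leaves that variance unexplained, and adding the two descriptors recovers it.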
MOE Descriptors
$\chi = \sum_{\text{bonds } (i,j)} (R_i R_j)^{-1/2}$
The connectivity index $\chi$ is constructed from the row sums $R_i$ and $R_j$ of the adjacency matrix, using $(R_i R_j)^{-1/2}$ for the contribution of each bond $(i,j)$.
$\chi$ is a bond-additive quantity in which terminal C-C bonds are given greater weight than inner C-C bonds.
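The bond-additive sum described above can be sketched directly from an adjacency matrix. This is an illustrative implementation, not MOE's code; the function name `connectivity_index` and the n-butane example (hydrogen-suppressed carbon skeleton) are assumptions chosen for the demonstration.

```python
import numpy as np

def connectivity_index(adjacency):
    """Sum (R_i * R_j)^(-1/2) over all bonds (i, j) of the molecular graph."""
    A = np.asarray(adjacency)
    row_sums = A.sum(axis=1)                 # R_i: number of neighbors of atom i
    chi = 0.0
    n = len(A)
    for i in range(n):
        for j in range(i + 1, n):            # visit each bond once
            if A[i, j]:
                chi += float(row_sums[i] * row_sums[j]) ** -0.5
    return chi

# Carbon skeleton of n-butane: C1-C2-C3-C4
butane = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]

chi_butane = connectivity_index(butane)
print(round(chi_butane, 4))   # 1.9142
```

Here each terminal bond contributes (1·2)^(-1/2) ≈ 0.707 and the inner bond contributes (2·2)^(-1/2) = 0.5, so terminal bonds do carry the greater weight.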
Quantum chemical
Electron Density Derived
descriptors
The wave function $\Psi$ given by solution of the Schrödinger
equation $H\Psi = E\Psi$ contains all information about the molecule.
"All science is either physics or stamp collecting."
Ernest Rutherford (Nobel Prize in Chemistry, 1908)
BUT:
$\Psi(\mathbf{r}_1, \mathbf{r}_2, \mathbf{r}_3, \ldots)$ is a function of the coordinates of
all the electrons (and nuclei) in the molecule!
"The fundamental laws necessary for the mathematical
treatment of a large part of physics and the whole of chemistry
are thus completely known, and the difficulty lies only in the
fact that application of these laws leads to equations that are
too complex to be solved." Paul Dirac (1902-1984)
Hohenberg-Kohn theorem
(Density Functional Theory)
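The electron density $\rho(\mathbf{r})$
$\rho(\mathbf{r}) = \int \Psi^*(\mathbf{r}_1, \mathbf{r}_2, \mathbf{r}_3, \ldots)\,\Psi(\mathbf{r}_1, \mathbf{r}_2, \mathbf{r}_3, \ldots)\, d\mathbf{r}_2\, d\mathbf{r}_3 \cdots$ (with $\mathbf{r}_1 = \mathbf{r}$)
contains all information about the ground state. $\rho(\mathbf{r})$ is a
function of only (x,y,z)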
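A minimal numerical sketch of the integral above, for a toy system of two particles in one dimension. The Gaussian orbitals, the grid and the symmetric product wave function are assumptions chosen only to show how integrating out all coordinates but one turns a many-coordinate function into a density that depends on a single position. The density is normalized to 1 here, following the formula as written on the slide.

```python
import numpy as np

x = np.linspace(-8.0, 8.0, 401)       # shared grid for x1 and x2
dx = x[1] - x[0]

def gaussian(grid, center):
    """Unnormalized Gaussian orbital centered at `center`."""
    return np.exp(-0.5 * (grid - center) ** 2)

phi_a = gaussian(x, -1.0)
phi_b = gaussian(x, 1.0)

# Toy wave function Psi(x1, x2): symmetric product of two orbitals.
psi = np.outer(phi_a, phi_b) + np.outer(phi_b, phi_a)
psi /= np.sqrt(np.sum(psi ** 2) * dx * dx)   # normalize on the grid

# rho(x1) = integral of Psi*(x1, x2) Psi(x1, x2) over x2 (Psi is real here).
rho = np.sum(psi ** 2, axis=1) * dx

print("Psi is stored on a grid of shape", psi.shape)
print("rho is stored on a grid of shape", rho.shape)
print("integral of rho:", np.sum(rho) * dx)
```

The wave function needs a 401 x 401 array for just two one-dimensional particles, while the density needs 401 values, which is the practical appeal of working with the density.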
BUT:
the electron density $\rho(\mathbf{r})$ is not a very
sensitive descriptor of chemistry ("near-sightedness" of
the electron density)
Disadvantage: Difficult to use $\rho(\mathbf{r})$ directly as a descriptor
Advantage:
Can be used to simplify descriptor computations:
TAE-RECON method
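Electrostatic potential: $EP(\mathbf{r}) = \sum_\alpha \frac{Z_\alpha}{|\mathbf{r} - \mathbf{R}_\alpha|} - \int \frac{\rho(\mathbf{r}')}{|\mathbf{r} - \mathbf{r}'|}\, d\mathbf{r}'$
Kinetic energy densities: $K(\mathbf{r}) \propto -(\psi^* \nabla^2 \psi + \psi \nabla^2 \psi^*)$, $G(\mathbf{r}) \propto \nabla\psi^* \cdot \nabla\psi$
Electron Density Gradients
Local average ionization potential: $PIP(\mathbf{r}) = \sum_i \frac{\rho_i(\mathbf{r})\,|\varepsilon_i|}{\rho(\mathbf{r})}$
Bare nuclear potential: first term of EP
Fukui function: $F^+(\mathbf{r}) = \rho_{\mathrm{HOMO}}(\mathbf{r})$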
Reconstruction Method
http://www.drugmining.com/
Topological Theory of
Atoms in Molecules
Definition of an Atom in a Molecule:
An atom is the union of an attractor and its basin
Each atom contains one (and only one) nucleus, which is
the attractor of its electron density distribution $\rho(\mathbf{r})$
Every atom is bounded by an atomic surface of zero flux:
$\nabla\rho \cdot \mathbf{n} = 0$
Wavelet Decomposition:
Creates a set of
coefficients that represent
a waveform.
Small coefficients may be
omitted to compress data.
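A minimal sketch of both statements, using a hand-written orthonormal Haar transform. The choice of the Haar wavelet, the test signal and the threshold are assumptions made for illustration; the slides do not say which wavelet family is used.

```python
import numpy as np

def haar_decompose(signal):
    """Full Haar decomposition of a signal whose length is a power of two.

    Returns detail coefficients from finest to coarsest, followed by the
    final approximation coefficient.
    """
    coeffs = []
    approx = np.asarray(signal, dtype=float)
    while len(approx) > 1:
        even, odd = approx[0::2], approx[1::2]
        coeffs.append((even - odd) / np.sqrt(2))   # detail coefficients
        approx = (even + odd) / np.sqrt(2)         # coarser approximation
    coeffs.append(approx)
    return coeffs

def haar_reconstruct(coeffs):
    """Invert haar_decompose."""
    approx = coeffs[-1]
    for detail in reversed(coeffs[:-1]):
        out = np.empty(2 * len(approx))
        out[0::2] = (approx + detail) / np.sqrt(2)
        out[1::2] = (approx - detail) / np.sqrt(2)
        approx = out
    return approx

def compress(coeffs, threshold):
    """Zero every coefficient whose magnitude is below the threshold."""
    return [np.where(np.abs(c) < threshold, 0.0, c) for c in coeffs]

t = np.linspace(0.0, 1.0, 64, endpoint=False)
signal = np.sin(2 * np.pi * t)                     # smooth test waveform

coeffs = haar_decompose(signal)
exact = haar_reconstruct(coeffs)                   # lossless round trip

threshold = 0.1
compressed = compress(coeffs, threshold)
kept = sum(int(np.count_nonzero(c)) for c in compressed)
approx_signal = haar_reconstruct(compressed)
error = np.linalg.norm(approx_signal - signal)

print("coefficients kept:", kept, "of", len(signal))
print("reconstruction error (L2 norm):", error)
```

Because this transform is orthonormal, the reconstruction error equals the norm of the discarded coefficients, so the threshold directly controls how much accuracy is traded for compression.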
[Figure: MLP2 and EP surfaces for 1BLF (lactoferrin) and 135L (lysozyme)]
REENVYMAKLAEQAERYEEMVEFMEKVSNSLGSEELTVEERNLLSVAYKNVIGARRASWR
IISSIEQKEESRGNEEHVNSIREYRSKIENELSKICDGILKLLDAKLIPSAASGDSKVFY
LKMKGDYHRYLAEFKTGAERKEAAESTLTAYKAAQDIATTELAPTHPIRLGLALNFSVFY
YEILNSPDRACNLAKQAFDEAIAELDTLGEESYKDSTLIMQLLRDNLTLWTSDMQDDGAD
EIKE
1. Primary
linear sequence
2. Secondary
local, repetitive spatial arrangements
3. Tertiary
3-D structure of native fold
4. Quaternary
non-covalent oligomerization of subunits (single polypeptides) into protein complexes
Ramachandran
Map
In a polypeptide the main-chain
N-Cα and Cα-C bonds
are relatively free to rotate.
These rotations are
represented by the torsion
angles φ and ψ, respectively.
G. N. Ramachandran used
computer models of small
polypeptides to systematically
vary φ and ψ with the
objective of finding stable
conformations.
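A minimal sketch of how a backbone torsion angle can be computed from four atomic positions. For φ the four atoms are C(i-1), N, Cα, C; for ψ they are N, Cα, C, N(i+1). The function name `torsion_angle` and the test coordinates are assumptions made for illustration, not coordinates from a real structure.

```python
import numpy as np

def torsion_angle(p0, p1, p2, p3):
    """Signed torsion angle in degrees about the p1-p2 bond."""
    p0, p1, p2, p3 = (np.asarray(p, dtype=float) for p in (p0, p1, p2, p3))
    b0 = p0 - p1
    b1 = p2 - p1
    b2 = p3 - p2
    b1 /= np.linalg.norm(b1)             # unit vector along the central bond
    # Components of the outer bonds perpendicular to the central bond.
    v = b0 - np.dot(b0, b1) * b1
    w = b2 - np.dot(b2, b1) * b1
    x = np.dot(v, w)
    y = np.dot(np.cross(b1, v), w)
    return np.degrees(np.arctan2(y, x))

# Made-up coordinates with the central bond along the z axis.
cis = torsion_angle((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1))
trans = torsion_angle((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1))
gauche = torsion_angle((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))

print(cis, abs(trans), gauche)   # 0, 180 and 90 degrees
```

Evaluating this function for every residue of a structure and plotting φ against ψ gives the Ramachandran map.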
Sims, Gregory E. et al. (2005) Proc. Natl. Acad. Sci. USA 102, 618-621
Copyright 2005 by the National Academy of Sciences
QSAR assumptions
The properties of a chemical are implicit in its molecular structure.
What about effects of the environment?
Feature selection?
Statistics?