Molecular Similarity

Molecular Similarity and Similarity Searching
Substructure and 3D pharmacophore searching require a
precisely specified query. Each molecule in the database
either matches the query or it does not.

In similarity searching the query is typically an entire
molecule. A similarity coefficient is calculated between the
query and each database molecule, and the top-scoring
molecules are returned as the hits.
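
The ranking step described above can be sketched as follows. This is a minimal illustration, assuming each molecule is already represented as a set of "on" bit positions and assuming the Tanimoto coefficient as the similarity coefficient (the text does not name a specific coefficient). The molecule names and bit sets are made up for the example.

```python
def tanimoto(a, b):
    """Tanimoto coefficient between two sets of 'on' bit positions."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def top_hits(query, database, k=2):
    """Score every database molecule against the query and return the k best."""
    scored = [(tanimoto(query, fp), name) for name, fp in database.items()]
    # Highest score first; ties are broken alphabetically by name.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(name, score) for score, name in scored[:k]]


# Hypothetical fingerprints (sets of bit positions), for illustration only.
query = frozenset({1, 4, 7, 9})
database = {
    "mol_A": frozenset({1, 4, 7, 9, 12}),
    "mol_B": frozenset({2, 3, 5}),
    "mol_C": frozenset({1, 4, 8}),
}

hits = top_hits(query, database, k=2)
for name, score in hits:
    print(f"{name}: {score:.2f}")
```

Unlike a substructure search, every database molecule receives a score, and the cut-off (here k) decides which ones count as hits.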

If two molecules are found to be similar, they are expected
to have similar activity.

Similarities are computed from a set of molecular
descriptors.

Molecular Descriptors
Apart from descriptors that can be calculated from
molecular formula or computed 3D structures,
experimental data can also be used.

However, this is not usually feasible, because experimental
data are often unavailable and expensive to obtain.

Also, some molecules may not have been synthesised
yet.
1. Partition Coefficient
The most common choice is the 1-octanol/water partition
coefficient.
The descriptor is log P, where P is the partition coefficient.
Various theoretical methods are available for calculating
log P.
The most widely used are fragment-based approaches, in
which the partition coefficient is calculated as a sum of
individual fragment contributions plus correction
factors.
A widely used program that employs this approach is
CLOGP (by Hansch and Leo).

CLOGP breaks a molecule into fragments by
identifying isolating carbons.

These are carbon atoms that are not doubly
bonded or triply bonded to a heteroatom.

The isolating carbon atoms and their attached hydrogens
are considered hydrophobic fragments, with the
remaining groups of atoms being the polar
fragments.
Examples: benzyl bromide and o-methylacetanilide.
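
The rule for isolating carbons can be sketched in a few lines. This is only an illustration of the definition given above, not CLOGP's actual implementation. It assumes a hydrogen-suppressed molecule given as a list of element symbols plus a list of (atom, atom, bond order) tuples, and it uses acetamide as a small hypothetical test case.

```python
def isolating_carbons(atoms, bonds):
    """Return indices of carbons that are not doubly or triply bonded
    to a heteroatom (any atom other than C or H)."""
    excluded = set()
    for i, j, order in bonds:
        if order >= 2:
            # Check both ends of every multiple bond.
            for carbon, partner in ((i, j), (j, i)):
                if atoms[carbon] == "C" and atoms[partner] not in ("C", "H"):
                    excluded.add(carbon)
    return [k for k, el in enumerate(atoms) if el == "C" and k not in excluded]


# Acetamide, CH3-C(=O)-NH2, heavy atoms only: 0=CH3, 1=C(=O), 2=O, 3=N.
atoms = ["C", "C", "O", "N"]
bonds = [(0, 1, 1), (1, 2, 2), (1, 3, 1)]

print(isolating_carbons(atoms, bonds))  # only the methyl carbon is isolating
```

The carbonyl carbon is excluded because of its double bond to oxygen, so it ends up in the polar fragment together with O and N.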
One drawback of this method is the need for data for all the
fragments in the molecule.

An alternative is to use an atom-based approach, in which the
molecule is broken down into the atom types present. In this
case the partition coefficient is given as a summation of
contributions from the atoms of each type:

$$\log P = \sum_i n_i a_i$$

where $n_i$ is the number of atoms of type i and $a_i$ is the
atomic contribution of that type.
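
The atom-based summation, log P as the sum over atom types of (count × contribution), can be sketched as below. The atom-type names and contribution values are hypothetical placeholders chosen for the example. They are not taken from any published parameter set.

```python
from collections import Counter


def atom_based_logp(atom_types, contributions):
    """log P as the sum over atom types of (count n_i) * (contribution a_i)."""
    counts = Counter(atom_types)
    return sum(n * contributions[t] for t, n in counts.items())


# Hypothetical contributions a_i, for illustration only.
contributions = {"C_sp3": 0.5, "O_hydroxyl": -1.0, "H": 0.1}

# A molecule described by its atom types: two sp3 carbons, one hydroxyl
# oxygen and six hydrogens.
atom_types = ["C_sp3", "C_sp3", "O_hydroxyl"] + ["H"] * 6

logp = atom_based_logp(atom_types, contributions)
print(round(logp, 2))
```

A real scheme needs a contribution for every atom type that can occur, so a missing key here raises a `KeyError` rather than silently returning a wrong value.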
2. Molar refractivity
The molar refractivity (MR) is given by

$$MR = \frac{n^2 - 1}{n^2 + 2} \cdot \frac{MW}{d}$$

where MW is the molecular weight, d is the density and n is the
refractive index.

As molecular weight divided by density equals volume, MR gives
some indication of the steric bulk of a molecule.

The presence of the refractive index term also provides a
connection to the polarisability of the molecule.

MR can also be calculated from atomic values with some
correction factors
(e.g. the CMR program by Leo and Weininger).
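
As a worked example of the definition, the sketch below evaluates the Lorentz-Lorenz form MR = (n² − 1)/(n² + 2) · MW/d, which is assumed here as the expression built from the MW, d and n terms described above. The water values are approximate room-temperature figures used only for illustration.

```python
def molar_refractivity(mw, density, n):
    """Lorentz-Lorenz molar refractivity.

    mw      molecular weight in g/mol
    density density in g/cm^3
    n       refractive index (dimensionless)
    Returns MR in cm^3/mol.
    """
    return (n ** 2 - 1) / (n ** 2 + 2) * mw / density


# Approximate values for liquid water near room temperature.
mr_water = molar_refractivity(mw=18.015, density=0.997, n=1.333)
print(f"MR(water) = {mr_water:.2f} cm^3/mol")
```

The MW/d factor is the molar volume, which is why MR carries units of volume and tracks steric bulk.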
3. Topological indices
Kier and Hall have developed a large number of topological
indices, each of which characterises the molecular structure as
a single number.
Every non-hydrogen atom in the molecule is characterised by
two delta values (the simple delta and the valence delta):

$$\delta_i = \sigma_i - h_i, \qquad \delta_i^v = Z_i^v - h_i$$

where $\sigma_i$ is the number of sigma electrons for atom i,
$h_i$ is the number of H atoms bonded to i, and
$Z_i^v$ is the number of valence electrons for atom i.
Thus the simple delta can differentiate CH3 from CH2.
CH3 has the same simple delta value as NH2 but a different
valence delta.
For elements beyond fluorine in the periodic table the valence delta
expression is modified to

$$\delta_i^v = \frac{Z_i^v - h_i}{Z_i - Z_i^v - 1}$$

where $Z_i$ is the atomic number.
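
The delta values can be checked numerically with a short sketch. It assumes the usual Kier-Hall definitions: simple delta = σ − h, valence delta = Zv − h, and, for elements beyond fluorine, valence delta = (Zv − h)/(Z − Zv − 1). The function and parameter names are illustrative.

```python
def simple_delta(sigma, h):
    """Simple delta: sigma electrons minus attached hydrogens."""
    return sigma - h


def valence_delta(zv, h, z=None):
    """Valence delta: valence electrons minus attached hydrogens.

    If the atomic number z is given and lies beyond fluorine (z > 9),
    the modified expression (zv - h) / (z - zv - 1) is used.
    """
    if z is not None and z > 9:
        return (zv - h) / (z - zv - 1)
    return zv - h


# CH3 and CH2 differ in their simple delta.
ch3 = (simple_delta(4, 3), valence_delta(4, 3))
ch2 = (simple_delta(4, 2), valence_delta(4, 2))
# NH2 shares the simple delta of CH3 but has a different valence delta.
nh2 = (simple_delta(3, 2), valence_delta(5, 2))
# Chlorine (Z = 17, 7 valence electrons, no hydrogens).
cl_valence = valence_delta(7, 0, z=17)

print(ch3, ch2, nh2, round(cl_valence, 2))
```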

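The chi molecular connectivity indices are obtained by summing functions of
these delta values. Thus the chi index of zero order (a sum over the atoms) is

$${}^0\chi = \sum_{\text{atoms}} (\delta_i)^{-1/2}$$

The first order chi index involves a summation over bonds, where i and j
are the two atoms of each bond:

$${}^1\chi = \sum_{\text{bonds}} (\delta_i \delta_j)^{-1/2}$$

The corresponding valence indices are obtained by using the valence deltas
in place of the simple deltas.

Higher order chi indices involve summations over sequences of two, three,
etc. bonds.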
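
The zero and first order chi indices can be sketched for small hydrocarbons. The sketch assumes the standard Kier-Hall forms, a sum of δ^(-1/2) over atoms and a sum of (δi·δj)^(-1/2) over bonds. It relies on the fact that, for carbon in a hydrogen-suppressed graph, the simple delta equals the number of heavy-atom neighbours. The atom numbering is arbitrary.

```python
from math import sqrt


def simple_deltas(n_atoms, bonds):
    """Simple delta of each carbon = its degree in the hydrogen-suppressed graph."""
    deltas = [0] * n_atoms
    for i, j in bonds:
        deltas[i] += 1
        deltas[j] += 1
    return deltas


def chi0(deltas):
    """Zero order chi: sum over atoms of delta^(-1/2)."""
    return sum(1 / sqrt(d) for d in deltas)


def chi1(deltas, bonds):
    """First order chi: sum over bonds of (delta_i * delta_j)^(-1/2)."""
    return sum(1 / sqrt(deltas[i] * deltas[j]) for i, j in bonds)


butane = [(0, 1), (1, 2), (2, 3)]      # linear C4
isobutane = [(0, 1), (0, 2), (0, 3)]   # branched C4, atom 0 is the centre

for name, bonds in (("n-butane", butane), ("isobutane", isobutane)):
    d = simple_deltas(4, bonds)
    print(name, round(chi0(d), 3), round(chi1(d, bonds), 3))
```

The two isomers have the same number of atoms and bonds, yet the indices differ, which is how branching is encoded in a single number.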

Kappa indices are shape indices of various orders, with the first order shape
index involving a count of single-bond fragments, the second order index
involving a count of two-bond paths, and so on.

Considering the first order, the extreme shapes are the linear molecule and
the graph where every atom is connected to every other atom.

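The first order index is given by

$${}^1\kappa = \frac{2\,{}^1P_{\max}\,{}^1P_{\min}}{({}^1P)^2}$$

where ${}^1P$ is the number of bonds in the molecule, and ${}^1P_{\max}$ and
${}^1P_{\min}$ are the maximum and minimum numbers of bonds possible for the
same number of atoms. For A atoms, ${}^1P_{\min} = A - 1$ (the linear
molecule) and ${}^1P_{\max} = A(A-1)/2$ (the fully connected graph), so
${}^1\kappa = A(A-1)^2 / ({}^1P)^2$.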

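The second order shape index is given by

$${}^2\kappa = \frac{2\,{}^2P_{\max}\,{}^2P_{\min}}{({}^2P)^2} = \frac{(A-1)(A-2)^2}{({}^2P)^2}$$

where ${}^2P$ is the number of two-bond paths and A is the number of atoms.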
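
The first and second order shape indices can be sketched from simple path counts. The sketch assumes the closed forms ¹κ = A(A−1)²/(¹P)² and ²κ = (A−1)(A−2)²/(²P)², with A the number of non-hydrogen atoms, ¹P the number of bonds and ²P the number of two-bond paths. The graphs are hydrogen-suppressed and need at least three atoms.

```python
def two_bond_paths(n_atoms, bonds):
    """Count paths of two bonds: each pair of bonds sharing an atom is one path."""
    degree = [0] * n_atoms
    for i, j in bonds:
        degree[i] += 1
        degree[j] += 1
    return sum(d * (d - 1) // 2 for d in degree)


def kappa1(n_atoms, bonds):
    """First order shape index from the bond count."""
    p1 = len(bonds)
    return n_atoms * (n_atoms - 1) ** 2 / p1 ** 2


def kappa2(n_atoms, bonds):
    """Second order shape index from the two-bond path count."""
    p2 = two_bond_paths(n_atoms, bonds)
    return (n_atoms - 1) * (n_atoms - 2) ** 2 / p2 ** 2


butane = [(0, 1), (1, 2), (2, 3)]      # linear C4
isobutane = [(0, 1), (0, 2), (0, 3)]   # star-shaped C4

for name, bonds in (("n-butane", butane), ("isobutane", isobutane)):
    print(name, kappa1(4, bonds), round(kappa2(4, bonds), 3))
```

Both acyclic isomers have the same first order index because they have the same number of bonds. The second order index separates the linear shape from the star shape.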

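The kappa-alpha indices additionally take the identity of the atoms into
account.

The alpha value of an atom is a measure of its size relative to a standard,
the sp3-hybridised carbon atom:

$$\alpha_x = \frac{r_x}{r_{C_{sp^3}}} - 1$$

where $r_x$ is the covalent radius of atom x.

The alpha value for a molecule is calculated by summing the individual atomic
alphas, and is then incorporated into the shape indices:

$${}^1\kappa_\alpha = \frac{(A+\alpha)(A+\alpha-1)^2}{({}^1P+\alpha)^2}, \qquad {}^2\kappa_\alpha = \frac{(A+\alpha-1)(A+\alpha-2)^2}{({}^2P+\alpha)^2}$$

where A is the number of non-hydrogen atoms.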

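The electrotopological state index depends upon the intrinsic state of an
atom, which for atom i is given by

$$I_i = \frac{(2/N)^2\,\delta_i^v + 1}{\delta_i}$$

where N is the principal quantum number of atom i.

The intrinsic state is then perturbed by every other atom j in the molecule:

$$\Delta I_i = \sum_{j \neq i} \frac{I_i - I_j}{r_{ij}^2}$$

where $r_{ij}$ is the number of atoms on the shortest path between i and j,
counting both i and j.

The sum of $I_i$ and $\Delta I_i$ gives the state of each atom.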

These atomic states can be combined into a
whole-molecule descriptor by calculating the
mean-square value over all the atoms.

Only a finite number of I values are possible, so a bitstring
representation can be obtained by setting the
appropriate bit for each I value present.
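
The atom states can be sketched for a small molecule. The code assumes the Kier-Hall electrotopological state scheme for first-row atoms: intrinsic state I = (δv + 1)/δ, perturbation ΔI_i = Σ (I_i − I_j)/r_ij² with r_ij counting the atoms on the shortest path, and state S_i = I_i + ΔI_i. Ethanol is used as a hypothetical test case, and the graph is assumed to be connected.

```python
from collections import deque


def bond_distances(n_atoms, bonds, start):
    """Breadth-first search: number of bonds from `start` to every atom."""
    neighbours = [[] for _ in range(n_atoms)]
    for i, j in bonds:
        neighbours[i].append(j)
        neighbours[j].append(i)
    dist = {start: 0}
    queue = deque([start])
    while queue:
        a = queue.popleft()
        for b in neighbours[a]:
            if b not in dist:
                dist[b] = dist[a] + 1
                queue.append(b)
    return dist


def atom_states(delta, delta_v, bonds):
    """Return (intrinsic states, perturbed states) for first-row atoms."""
    n = len(delta)
    intrinsic = [(dv + 1) / d for d, dv in zip(delta, delta_v)]
    states = []
    for i in range(n):
        dist = bond_distances(n, bonds, i)
        # r_ij counts atoms on the path, i.e. the bond count plus one.
        perturbation = sum(
            (intrinsic[i] - intrinsic[j]) / (dist[j] + 1) ** 2
            for j in range(n) if j != i
        )
        states.append(intrinsic[i] + perturbation)
    return intrinsic, states


# Ethanol heavy atoms: 0 = CH3, 1 = CH2, 2 = OH.
intrinsic, states = atom_states(
    delta=[1, 2, 1], delta_v=[1, 2, 5], bonds=[(0, 1), (1, 2)]
)
mean_square = sum(s * s for s in states) / len(states)

print([round(s, 3) for s in states], round(mean_square, 3))
```

The perturbations cancel over the whole molecule, so the states sum to the same total as the intrinsic values. The electronegative oxygen gains state at the expense of its neighbours.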
4. Pharmacophore Key
During conformational analysis, the pharmacophoric features within each
conformation are identified.

All possible combinations of three features within each acceptable
conformation are identified, together with the distances between them.

Each distance is assigned to a distance bin to get a pharmacophore bit string.

Thus the key codes all possible 3-point pharmacophores that the molecule
could express.

These can be used like other binary descriptors.

4-point pharmacophores can also be used, but the number of bits is so high
that it is not practical to store them as simple bit strings.

Example features: H-bond donor, aromatic ring centroid and basic nitrogen.
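
The construction of a 3-point pharmacophore key can be sketched as follows. This is a simplified illustration. The distance bin edges, feature labels and coordinates are hypothetical, and the key is stored as a set of canonical triplet codes, a sparse stand-in for the positions that would be set in a real bit string.

```python
from itertools import combinations, permutations
from math import dist

# Hypothetical bin edges in angstroms: bins are <3, 3-6, 6-9 and >=9.
BIN_EDGES = (3.0, 6.0, 9.0)


def distance_bin(d):
    """Index of the distance bin that d falls into."""
    return sum(d >= edge for edge in BIN_EDGES)


def triplet_code(triple):
    """Canonical code for three features: the smallest (types, bins) tuple
    over all orderings, so the same triangle always gives the same code."""
    return min(
        (ta, tb, tc,
         distance_bin(dist(pa, pb)),
         distance_bin(dist(pa, pc)),
         distance_bin(dist(pb, pc)))
        for (ta, pa), (tb, pb), (tc, pc) in permutations(triple)
    )


def three_point_key(conformations):
    """Union of triplet codes over all conformations of one molecule."""
    key = set()
    for features in conformations:
        for triple in combinations(features, 3):
            key.add(triplet_code(triple))
    return key


# Two hypothetical conformations; only the basic nitrogen moves.
conf_1 = [("donor", (0, 0, 0)), ("aromatic", (4, 0, 0)), ("basic", (0, 5, 0))]
conf_2 = [("donor", (0, 0, 0)), ("aromatic", (4, 0, 0)), ("basic", (0, 7, 0))]

key = three_point_key([conf_1, conf_2])
print(sorted(key))
```

Because the key is a union over conformations, it records every 3-point pharmacophore the molecule could express, and two keys can be compared like any other pair of binary descriptors.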
