Bioinformatics Techniques For Drug Discovery
Applications for
Complex Diseases
SpringerBriefs in Computer Science
Series editors
Stan Zdonik, Brown University, Providence, Rhode Island, USA
Shashi Shekhar, University of Minnesota, Minneapolis, Minnesota, USA
Xindong Wu, University of Vermont, Burlington, Vermont, USA
Lakhmi C. Jain, University of South Australia, Adelaide, South Australia, Australia
David Padua, University of Illinois Urbana-Champaign, Urbana, Illinois, USA
Xuemin Sherman Shen, University of Waterloo, Waterloo, Ontario, Canada
Borko Furht, Florida Atlantic University, Boca Raton, Florida, USA
V. S. Subrahmanian, University of Maryland, College Park, Maryland, USA
Martial Hebert, Carnegie Mellon University, Pittsburgh, Pennsylvania, USA
Katsushi Ikeuchi, University of Tokyo, Tokyo, Japan
Bruno Siciliano, Università di Napoli Federico II, Napoli, Italy
Sushil Jajodia, George Mason University, Fairfax, Virginia, USA
Newton Lee, Newton Lee Laboratories, LLC, Burbank, California, USA
SpringerBriefs present concise summaries of cutting-edge research and practical
applications across a wide spectrum of fields. Featuring compact volumes of 50 to
125 pages, the series covers a range of content from professional to academic.
Typical topics might include:
• A timely report of state-of-the-art analytical techniques
• A bridge between new research results, as published in journal articles, and a
contextual literature review
• A snapshot of a hot or emerging topic
• An in-depth case study or clinical example
• A presentation of core concepts that students must understand in order to make
independent contributions
Briefs allow authors to present their ideas and readers to absorb them with
minimal time investment. Briefs will be published as part of Springer’s eBook
collection, with millions of users worldwide. In addition, Briefs will be available for
individual print and electronic purchase. Briefs are characterized by fast, global
electronic dissemination, standard publishing contracts, easy-to-use manuscript
preparation and formatting guidelines, and expedited production schedules. We aim
for publication 8–12 weeks after acceptance. Both solicited and unsolicited
manuscripts are considered for publication in this series.
Shakti Sahi
Bioinformatics Techniques
for Drug Discovery
Applications for Complex Diseases
Aman Chandra Kaushik
School of Life Sciences and Biotechnology
Shanghai Jiao Tong University
Shanghai, China

Ravi Chaudhary
School of Biotechnology
Gautam Buddha University
Greater Noida, Uttar Pradesh, India

Ajay Kumar
School of Engineering
Gautam Buddha University
Greater Noida, Uttar Pradesh, India

Shakti Sahi
School of Biotechnology
Gautam Buddha University
Greater Noida, Uttar Pradesh, India

Shiv Bharadwaj
Nanotechnology Research and Application Center
Sabanci University
Tuzla, Istanbul, Turkey

This Springer imprint is published by the registered company Springer International Publishing AG
part of Springer Nature
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Preface
will be impressed by the fact that the fundamental strategy in drug discovery is the
inhibition of a target by blocking its active sites, whatever the complex disease.
This is to be expected, since the evolutionary diversification and complexity found
across different diseases are much greater than those of the metabolic activities or
biochemical pathways of the agents or molecules involved.
Chapter 2 gives insight into the ligand-based approach to drug design using
computational techniques. Chapter 3 describes the structure-based approach to drug
design using computational techniques, and Chap. 4 integrates the information on
three-dimensional (3D) pharmacophore modelling-based drug design and other
properties. Chapter 5 explains the molecular dynamics simulation approach, which
investigates the dynamic behaviour of a system through the application of Newtonian
mechanics. Chapter 6 explains the thermodynamics of ligand–receptor or
ligand–enzyme association, and Chap. 7 discusses thermodynamic cycles and their
application to protein targets. Finally, Chap. 8 provides insights into different
computational approaches for understanding genomics and proteomics that help to
predict the target of interest.
Contents
1 Brief Introduction
1.1 Brief Evolutionary History of In Silico Approaches
1.2 Computational Drug Discovery and Design
1.3 Epigenetics: Beyond the Sequence
1.4 Histones Modification
References
2 Ligand-Based Approach for In-silico Drug Designing
2.1 Introduction
2.2 Molecular Descriptors
2.2.1 2D QSAR Descriptors
2.2.2 3D QSAR Descriptors
2.2.3 Multidimensional QSAR
2.3 Constitutional Descriptors
2.4 Quantitative Structure–Activity Relationships
2.5 Molecular Fingerprint and Similarity Searches
2.6 Similarity Searches in LB-CADD
2.7 Similarity Networks and Off-Target Predictions
2.8 Fingerprint Extensions
2.9 Computational Methods for Biomolecular Docking
References
3 Structure-Based Approach for In-silico Drug Designing
3.1 Introduction
3.2 Protein Docking
3.2.1 Protein–Protein Docking
3.2.2 Protein–Ligand Docking
References
provides a way to find shortcuts and to manage the guidelines towards drug design
and its commercialization [1]. In medical science, drug development is a
comprehensive study of the different types of interactions between chemical
compounds, i.e. medicinal agents, also known as ligands, and their respective
macromolecular targets. The search for drug-like compounds that bind specifically
and selectively to the target, i.e. to active sites in the biomolecule of interest,
and then interfere with its receptor function or enzymatic activity demands multi-
and interdisciplinary approaches. Here, computer-aided modelling tools play an
important role in predicting and understanding the relevant ligand–receptor or
ligand–enzyme interactions [2].
Fig. 1.1 Contributions of three main factors for ligand binding to total binding free energy
Fig. 1.2 Two basic types of interaction between a drug and a biological system, termed
PD events (activity and toxicity) and PK events (ADME) (modified from [19] and reproduced
with the kind permission of the Verlag Helvetica Chimica Acta in Zurich). ADME, absorption,
distribution, metabolism and excretion; PD, pharmacodynamic; PK, pharmacokinetic
mutualism [20]. Absorption, distribution and elimination clearly have a decisive
impact on the intensity and level of PD effects, while biotransformation gives rise
to distinct PK. More precisely, it may be useful to define targets as the biological
elements that generate PD events following their interaction with a drug molecule or
any other xenobiotic compound. Such elements include receptors, ion channels,
nucleic acids, and anabolic and catabolic enzymes. Likewise, one can identify
biological components, including xenobiotic-metabolizing enzymes, transporters,
circulating proteins and membranes, that act on drugs by metabolizing, transporting,
distributing or excreting them out of the biological system.
Designing and developing new medicines is a long, multifaceted, expensive and
high-risk process that has few peers in the commercial world. Computer-aided drug
design (CADD) approaches are therefore widely employed in the pharmaceutical
industry to speed up the drug development process [21]. Typically, it takes 10–15
years and approximately US$500–800 million to synthesize and test a lead drug and
bring it to market [22]. In this regard, it is advantageous to use computational
tools in hit-to-lead optimization to cover a large library of chemicals whilst
decreasing the number of compounds that must be designed and evaluated in in vitro
studies. The refinement of potential screened ligands by computational tools
involves structure-based analysis of docking energy profiles for the screened
analogs, ligand-based evaluation of screened compounds with analogous chemical
structures, enhanced predicted biological activity, calculation of favourable
affinity, and improved drug metabolism and pharmacokinetics
the overall genomic stability. Beyond the examples outlined above, CpG methylation
in mammals has been investigated in most genes, particularly in the context of
cancer, where aberrant methylation is linked to inappropriate activation or
repression of cell proliferation-related genes. Typically, promoter regions can be
divided into two categories based on the presence or absence of CpG islands. Genes
whose promoters contain CpG islands, which are most commonly in an unmethylated
state, are generally repressed by means other than DNA methylation, such as the
binding of polycomb proteins. However, methylation of CpG island promoters is seen
where a long-term repressed state is required, such as in female X chromosome
inactivation and imprinted genes. Interestingly, genes whose promoter regions do not
contain CpG islands show much more variability in their DNA methylation [32].
CpG sites within gene bodies are also subject to variable DNA methylation. In
contrast to promoter methylation, this DNA methylation is typically positively
correlated with expression of a gene when it is present within the gene body rather
than near the transcription initiation site. The current hypothesis is that gene
body methylation hinders spurious transcription initiation sites within the gene,
which allows the transcription machinery to bind and initiate transcription more
effectively at true start sites [33].
Enhancers are sites more distal (up to several hundred kb) from genes that also
participate in the process of transcriptional regulation. The functions and effects of
enhancer DNA methylation are less well researched than those for promoters. But
recent efforts have found active enhancers to be neither completely unmethylated
nor methylated, but to exist in states termed as ‘low-methylation’ regions [34].
Research in past decades focused on DNA methylation, its patterns and effects
at canonical genes, or in the context of diseases such as cancer. High-throughput
methods for measuring DNA methylation at a wide range of CpG loci in the genome
have been used to quantify the distribution of DNA methylation and its variation in
populations of healthy individuals, as well as its relationship to genetic
variation, gene expression and other epigenetic traits. Recent work investigating
these relationships in a set of primary untransformed human fibroblasts documented
both negative and positive correlations between DNA methylation and gene expression
that depend less on position with respect to the gene body or promoter and more on
the histone marks in the selected gene region.
In eukaryotic cells, the genetic material, i.e. DNA, is packaged into nucleosomes,
which tends to reduce the access of the transcription machinery to DNA.
Modifications of the histones, i.e. the constituent proteins of nucleosomes, can
further restrict or relieve this access. Various amino acid residues within the
histones are subject to modifications, including methylation, ubiquitination,
acetylation and
References
1. S. Ekins, J. Mestres, B. Testa, In silico pharmacology for drug discovery: methods for virtual
ligand screening and profiling. Br. J. Pharmacol. 152, 9–20 (2007)
2. S. Ekins, J. Mestres, B. Testa, In silico pharmacology for drug discovery: applications to targets
and beyond. Br. J. Pharmacol. 152, 21–37 (2007)
3. A. Albert, Relations between molecular structure and biological activity: stages in the evolution
of current concepts. Ann. Rev. Pharmacol. 11, 13–36 (1971)
4. A. Albert, Selective Toxicity. The Physico-chemical Basis of Therapy (Chapman and Hall,
London, 1985)
5. H. Meyer, Zur Theorie der Alkoholnarkose. Arch. Exp. Pathol. Pharmakol. 42, 110–118 (1899)
6. E. Overton, Studien über die Narkose (Gustav Fischer, Jena, 1901)
7. C. Hansch, T. Fujita, ρ-σ-π analysis. A method for the correlation of biological activity and
chemical structure. J. Am. Chem. Soc. 86, 1616–1626 (1964)
8. C. Hansch, Quantitative relationships between lipophilic character and drug metabolism. Drug
Metab. Rev. 1, 1–13 (1972)
9. A. Cushny, Biological Relations of Optical Isomeric Substances (Williams and Wilkins,
Baltimore, 1926)
10. A. Burgen, Conformational changes and drug action. Fed. Proc., 2723–2728 (1981)
11. E.J. Ariëns, Receptors: from fiction to fact. Trends Pharmacol. Sci. 1, 11–15 (1979)
12. J. Parascandola, Origins of the receptor theory. Trends Pharmacol. Sci. 1, 189–192 (1979)
13. X. Du, Y. Li, Y.-L. Xia, S.-M. Ai, J. Liang, P. Sang, X.-L. Ji, S.-Q. Liu, Insights into protein–li-
gand interactions: mechanisms, models, and methods. Int. J. Mol. Sci. 17, 144 (2016)
14. P. Csermely, R. Palotai, R. Nussinov, Induced fit, conformational selection and independent
dynamic segments: an extended view of binding events. Trends Biochem. Sci. 35, 539–546
(2010)
15. S. Ekins, P.W. Swaan, Development of computational models for enzymes, transporters, chan-
nels, and receptors relevant to ADME/Tox. Rev. Comput. Chem. 20, 333 (2004)
16. P.A. Whittaker, What is the relevance of bioinformatics to pharmacology? Trends Pharmacol.
Sci. 24, 434–439 (2003)
2.1 Introduction
used for in silico screening of novel ligands with the desired biological activity,
and for hit-to-lead and lead-to-drug optimization. This approach can also be
employed in optimization to improve drug metabolism and pharmacokinetics (DMPK) or
absorption, distribution, metabolism, excretion and potential toxicity (ADMET)
properties.
The broad family of descriptors used in this approach, i.e. 2D-QSAR, shares the
common property of being independent of the 3D orientation of the compound. These
descriptors range from simple measures of the entities constituting the molecule,
through its topological and geometrical properties, to computed electrostatic and
quantum-chemical descriptors or more advanced methods such as fragment counting [7].
A general workflow for a QSAR-based drug discovery project involves collecting a
group of active and inactive ligands and then designing a set of mathematical
descriptors that describe the physicochemical and structural properties of the
selected ligands or compounds. A model is then generated to identify the
relationship between those descriptors and the respective experimental activity,
maximizing its predictive power. Finally, the model is employed to predict the
activity of a library of test compounds that were encoded with the same descriptors.
Hence, the success of a QSAR model relies not only on the quality of the initial set
of active/inactive ligands but also on the selected descriptors and on the ability
to establish an appropriate mathematical relationship. However, one of the most
relevant facts regarding this method is that every model depends directly on the
sampling space and chemical diversity of the initial set of ligands or compounds
with known activity. In brief, divergent scaffolds or functional groups that are not
included in this ‘training’ group of compounds will not be represented in the final
model, and any potential hits within the screened library that contain these groups
will likely be missed. Hence, it is recommended to cover a wide chemical space
within the training set. In fact, the REACH legislation of the European Union has
encouraged experts and regulators to concentrate on developing specific validation
principles for QSAR models in the framework of chemical legislation, formerly known
as the Setubal principles and nowadays called the OECD principles.
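The workflow described above can be illustrated with a deliberately small sketch. It assumes a single numerical descriptor per compound and a straight-line model fitted by least squares; the descriptor values, activities and function names are invented for illustration, and a real QSAR study would use many descriptors, a more capable model and proper validation. The range check is only a crude stand-in for the point that compounds outside the chemical space of the training set cannot be predicted reliably.

```python
# Toy one-descriptor QSAR: activity = slope * descriptor + intercept.
# All numbers below are hypothetical and chosen only to make the steps visible.

def fit_line(xs, ys):
    """Ordinary least-squares fit of ys against a single descriptor xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def in_domain(x, xs):
    """Crude applicability check: is x inside the descriptor range seen in training?"""
    return min(xs) <= x <= max(xs)


# Step 1: training set of ligands with known activity (hypothetical values).
train_descriptor = [1.0, 2.0, 3.0, 4.0]
train_activity = [2.1, 3.9, 6.1, 7.9]

# Step 2: generate the model relating descriptor to activity.
slope, intercept = fit_line(train_descriptor, train_activity)

# Step 3: predict a library encoded with the same descriptor.
library = [2.5, 6.0]
predictions = [
    (x, slope * x + intercept, in_domain(x, train_descriptor)) for x in library
]
for x, predicted, trusted in predictions:
    print(f"descriptor={x:.1f} predicted={predicted:.2f} in_domain={trusted}")
```

In this sketch the second library compound lies outside the training range, so its prediction is flagged rather than trusted, which mirrors the recommendation to select a wide chemical space for the training set.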
Fingerprint methods may be used to search databases for compounds that are similar
in structure to a query, providing an extended selection of compounds that can be
tested for improved activity over the lead. For example, 2D similarity searches of
databases can use the chemotype information from first-generation hits, and 2D
fingerprint and 3D shape similarity searches have been used to identify novel
agonists. Oestrogen, for example, is an essential hormone that is responsible for
many aspects of developmental physiology [22]. Cytohesins are small guanine
nucleotide exchange factors that activate Ras-like GTPases, which control various
regulatory networks implicated in a range of diseases [23].
Recently, chemical similarity measures such as Tanimoto coefficients have been used
to generate networks capable of clustering drugs that bind to multiple targets and
of predicting novel off-target effects. Keiser et al. [24] used the similarity
ensemble approach (SEA) to compare drug targets based on the similarity of their
ligands. SEA predicts whether a ligand and a target will interact using a
statistical model of the chemical similarity expected by chance. The sets of ligands
that interact with each target are compared by calculating Tanimoto coefficients,
based on standard 2D Daylight fingerprints, for every pair of molecules between two
sets [25]. Raw similarity scores between all pairs of ligand sets are calculated as
the sum of all Tanimoto coefficients between the sets that are higher than 0.57.
Since the probability of obtaining Tanimoto coefficients higher than 0.57 increases
with set size, this raw score is normalized by the similarity expected by chance.
The model for random chemical similarity is built by randomly generating 300,000
pairs of molecule sets spanning logarithmic size intervals of 10–1000 molecules.
Expectation scores are calculated from the raw scores and the probability of
obtaining them by chance, and are used to link the ligand sets sequentially on a
clustered map [25].
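The raw score described above is simple enough to sketch directly. The example below assumes that each fingerprint is represented as a set of "on" bit positions, which is a simplification of hashed 2D fingerprints; the bit sets and function names are hypothetical, and the normalization against the random-similarity model is not reproduced here.

```python
# Sketch of a SEA-style raw score between two ligand sets.
# Fingerprints are modelled as frozensets of "on" bit indices (an assumption).

def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient: shared bits divided by bits set in either fingerprint."""
    union = len(fp_a | fp_b)
    if union == 0:
        return 0.0
    return len(fp_a & fp_b) / union


def raw_score(ligands_a, ligands_b, threshold=0.57):
    """Sum of all pairwise Tanimoto coefficients that exceed the threshold."""
    total = 0.0
    for fp_a in ligands_a:
        for fp_b in ligands_b:
            tc = tanimoto(fp_a, fp_b)
            if tc > threshold:
                total += tc
    return total


# Hypothetical ligand sets for two targets.
target_a = [frozenset({1, 2, 3, 4}), frozenset({1, 2, 3, 5})]
target_b = [frozenset({1, 2, 3, 4, 6}), frozenset({7, 8, 9})]

score = raw_score(target_a, target_b)
print(f"raw score = {score:.2f}")
```

Only one of the four pairs in this toy example exceeds the 0.57 cutoff, so the raw score equals that single coefficient. Because larger sets produce more pairs above the cutoff by chance alone, the raw score on its own is not comparable between targets, which is why the text describes a normalization step.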
ically practical matching of functional groups. This produces a topological map of
the most similar pairs of structurally diverse molecules or fragments among the
active molecules. Conserved features of high similarity are scored according to the
matching nodes, with little dependence on chemical substructures. The MTree model is
therefore a distinctive concept that is employed to recognize alternative novel
molecular scaffolds, or chemotypes [27].
With the rapidly increasing quantity of available molecular data, the computer-based
evaluation of molecular interactions becomes increasingly feasible. Techniques for
computer-aided molecular docking must incorporate a reasonably accurate model of
energy and must be able to cope with the combinatorial complexity incurred by the
molecular flexibility of the docking partners. In both respects, significant
progress has been made in the last few years. Interactions between biomolecules are
the foundation of all biological processes. Through these interactions, living
organisms maintain complex regulatory and metabolic interaction networks that
together constitute the processes of life. Experimental analysis and computer
simulation are the primary scientific tools for finding molecules that can be used
as bioactive substances to change and control the processes of life. The analysis of
specific molecular interactions sits roughly midway in the chain of analysis towards
understanding life’s processes. On the one hand, analysing molecular interactions
requires at least some, and ideally extensive, knowledge of the three-dimensional
structures involved. On the other hand, understanding specific molecular
interactions is the prerequisite for developing a global model of the biological
processes inside an organism. Knowledge of biomolecular structures is currently
growing at an explosive rate, and computer models for biomolecular docking can
consequently be calibrated against rapidly growing data sets. In addition, new
algorithms are being designed that tackle both the considerable combinatorial
complexity of the conformational spaces in docking problems and the modelling of
energy. Docking methods can aim at a very precise and detailed analysis of a single
instance of docking two molecular structures [28].
References
1. M.A. Johnson, G.M. Maggiora, Concepts and Applications of Molecular Similarity (Wiley,
USA, 1990)
2. J. Mestres, L. Martín-Couce, E. Gregori-Puigjané, M. Cases, S. Boyer, Ligand-based approach
to in silico pharmacology: nuclear receptor profiling. J. Chem. Inf. Model. 46, 2725–2736
(2006)
3. R.D. Cramer, D.E. Patterson, J.D. Bunce, Comparative molecular field analysis (CoMFA). 1.
Effect of shape on binding of steroids to carrier proteins. J. Am. Chem. Soc. 110, 5959–5967
(1988)
4. C. Acharya, A. Coop, J.E. Polli, A.D. MacKerell, Recent advances in ligand-based drug design:
relevance and utility of the conformationally sampled pharmacophore approach. Curr. Comput.
Aided Drug Des. 7, 10–22 (2011)
5. Y. Marrero-Ponce, O.M. Santiago, Y.M. López, S.J. Barigye, F. Torrens, Derivatives in discrete
mathematics: a novel graph-theoretical invariant for generating new 2/3D molecular descrip-
tors. I. Theory and QSPR application. J. Comput. Aided Mol. Des. 26, 1229–1246 (2012)
6. R. Todeschini, V. Consonni, Handbook of Molecular Descriptors (Wiley, USA, 2008)
7. Q. Du, P.G. Mezey, K.C. Chou, Heuristic molecular lipophilicity potential (HMLP): a 2D-
QSAR study to LADH of molecular family pyrazole and derivatives. J. Comput. Chem. 26,
461–470 (2005)
8. H. Kubinyi, 3D QSAR in Drug Design: Volume 1: Theory Methods and Applications (Springer
Science & Business Media, Germany, 1993)
9. V. Consonni, R. Todeschini, M. Pavan, P. Gramatica, Structure/response correlations and sim-
ilarity/diversity analysis by GETAWAY descriptors. 2. Application of the novel 3D molecular
descriptors to QSAR/QSPR studies. J. Chem. Inf. Comput. Sci. 42, 693–705 (2002)
10. A. Vedani, M. Dobler, Multidimensional QSAR: moving from three-to five-dimensional con-
cepts. Mol. Inform. 21, 382–390 (2002)
11. A. Hopfinger, S. Wang, J.S. Tokarski, B. Jin, M. Albuquerque, P.J. Madhav, C. Duraiswami,
Construction of 3D-QSAR models using the 4D-QSAR analysis formalism. J. Am. Chem. Soc.
119, 10509–10524 (1997)
12. A. Vedani, M. Dobler, 5D-QSAR: the key for simulating induced fit? J. Med. Chem. 45,
2139–2149 (2002)
13. S. Gosav, M. Praisler, D. Dorohoi, G. Popa, Structure–activity correlations for illicit
amphetamines using ANN and constitutional descriptors. Talanta 70, 922–928 (2006)
14. Y. Zhang, I-TASSER server for protein 3D structure prediction. BMC Bioinform. 9, 40 (2008)
15. C. Hansch, T. Fujita, ρ-σ-π analysis. A method for the correlation of biological activity and
chemical structure. J. Am. Chem. Soc. 86, 1616–1626 (1964)
16. S.M. Free, J.W. Wilson, A mathematical contribution to structure-activity studies. J. Med.
Chem. 7, 395–399 (1964)
17. J. Polanski, A. Bak, R. Gieleciak, T. Magdziarz, Modeling robust QSAR. J. Chem. Inf. Model.
46, 2310–2318 (2006)
18. C. Hansch, D. Hoekman, A. Leo, D. Weininger, C.D. Selassie, Chem-bioinformatics: com-
parative QSAR at the interface between chemistry and biology. Chem. Rev. 102, 783–812
(2002)
19. A. Kurup, C-QSAR: a database of 18,000 QSARs and associated biological and physical data.
J. Comput. Aided Mol. Des. 17, 187–196 (2003)
20. J. Auer, J. Bajorath, Molecular similarity concepts and search calculations, in Bioinformatics:
Structure, Function and Applications (2008), pp. 327–347
21. P. Willett, Similarity-based virtual screening using 2D fingerprints. Drug Discov. Today 11,
1046–1053 (2006)
22. G.R. Sliwoski, 3D Enantioselective Descriptors for Ligand-based Computer-aided Drug
Design (Vanderbilt University, USA, 2012)
23. D. Stumpfe, A. Bill, N. Novak, G. Loch, H. Blockus, H. Geppert, T. Becker, A. Schmitz,
M. Hoch, W. Kolanus, Targeting multifunctional proteins by virtual screening: structurally
diverse cytohesin inhibitors with differentiated biological functions. ACS Chem. Biol. 5,
839–849 (2010)
24. M.J. Keiser, V. Setola, J.J. Irwin, C. Laggner, A.I. Abbas, S.J. Hufeisen, N.H. Jensen, M.B.
Kuijer, R.C. Matos, T.B. Tran, R. Whaley, R.A. Glennon, J. Hert, K.L.H. Thomas, D.D.
Edwards, B.K. Shoichet, B.L. Roth, Predicting new molecular targets for known drugs. Nature
462(7270), 175–181 (2009)
25. M.J. Keiser, B.L. Roth, B.N. Armbruster, P. Ernsberger, J.J. Irwin, B.K. Shoichet, Relating
protein pharmacology by ligand chemistry. Nat. Biotechnol. 25, 197–206 (2007)
26. G. Hessler, M. Zimmermann, H. Matter, A. Evers, T. Naumann, T. Lengauer, M. Rarey,
Multiple-ligand-based virtual screening: methods and applications of the MTree
approach. J. Med. Chem. 48, 6575–6584 (2005)
27. A. Evers, G. Hessler, H. Matter, T. Klabunde, Virtual screening of biogenic amine-binding
G-protein coupled receptors: comparative evaluation of protein-and ligand-based virtual
screening protocols. J. Med. Chem. 48, 5448–5465 (2005)
28. T. Lengauer, M. Rarey, Computational methods for biomolecular docking. Curr. Opin. Struct.
Biol. 6, 402–406 (1996)
Chapter 3
Structure-Based Approach for
In-silico Drug Designing
Abstract Structure-based drug design is a rapidly growing field in which many
successes have been achieved in recent years. Structure-based computer-aided drug
design (SB-CADD) depends on the ability to determine and analyse the 3D structures
of the target of interest. In other words, the premise of the SB-CADD approach is
that a molecule’s ability to interact with a specific target, which can be a
chemical species or a biomolecule such as a protein, and to exert a desired
biological activity depends on its ability to interact favourably at a binding site
on the selected target. It follows that molecules sharing those favourable
interactions will show similar biological effects. Therefore, novel ligands can be
predicted by careful analysis of a protein’s binding site. The structure-based
approach to drug design also allows rapid selection of potential ligands from large
and diverse compound libraries, which can later be validated through
modelling/simulation and visualization techniques.
3.1 Introduction
For a long time, rational drug design based on protein structure remained an
unrealized goal for structural biologists. The first projects got underway during
the mid-1980s, and by the early 1990s the first success stories had been published
[1, 2]. At present, although a good deal of fine-tuning is still necessary to
predict and optimize the process, structure-based drug design is an essential part
of most industrial drug discovery programs [4] and a key research topic in many
academic laboratories [3].
Recent developments in information technology have been applied to the large amount
of data generated in order to identify novel drug molecules and improve upon
existing drugs [3]. Recently, advances in high-throughput crystallography, such as
automation at all stages, more intense synchrotron radiation and new developments
in phase determination, have shortened the time needed to determine structures.
Structure determination using nuclear magnetic resonance (NMR) has likewise advanced
in the last few years, with magnet and probe improvements, automated assignment
[4, 5] and new experimental methods for elucidating larger structures [6].
Structure-based drug design is most powerful when it forms part of an entire drug
discovery process. It is also important to bear in mind that structure-based drug
design guides the discovery of a drug lead, which is not a drug product but, more
precisely, a compound with at least micromolar affinity for the selected target [7].
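To put micromolar affinity in energetic terms, the standard thermodynamic relation ΔG° = RT ln(Kd/c°) can be evaluated directly. The sketch below is illustrative only: the function name, the standard concentration of 1 M and the temperature of 298.15 K are assumptions made here, not values taken from the text.

```python
import math

GAS_CONSTANT_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def binding_free_energy(kd_molar, temperature_k=298.15):
    """Standard binding free energy in kcal/mol for a dissociation constant in mol/L.

    Uses dG = R * T * ln(Kd / c0) with the standard concentration c0 = 1 M (assumed).
    """
    return GAS_CONSTANT_KCAL * temperature_k * math.log(kd_molar / 1.0)


micromolar = binding_free_energy(1e-6)   # a typical lead-like affinity
nanomolar = binding_free_energy(1e-9)    # a typical optimized-drug affinity
print(f"1 uM -> {micromolar:.2f} kcal/mol, 1 nM -> {nanomolar:.2f} kcal/mol")
```

Under these assumptions a dissociation constant of 1 µM corresponds to roughly −8.2 kcal/mol, and each further tenfold improvement in affinity makes the binding free energy about 1.4 kcal/mol more favourable, which gives a sense of how much optimization separates a lead from a drug candidate.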
Several protein–protein docking techniques are derived based on the ‘rigid body’ pre-
sumption. With the best abstraction, this extremely simplistic design considered the
two proteins as two rigid solid bodies. Geometric surface model and data structures
are utilized to find the reasonable binding modes and heuristic cost functions. For
the intended purpose of rapidly locating the contacting surfaces on the two proteins
within the rigid body method, accordingly simple and contented information on the
surface structures is highly required. A few research reports have centered on this
problem. Lin et al. [8] and Norel et al. [9] had actually supplied a simple worldwide
surface information by various techniques in the form of grid-based representation for
the necessary protein area [10, 11]. Walls and Sternberg [12] explained the necessary
protein area within a two-dimensional grid of geometric functional values produced
by forecasts of area on the airplanes. Also, Helmer-Citterich and Tramontano [13]
used the projection on a cylindrical area. A unique and interesting concept is to utilize
the spherical harmonics for explaining necessary protein areas at various quantities
of reliability [14]. To exactly do the match amongst two-point units representing the
areas of two docking partners, unique algorithms are required. Shoichet et al. [15]
used the DOCK that was well-known algorithm for this function. Another paradigm,
this is certainly and specifically helpful is the geometric that has been calculated
over from the field of computer system vision [16, 17]. Another method is to utilize
the fast Fourier transform for the efficient computation of optimal translations,
coupled with rotational sampling [10, 18, 19]. Duncan and Olson [14] use an
evolutionary algorithm to improve the geometric fit between the two proteins. Docking
can also be performed with global optimization techniques on suitably defined
conformational spaces or with molecular dynamics methods. Totrov and Abagyan [20]
present such a technique, which is able to reproduce the complex between lysozyme and
an antibody from the coordinates of the uncomplexed molecules. However, these
optimization techniques depend on the most complex energy functions reported in the
literature and incorporate Monte Carlo methods.
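As a rough illustration of the FFT-based translational scan, the following sketch works on a one-dimensional toy grid rather than a 3D protein grid. The grids, the simple overlap-counting score and the function names `fft` and `translation_scores` are assumptions made for illustration only; real docking codes typically use 3D grids with separate weights for surface and interior cells, and repeat the scan for every sampled rotation.

```python
import cmath

def fft(values, inverse=False):
    """Recursive radix-2 FFT (unnormalised); len(values) must be a power of two."""
    n = len(values)
    if n == 1:
        return list(values)
    even = fft(values[0::2], inverse)
    odd = fft(values[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out

def translation_scores(receptor, ligand):
    """Cyclic correlation c[t] = sum_x receptor[(x + t) % n] * ligand[x].

    Correlation theorem: FFT(c) = FFT(receptor) * conj(FFT(ligand)),
    so all n translations are scored in O(n log n) instead of O(n^2).
    """
    n = len(receptor)
    fr = fft(receptor)
    fl = fft(ligand)
    product = [a * b.conjugate() for a, b in zip(fr, fl)]
    return [v.real / n for v in fft(product, inverse=True)]

# Toy 1D grids: 1 marks an occupied cell. The receptor holds the ligand's
# shape displaced by 5 cells, so the best translation should be 5.
ligand = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
receptor = [0] * 16
for x, v in enumerate(ligand):
    receptor[(x + 5) % 16] = v

scores = translation_scores(receptor, ligand)
best_shift = max(range(len(scores)), key=lambda t: scores[t])
```

In a full docking run this scan would sit inside a loop over ligand orientations, keeping the best-scoring translation for each rotation.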
To limit the computing time, the conformational flexibility of the protein is
restricted to relevant motions of the side chains of surface residues. Nonetheless, a
substantial amount of computing time is necessary for such optimizations. Obtaining
very accurate results depends critically on an energy function that correctly
accounts for all the relevant enthalpic and entropic contributions. Abagyan and
Totrov [21] took a step in this direction by including terms for electrostatics and
side-chain entropies in the energy estimate. In general, however, more research is
needed in this field [14].
Docking small, mainly organic molecules to proteins is relevant both to understanding
biological processes and to drug design. In recent years, a distinction has emerged
between screening ligand databases and precisely examining specific molecular
interactions. These databases are available to researchers for investigating and
conducting specific docking experiments. In protein–ligand docking, the complementary
contact surfaces between the ligand and the receptor are much less discriminating
than in protein–protein docking. Moreover, small ligands tend to be very flexible, so
they can adapt their shape to the receptor pocket. Consequently, the prime challenge
in protein–ligand docking is to model ligand flexibility accurately and to capture
the weak interactions between ligand and receptor. Several research groups have made
progress along these lines in the past few years. Miller et al. [22] and Klebe and
Mietzner [23] have developed different ways to generate small conformational
ensembles that can be used with rigid docking methods. In addition, a new energy
function has been developed that includes the contributions essential for
docking [24].
A new combinatorial algorithm has been proposed to tackle the problem of ligand
flexibility directly, giving the fastest method available for flexible ligand
docking. Evolutionary algorithms have also been used to solve the flexible ligand
docking problem. Besides structural flexibility, there are further phenomena that are
important
References
1. N.A. Roberts, J.A. Martin, D. Kinchington, A.V. Broadhurst, J.C. Craig, I.B. Duncan,
S.A. Galpin, B.K. Handa, J. Kay, A. Krohn, Rational design of peptide-based HIV proteinase
inhibitors. Science 248, 358–361 (1990)
2. J. Erickson, D.J. Neidhart, J. VanDrie, D.J. Kempf, D.A. Paul, Design, activity, and 2.8 Å
crystal structure of a C2 symmetric inhibitor complexed to HIV-1 protease.
Science 249, 527 (1990)
3. A.C. Anderson, The process of structure-based drug design. Chem. Biol. 10, 787–797 (2003)
4. D. Zheng, Y.J. Huang, H.N. Moseley, R. Xiao, J. Aramini, G. Swapna, G.T. Montelione,
Automated protein fold determination using a minimal NMR constraint strategy. Protein Sci.
12, 1232–1246 (2003)
5. N. Oezguen, L. Adamian, Y. Xu, K. Rajarathnam, W. Braun, Automated assignment and 3D
structure calculations using combinations of 2D homonuclear and 3D heteronuclear NOESY
spectra. J. Biomol. NMR 22, 249–263 (2002)
6. K. Pervushin, R. Riek, G. Wider, K. Wüthrich, Attenuated T2 relaxation by mutual cancellation
of dipole–dipole coupling and chemical shift anisotropy indicates an avenue to NMR structures
of very large biological macromolecules in solution. Proc. Natl. Acad. Sci. 94, 12366–12371
(1997)
7. C.L. Verlinde, W.G. Hol, Structure-based drug design: progress, results and challenges. Struc-
ture 2, 577–587 (1994)
8. S.L. Lin, R. Nussinov, D. Fischer, H.J. Wolfson, Molecular surface representations by sparse
critical points. Proteins: Struct. Funct. Bioinform. 18, 94–101 (1994)
9. R. Norel, S.L. Lin, H.J. Wolfson, R. Nussinov, Molecular surface complementarity at protein-
protein interfaces: the critical role played by surface normals at well placed, sparse, points in
docking. J. Mol. Biol. 252, 263–273 (1995)
10. I.A. Vakser, C. Aflalo, Hydrophobic docking: a proposed enhancement to molecular recognition
techniques. Proteins: Struct. Funct. Bioinform. 20, 320–329 (1994)
11. F. Ackermann, G. Herrmann, F. Kummert, S. Posch, G. Sagerer, D. Schomburg, Protein dock-
ing: combining symbolic descriptions of molecular surfaces and grid-based scoring functions,
in ISMB (1995), pp. 3–11
12. P.H. Walls, M.J. Sternberg, New algorithm to model protein-protein recognition based on
surface complementarity: Applications to antibody-antigen docking. J. Mol. Biol. 228, 277–297
(1992)
13. M. Helmer-Citterich, A. Tramontano, PUZZLE: a new method for automated protein docking
based on surface shape complementarity. J. Mol. Biol. 235, 1021–1031 (1994)
14. T. Lengauer, M. Rarey, Computational methods for biomolecular docking. Curr. Opin. Struct.
Biol. 6, 402–406 (1996)
15. B.K. Shoichet, I.D. Kuntz, D.L. Bodian, Molecular docking using shape descriptors. J. Comput.
Chem. 13, 380–397 (1992)
16. D. Fischer, S.L. Lin, H.L. Wolfson, R. Nussinov, A geometry-based suite of molecular docking
processes. J. Mol. Biol. 248, 459–477 (1995)
17. H.-P. Lenhof, An Algorithm for the Protein Docking Problem (1995)
18. E. Katchalski-Katzir, I. Shariv, M. Eisenstein, A.A. Friesem, C. Aflalo, I.A. Vakser, Molecu-
lar surface recognition: determination of geometric fit between proteins and their ligands by
correlation techniques. Proc. Natl. Acad. Sci. 89, 2195–2199 (1992)
19. I.A. Vakser, Protein docking for low-resolution structures. Protein Eng. Des. Sel. 8, 371–378
(1995)
20. M. Totrov, R. Abagyan, Detailed ab initio prediction of lysozyme–antibody complex with 1.6
Å accuracy. Nat. Struct. Mol. Biol. 1, 259–263 (1994)
21. R. Abagyan, M. Totrov, Biased probability Monte Carlo conformational searches and electro-
static calculations for peptides and proteins. J. Mol. Biol. 235, 983–1002 (1994)
22. M.D. Miller, S.K. Kearsley, D.J. Underwood, R.P. Sheridan, FLOG: a system to select
‘quasi-flexible’ ligands complementary to a receptor of known three-dimensional structure.
J. Comput. Aided Mol. Des. 8, 153–174 (1994)
23. G. Klebe, T. Mietzner, A fast and efficient method to generate biologically relevant conforma-
tions. J. Comput. Aided Mol. Des. 8, 583–606 (1994)
24. H.-J. Böhm, The development of a simple empirical scoring function to estimate the binding
constant for a protein-ligand complex of known three-dimensional structure. J. Comput. Aided
Mol. Des. 8, 243–256 (1994)
25. C. Poornima, P. Dean, Hydration in drug design. 1. Multiple hydrogen-bonding features of water
molecules in mediating protein-ligand interactions. J. Comput. Aided Mol. Des. 9, 500–512
(1995)
26. A. Wlodawer, Rational drug design: the proteinase inhibitors, Pharmacotherapy. J. Human
Pharmacol. Drug Ther. 14 (1994)
Chapter 4
Three-Dimensional (3D) Pharmacophore
Modelling-Based Drug Designing by
Computational Technique
4.1 Introduction
Pharmacophore modelling is a simple technique that produces results that are
intuitive to an experienced medicinal chemist. The approach models the different
interactions that can be formed between a ligand and its binding site in a specific
binding situation at the target of interest [1]. The resulting chemical features are
placed in a three-dimensional (3D) spatial arrangement by algorithms that derive them
from standard rules on chemical features. These models, also known as 3D
pharmacophores, can be used to search for similarities between binding situations or
between different molecules [1]. Pharmacophore modelling has both advantages and
disadvantages: (i) the rule-based design of chemical features, which sits at an ideal
interface between medicinal chemistry and computer science, gives the medicinal or
computational chemist the means to add intentional and necessary bias to the still
imperfect representation of molecules in computers; (ii) heuristic modelling is not a
systematic approach: important interactions may not be well represented in a specific
chemical feature model, increasing the
A pharmacophore model of the target binding site summarizes the steric and electronic
features required for the ideal interaction of the ligand with the target of
interest. The most frequently used pharmacophore features are hydrogen bond
acceptors, hydrogen bond donors, basic groups, acidic groups, partial charges,
aliphatic hydrophobic moieties and aromatic moieties. Pharmacophore features have
been used in drug discovery for virtual screening, de novo design and lead
optimization [7]. A pharmacophore model of the target binding site can be used to
screen a compound library for putative hits. Apart from querying databases for active
compounds, pharmacophore models can also be used by de novo design algorithms to
guide the design of new compounds. Structure-based pharmacophore techniques rely on
analysis of the binding site in a target–ligand complex structure. LigandScout [8]
uses protein–ligand complex data to map the interactions between ligand and target. A
knowledge-based rule set derived from the PDB is used to detect the interactions
automatically and classify them into hydrogen bond interactions, charge transfers and
lipophilic regions [8]. The Pocket v.2 algorithm [9] can automatically build a
pharmacophore model from a target–ligand complex. The algorithm generates regularly
spaced grids around the ligand and the surrounding residues. Probe atoms that
represent a hydrogen bond donor, a hydrogen bond acceptor and a hydrophobic group are
used to scan the grids. An empirical scoring function, SCORE, is used to describe the
binding constant between the probe atoms and the target.
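The grid-and-probe idea can be sketched in a few lines. This is not the Pocket v.2 implementation and does not use the SCORE function: the helper names `regular_grid` and `donor_probe_score`, the 1 Å spacing and the distance window of 2.5 to 3.5 Å are all assumptions chosen to keep the example small.

```python
import math
from itertools import product

def regular_grid(coords, spacing=1.0, margin=1.0):
    """Regularly spaced grid points covering the bounding box of coords plus a margin."""
    lows = [min(c[i] for c in coords) - margin for i in range(3)]
    highs = [max(c[i] for c in coords) + margin for i in range(3)]
    counts = [int(round((hi - lo) / spacing)) + 1 for lo, hi in zip(lows, highs)]
    return [tuple(lo + n * spacing for lo, n in zip(lows, idx))
            for idx in product(*(range(c) for c in counts))]

def donor_probe_score(point, acceptors, d_min=2.5, d_max=3.5):
    """Toy donor-probe score: number of receptor acceptor atoms at
    hydrogen-bonding distance from the grid point (a stand-in, not SCORE)."""
    return sum(1 for a in acceptors if d_min <= math.dist(point, a) <= d_max)

# Hypothetical coordinates in angstroms
ligand = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
acceptors = [(0.0, 3.0, 0.0)]

grid = regular_grid(ligand)
hot_spots = [p for p in grid if donor_probe_score(p, acceptors) > 0]
```

Grid points with a favourable probe score would then be clustered into candidate pharmacophore centres; acceptor and hydrophobic probes would be scanned in the same way with their own scoring terms.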
Wei et al. (2008) used Pocket v.2 to identify common pharmacophores for two targets
involved in inflammatory signalling: human leukotriene A4 hydrolase (LTA4H-h) and
non-pancreatic secretory phospholipase A2 (PLA2). The co-crystal structure (PDB code
1HS6) of LTA4H-h with 2-(3-amino-2-hydroxy-4-phenylbutyrylamino)-4-methylpentanoic
acid (bestatin) and the structure (PDB code 1DB4) of PLA2 with
[3-(1-benzyl-3-carbamoylmethyl-2-methyl-1H-indol-5-yloxy)propyl]phosphonic acid
(indole 8) were used to derive the pharmacophores of the two targets. For LTA4H-h,
six pharmacophore centres were identified: four hydrophobic centres, one hydrogen
bond acceptor and one zinc coordination centre. Within the binding pocket of PLA2,
three hydrophobic centres, one hydrogen bond acceptor and calcium ion coordination
centres were identified [11]. Comparison of the two sets of pharmacophore models
showed that two hydrophobic pharmacophores and the metal-coordinating pharmacophore
were common to both proteins. The authors proposed that compounds satisfying the
common pharmacophores would inhibit both proteins. The MDL compound database was
screened virtually against LTA4H-h and PLA2 using Dock4.0, and the binding
conformations of the top 150,000 compounds (60% of the database ranked by Dock score)
were extracted and checked for conformity to the common pharmacophores. The best
inhibitor, compound 10, inhibited LTA4H-h in the submicromolar range and PLA2 with an
IC50 value of 7.3 mM.
References
5.1 Introduction
The term molecular mechanics (MM) refers to the use of simple potential energy
functions (e.g. harmonic oscillator or Coulombic potentials) to model molecular
systems. Molecular mechanics approaches are widely applied in molecular structure
refinement, molecular dynamics (MD) simulations, Monte Carlo (MC) simulations
and ligand-docking simulations [1].
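The two example terms named above, a harmonic oscillator and a Coulombic potential, can be written down directly. The sketch below is a minimal illustration, not any particular force field: the function names, the parameter values and the rounded conversion constant (about 332.06 kcal·Å/(mol·e²)) are assumptions, and some force fields omit the factor 1/2 in the harmonic term.

```python
COULOMB_CONSTANT = 332.06  # approximate, in kcal*angstrom/(mol*e^2)

def harmonic_bond_energy(r, r0, k):
    """Harmonic oscillator term: E = 1/2 * k * (r - r0)^2, with k in kcal/(mol*A^2)."""
    return 0.5 * k * (r - r0) ** 2

def coulomb_energy(q1, q2, r, dielectric=1.0):
    """Coulombic term between point charges q1 and q2 (in e) separated by r (in A)."""
    return COULOMB_CONSTANT * q1 * q2 / (dielectric * r)

# Hypothetical examples: a bond stretched by 0.1 A, and an ion pair 4 A apart
bond_e = harmonic_bond_energy(1.10, 1.00, 600.0)   # about 3.0 kcal/mol
pair_e = coulomb_energy(1.0, -1.0, 4.0)            # attractive, so negative
```

A molecular mechanics energy function is a sum of many such simple terms over all bonds, angles, torsions and non-bonded atom pairs.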
Dynamic simulation methods are widely used to obtain the information on the
time evolution of conformations of proteins and other biological macromolecules
[2, 3] and also kinetic and thermodynamic information [1]. Simulations can provide
fine details concerning the motions of individual particles as a function of time.
They can be utilized to quantify the properties of a system with precision and on a
timescale that is otherwise inaccessible, and simulation is, therefore, a valuable tool
in extending our understanding of model systems. Theoretical consideration of a
system additionally allows one to investigate the specific contributions to a property
through ‘computational alchemy’, that is, by modifying the simulation in such a way
that it is nonphysical but nonetheless allows a model’s characteristics to be probed.
One example is the artificial conversion of energy function from one system to that of
another during a simulation. This is an important technique in free energy calculations
[4]. Thus, molecular dynamics simulations, along with a range of complementary
computational approaches, have become valuable tools for investigating the basis of
protein structure and function.
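The artificial conversion of one energy function into another can be illustrated with a coupling parameter λ that mixes the two functions. The sketch below uses two one-dimensional harmonic wells as a toy system, chosen because the free energy difference is known analytically; the linear mixing rule, the spring constants, the value of kT and the use of an analytic ensemble average in place of a real simulation are all assumptions for illustration.

```python
import math

def mixed_energy(x, lam, k_a=1.0, k_b=4.0):
    """U(x; lambda) = (1 - lambda) * U_A(x) + lambda * U_B(x) for two harmonic wells."""
    return (1.0 - lam) * 0.5 * k_a * x * x + lam * 0.5 * k_b * x * x

def mean_du_dlambda(lam, kT, k_a=1.0, k_b=4.0):
    """Ensemble average <U_B - U_A> at a given lambda.

    Analytic here because <x^2> = kT / k(lambda) for a harmonic well; in a
    real calculation this average would come from a simulation at each lambda.
    """
    k_lam = (1.0 - lam) * k_a + lam * k_b
    return 0.5 * (k_b - k_a) * kT / k_lam

def thermodynamic_integration(kT=0.593, n_windows=101, k_a=1.0, k_b=4.0):
    """Trapezoidal integral of <dU/dlambda> over lambda from 0 to 1."""
    lams = [i / (n_windows - 1) for i in range(n_windows)]
    vals = [mean_du_dlambda(l, kT, k_a, k_b) for l in lams]
    h = 1.0 / (n_windows - 1)
    return h * (sum(vals) - 0.5 * (vals[0] + vals[-1]))

delta_f = thermodynamic_integration()
exact = 0.5 * 0.593 * math.log(4.0 / 1.0)  # (kT / 2) * ln(k_B / k_A)
```

The intermediate values of λ describe systems that do not exist physically, which is exactly the sense in which the simulation is "alchemical"; only the end points correspond to real states.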
inhibitor, raltegravir. This MD technique has also been used in various other
campaigns to identify inhibitors of targets of interest [9]. Metadynamics is an
MD-based method for predicting and scoring ligand binding. The technique maps the
entire free energy landscape in an accelerated manner because it keeps track of the
history of already sampled regions. During the MD simulation of a protein–ligand
complex, a repulsive Gaussian potential is added to explored regions, steering the
simulation towards new free energy regions of the protein–ligand complex.
Millisecond-timescale MD simulations are now feasible with special-purpose machines
such as Anton. Such long simulations allow the study of drug binding events at the
target protein. Anton has been used successfully for protein folding at fully atomic
resolution. Improvements in computer hardware mean that protein flexibility can be
routinely accessed on longer timescales. This will provide more extensive information
on conformational flexibility.
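The history-dependent bias at the heart of metadynamics can be sketched on a one-dimensional toy surface. Note the substitutions: the walker is propagated with Metropolis Monte Carlo moves rather than MD, the double-well function stands in for a real free energy landscape, and the hill height, width and deposition stride are arbitrary assumed values.

```python
import math
import random

def double_well(x):
    """Toy free energy surface with minima at x = -1 and x = +1."""
    return 5.0 * (x * x - 1.0) ** 2

def bias(x, hills, height=0.5, width=0.2):
    """Sum of repulsive Gaussians deposited at previously visited positions."""
    return sum(height * math.exp(-((x - c) ** 2) / (2.0 * width ** 2)) for c in hills)

def run_metadynamics(n_steps=2000, stride=20, kT=0.6, step=0.1, seed=1):
    rng = random.Random(seed)
    x, hills = -1.0, []
    for i in range(1, n_steps + 1):
        trial = x + rng.uniform(-step, step)
        # Metropolis move on the biased surface (stand-in for MD propagation)
        d_e = (double_well(trial) + bias(trial, hills)) - (double_well(x) + bias(x, hills))
        if d_e <= 0.0 or rng.random() < math.exp(-d_e / kT):
            x = trial
        if i % stride == 0:
            hills.append(x)  # remember where the walker has been
    return hills

hills = run_metadynamics()
```

As hills accumulate in the starting well, the walker is pushed towards unexplored regions; in the long run the negative of the accumulated bias approximates the underlying free energy surface.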
binding pocket [10]. QXP optimizes the grid map energy and the internal ligand energy
when searching for the ligand–target structure. The algorithm carries out a
rigid-body alignment of the ligand–target complex followed by MCM translation and
rotation of the ligand. This step is followed by another rigid-body alignment and by
scoring with the energy grid map. The relative positions of the ligand and the target
molecule make up the internal variables of the method. The internal variables are
subjected to random changes, followed by local energy minimization and selection by
the Metropolis criterion. ICM performed satisfactorily in generating protein–ligand
complexes for 68 diverse, high-resolution X-ray complexes present in DUD.
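The cycle of random change, local energy minimization and Metropolis selection can be sketched as follows. This is a one-dimensional toy, not the QXP or ICM implementation: the rugged energy function, the gradient-descent minimizer and all parameter values are assumptions chosen for illustration.

```python
import math
import random

def energy(x):
    """Toy rugged 1D energy with several local minima (hypothetical)."""
    return 0.1 * x * x + math.sin(3.0 * x)

def gradient(x):
    return 0.2 * x + 3.0 * math.cos(3.0 * x)

def local_minimize(x, rate=0.01, n_iter=500):
    """Plain gradient descent into the nearest local minimum."""
    for _ in range(n_iter):
        x -= rate * gradient(x)
    return x

def metropolis_accept(delta_e, kT, rng):
    """Always accept downhill moves; accept uphill moves with probability exp(-dE/kT)."""
    return delta_e <= 0.0 or rng.random() < math.exp(-delta_e / kT)

def monte_carlo_minimization(x0, n_cycles=200, kT=0.5, max_move=2.0, seed=7):
    rng = random.Random(seed)
    current = local_minimize(x0)
    best = current
    for _ in range(n_cycles):
        # random change of the variable, then local minimization
        trial = local_minimize(current + rng.uniform(-max_move, max_move))
        if metropolis_accept(energy(trial) - energy(current), kT, rng):
            current = trial
        if energy(current) < energy(best):
            best = current
    return best

best = monte_carlo_minimization(4.0)
```

Because every trial is minimized before the Metropolis test, the search hops between local minima instead of wandering over the barriers between them.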
References
1. S.A. Adcock, J.A. McCammon, Molecular dynamics: survey of methods for simulating the
activity of proteins. Chem. Rev. 106, 1589–1615 (2006)
2. T.E. Cheatham III, P.A. Kollman, Molecular dynamics simulation of nucleic acids. Annu. Rev.
Phys. Chem. 51, 435–471 (2000)
3. M. Karplus, J.A. McCammon, Molecular dynamics simulations of biomolecules. Nat. Struct.
Mol. Biol. 9, 646–652 (2002)
4. T. Simonson, G. Archontis, M. Karplus, Protein–ligand recognition: free energy simulations
come of age. Acc. Chem. Res. 35, 430–437 (2002)
5. H. Longuet-Higgins, B. Widom, A rigid sphere model for the melting of argon. Mol. Phys. 8,
549–556 (1964)
6. A. Rahman, J. Chem. Phys. 45, 2585 (1966); A. Rahman, Phys. Rev. 136, 405 (1964)
7. J.A. McCammon, B.R. Gelin, M. Karplus, Dynamics of folded proteins. Nature 267, 585–590
(1977)
8. M. Mangoni, D. Roccatano, A. Di Nola, Docking of flexible ligands to flexible receptors in
solution by molecular dynamics simulation. Proteins: Struct. Funct. Bioinform. 35, 153–162
(1999)
9. M.R. Landon, R.E. Amaro, R. Baron, C.H. Ngan, D. Ozonoff, J. Andrew McCammon, S. Vajda,
Novel druggable hot spots in avian influenza neuraminidase H5N1 revealed by computational
solvent mapping of a reduced and representative receptor ensemble. Chem. Biol. Drug Des.
71, 106–116 (2008)
10. M. Liu, S. Wang, MCDOCK: a Monte Carlo simulation approach to the molecular docking
problem. J. Comput. Aided Mol. Des. 13, 435–451 (1999)
Chapter 6
Receptor Thermodynamics
of Ligand–Receptor or Ligand–Enzyme
Association
6.1 Introduction
The aim of both qualitative and quantitative approaches is to determine or predict the
mode of binding, selectivity and binding free energy that is associated with the pro-
tein–ligand interactions. These computational methods can be efficiently employed
to assess the factors determining the binding process, such as specific interactions
contributing to protein–ligand recognition. Whether applied qualitatively or
quantitatively, binding free energy methods (and the current challenges associated
with their application) form the common underlying theme.
In this chapter, a general distinction is made that divides the various computational
methods into two categories. (1) A structure-based, qualitative assessment of pro-
tein–ligand interactions governing the binding process. (2) A quantitative assessment
of the binding affinity of ligands for protein targets. Note that this is a simplifica-
tion and overlap may occur between the two categories. Contributions of represen-
tative computational methods (docking, quantitative structure–activity relationship
[Figure: drug design workflow with panels A–F. Recoverable box labels: active site prediction and grid generation; receptor-based virtual screening of the target against various kinds of database (HTVS, SP and XP docking; ligand scoring; screened compounds); validation by blind docking (cross-validation of active site prediction); validation by induced fit docking (compound analysis); compound preparation (clean structures, generate conformers, create pharmacophore sites); common pharmacophore hypotheses and search of a 3D database of the target; MD simulations; drug kinetics simulation (investigate drug effect); lead compound.]
The concept that therapeutic agents produce their selective action in modifying
disease symptoms by acting as ‘magic bullets’ at discrete molecular targets within the
body is generally attributed to Paul Ehrlich around the turn of the nineteenth century,
as part of the seminal ‘lock and key’ hypothesis. This hypothesis describes drugs as
receptor ligands or enzyme substrates that selectively modulate the function of
unknown molecular targets to produce beneficial effects. Receptor theory relies, to a
very large extent, on the classical enzyme kinetic model based on the law of mass
action and derived by Michaelis and Menten in 1913 [1]. The interaction between a
receptor and a ligand can be looked upon as
The ligand L binds to the receptor R and alters the nature of receptor interac-
tion with its associated membrane components to induce a change in the cellular
and, ultimately, tissue function. Ligands interacting with receptors have two intrinsic
properties: affinity and efficacy. Affinity is the ability of the ligand to recognize
and bind to the receptor, while efficacy is its ability to bring about a change in
cellular processes via activation of transmembrane transduction mechanisms involving
G-protein complexes or ion channels. In addition to the affinity of a receptor for its
ligand, the response to the ligand is also dependent on the number of receptors. An
additional ligand property is selectivity that is defined as the degree to which the
ligand interacts with the target of interest in comparison to related structural targets.
The degree of selectivity typically determines the side effect profile of the new com-
pound, given that the targeted mechanism itself does not produce untoward effects
when stimulated beyond the therapeutic range. Ligands may be either agonists or
antagonists. Agonists have intrinsic efficacy and their binding to the receptor leads to
activation of intracellular components involved in the physiological or pharmacolog-
ical responsiveness of cell or tissue. This efficacy may be manifested by changes in
the activity of an enzyme like adenylate cyclase or by an alteration in the contractile
response of an isolated, intact tissue preparation. However, antagonists bind to the
receptor and block the interaction of agonist while producing no effect on the tissue
on their own. Antagonism can be of several types: competitive, non-competitive and
inverse [2]. Competitive antagonism is usually associated with ligands that directly
interact with the agonist binding site, i.e. recognition element of the receptor. The
non-competitive or uncompetitive antagonists interact at sites distinct from the
agonist recognition site and can modulate agonist binding. A third class of ligand is
the inverse agonist. Ligands of this class interact with a defined recognition site on
a receptor and not only block the effects of an agonist at the receptor but also
produce effects opposite to those of the agonist to varying degrees. Hence, a
biological response is produced by the interaction of a drug with its biological
receptor. This selective binding and its extent are governed by the phenomenon of
molecular recognition. In molecular modelling, this process of molecular recognition
is simulated to understand the drug–receptor interaction (Eq. 6.2 states that the
ligand binds to the receptor).
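\[ \text{Ligand} + \text{Receptor} \;\underset{L_2}{\overset{L_1}{\rightleftharpoons}}\; \text{L–R Complex} \longrightarrow \text{Response} \tag{6.2} \]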
The rate constant for association of the complex is $L_1$, the rate constant for
dissociation of the complex is $L_2$, and the affinity or association constant
($L_{as}$) can be expressed as
\[ L_{as} = L_1 / L_2 . \]
The thermodynamic parameters of interest for the above reactions are the standard
free energy ($\Delta G^0$), enthalpy ($\Delta H^0$) and entropy ($\Delta S^0$) of
association. These parameters are related by the Gibbs free energy equations,
\[ \Delta G^0 = -RT \ln L_{as} \tag{6.3} \]
\[ \Delta G^0 = \Delta H^0 - T \Delta S^0 \tag{6.4} \]
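A short numerical example may help to connect the rate constants to Eqs. (6.3) and (6.4). The rate constants and the enthalpy used below are hypothetical values, the function names are illustrative, and the association constant is taken relative to a 1 M standard state so that its logarithm is well defined.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)

def association_constant(k_assoc, k_dissoc):
    """Affinity constant as the ratio of association to dissociation rate constants."""
    return k_assoc / k_dissoc

def standard_free_energy(k_as, temperature=298.15):
    """Eq. (6.3): standard free energy of association in kcal/mol."""
    return -R_KCAL * temperature * math.log(k_as)

def entropy_term(delta_g, delta_h):
    """Rearranged Eq. (6.4): returns T * dS0 = dH0 - dG0, in kcal/mol."""
    return delta_h - delta_g

# Hypothetical rate constants: 1e6 per M per s (on) and 1e-3 per s (off)
k_as = association_constant(1.0e6, 1.0e-3)      # about 1e9 per M
delta_g = standard_free_energy(k_as)            # about -12.3 kcal/mol
t_delta_s = entropy_term(delta_g, delta_h=-8.0) # hypothetical enthalpy of -8 kcal/mol
```

With these assumed numbers the binding is favoured by both enthalpy and entropy, since the entropy term comes out positive.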
The most fundamental forces involved in the interaction of ligand and receptor are
covalent, reinforced ionic, ionic, ion–dipole, dipole–dipole, van der Waals and
hydrophobic forces. In molecular modelling, every effort is made to estimate the free
energy of association ($\Delta G$). Various computational chemistry methods and
assumptions are adopted to arrive at a measure of association [3].
6.2 Database Searching
The pharmacophores obtained from similarity analysis and 3D-QSAR analysis can be used
to search a database for compounds that hold features similar to those defined in the
pharmacophores. QSAR focuses on a set of descriptors such as electrostatic and
thermodynamic properties, whereas pharmacophore mapping is a geometric approach.
Various programs, such as UNITY, CATALYST, MENTHOR, MACCS-3D and CAVEAT, convert these
pharmacophores into search queries. Commercially available databases include
Comprehensive Medicinal Chemistry-3D (CMC-3D), Fine Chemicals Directory-3D (FCD-3D),
National Cancer Institute (NCI), Maybridge, Derwent World Drug Index, BioByte, etc.
These search queries can be combined with the ORACLE program to perform a rational
database search and identify potential molecules with drug-like properties.
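The geometric nature of a pharmacophore query can be shown with a small sketch. It does not reproduce the query format of any of the programs named above: the data layout, the function names `matches` and `search`, the 1 Å tolerance and the three toy molecules are assumptions, and each molecule is represented by a single rigid conformer.

```python
import math
from itertools import permutations

def matches(query_features, query_distances, molecule, tolerance=1.0):
    """True if some assignment of molecule features satisfies the query.

    query_features: list of feature types, e.g. ["donor", "acceptor", "aromatic"]
    query_distances: {(i, j): distance in angstroms} between query features i and j
    molecule: list of (feature_type, (x, y, z)) tuples
    """
    for combo in permutations(range(len(molecule)), len(query_features)):
        if any(molecule[m][0] != q for m, q in zip(combo, query_features)):
            continue
        if all(abs(math.dist(molecule[combo[i]][1], molecule[combo[j]][1]) - d) <= tolerance
               for (i, j), d in query_distances.items()):
            return True
    return False

def search(database, query_features, query_distances, tolerance=1.0):
    """Names of all database molecules that satisfy the pharmacophore query."""
    return [name for name, mol in database.items()
            if matches(query_features, query_distances, mol, tolerance)]

# Hypothetical three-point query (a 3-4-5 triangle) and toy database
query_features = ["donor", "acceptor", "aromatic"]
query_distances = {(0, 1): 3.0, (0, 2): 5.0, (1, 2): 4.0}
database = {
    "mol_A": [("donor", (0, 0, 0)), ("acceptor", (3, 0, 0)), ("aromatic", (3, 4, 0))],
    "mol_B": [("donor", (0, 0, 0)), ("acceptor", (8, 0, 0)), ("aromatic", (3, 4, 0))],
    "mol_C": [("donor", (0, 0, 0)), ("aromatic", (3, 4, 0))],
}
hits = search(database, query_features, query_distances)
```

Only `mol_A` has all three feature types at the required separations; a practical search would also loop over multiple conformers of each database molecule.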
Together with the continuous increase in computer power and advances in related areas
of statistical mechanics and enhanced sampling techniques, binding free energy
calculations have become useful tools in drug design and in the rationalization of
biophysical experiments. This is also reflected in the increasing number of scientific
reports on this topic over the past years. In structure-based drug design, free energy
calculations are often applied in the context of a thermodynamic cycle approach that
uses so-called alchemical transformations between structurally related compounds. This
has proven to be a successful tool for guiding drug development. Since it is virtually
infeasible to run molecular dynamics simulations long enough to capture the
ligand–protein association/dissociation equilibrium thoroughly, direct calculation of
the absolute free energy difference associated with ligand binding
($\Delta G_{bind}$) mostly remains outside the range of computational chemistry.
Alternatively, the absolute binding free energy may be calculated with alchemical
approaches, by making a ligand vanish from the protein active site and from aqueous
solution. Note that the term absolute binding free energy, commonly used in the field,
still refers to a free energy difference along the binding process. However, in the
drug development process, the main interest is typically to determine the affinities
of a series of potential drug candidates relative to each other. Therefore, the focus
usually lies on the calculation of relative binding free energies
($\Delta\Delta G_{bind}$) between (series of) compounds or ligands. Thermodynamic
cycles involving alchemical transformations between two ligands ($L_1$ and $L_2$) are
used to calculate $\Delta\Delta G_{bind}$ for a given protein target (P) in aqueous
solution. The free energy is a thermodynamic
state function and thus a path-independent quantity. This means that the order of
binding events does not matter and that the computed free energy depends only on the
representation of the initial (ligand in solution) and final (ligand bound to protein)
states of the binding process.
Fig. 6.2 Standard thermodynamic cycle for relative binding free energy calculations. To compute
the relative binding free energy of two ligands ($L_1$ and $L_2$) for a given protein (P), $L_1$ is
alchemically mutated to $L_2$ both in aqueous solution and in the protein environment. According
to Eq. (6.1), $\Delta\Delta G_{bind}$ is derived by relating the difference between $\Delta G_1$ and
$\Delta G_2$ to the difference between $\Delta G_3$ and $\Delta G_4$
Therefore, the free energy changes along the cycle in Fig. 6.2 sum to zero, so that
$\Delta\Delta G_{bind}$ can be expressed as
\[ \Delta\Delta G_{bind} = \Delta G_2 - \Delta G_1 = \Delta G_4 - \Delta G_3 , \]
which relates the free energy difference of the two horizontal branches
($\Delta G_1$ and $\Delta G_2$), representing the individual affinities of the ligands
for the protein, to the free energy difference of the vertical branches
($\Delta G_3$ and $\Delta G_4$), which correspond to the non-physical alchemical
transformation of $L_1$ into $L_2$ in the free and bound state, respectively. The use
of thermodynamic cycles is a standard approach for calculating relative binding free
energies. Note, however, that the thermodynamic cycle approach (and the calculation of
alchemical free energy differences) can also be applied to calculate the free energy
changes of (bio)chemical events other than ligand binding, such as protein folding,
solvation or conformational changes.
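The bookkeeping of the cycle is simple enough to show numerically. In the sketch below the two alchemical free energies are hypothetical inputs of the kind a simulation would provide, and the function names are illustrative; the relative binding free energy is the alchemical leg in the protein minus the alchemical leg in solution.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)

def relative_binding_free_energy(dg_mutation_bound, dg_mutation_free):
    """Relative binding free energy of L2 versus L1 from the two alchemical legs."""
    return dg_mutation_bound - dg_mutation_free

def affinity_ratio(ddg_bind, temperature=298.15):
    """Ratio of association constants K(L2) / K(L1) implied by ddG = -RT ln(K2 / K1)."""
    return math.exp(-ddg_bind / (R_KCAL * temperature))

# Hypothetical alchemical free energies in kcal/mol for mutating L1 into L2
dg_bound, dg_free = -3.4, -2.0
ddg = relative_binding_free_energy(dg_bound, dg_free)  # about -1.4 kcal/mol
ratio = affinity_ratio(ddg)

# Cycle closure: with any binding free energy for L1, going round the cycle gives zero
dg_bind_l1 = -9.0
dg_bind_l2 = dg_bind_l1 + ddg
closure = dg_bind_l1 + dg_bound - dg_bind_l2 - dg_free
```

With these assumed values L2 binds roughly ten times more tightly than L1, without either absolute binding free energy ever being simulated.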
Ultimately, the challenge lies in developing more robust and efficient free energy
calculations that reduce the computational cost and thus make this approach more
feasible for large-scale industrial applications.
References
1. K.A. Johnson, R.S. Goody, The original Michaelis constant: translation of the 1913 Michaelis-
Menten paper. Biochemistry 50, 8264–8269 (2011)
2. T. Albers, tures?, in Protein Structure, Folding and Design: GENEX-UCLA Symposium, Vol.
39, ed. by D. L. Oxender (Allan R. Liss, New York, pp. 283–289) Alt, J, vol. 113, p. 125
3. J.K. Seydel, Sulfonamides, structure-activity relationship, and mode of action. Structural prob-
lems of the antibacterial action of 4-aminobenzoic acid (PABA) antagonists. J. Pharm. Sci. 57,
1455–1478 (1968)
Chapter 7
Thermodynamic Cycles and Their
Application in Protein Targets
Abstract A key part of drug design and development is the optimization of molecu-
lar interactions between an engineered drug candidate and its binding target. Thermo-
dynamic characterization provides information about the balance of energetic forces
driving binding interactions and is essential for understanding and optimizing molec-
ular interactions. Comprehensive thermodynamic evaluation is vital in the drug devel-
opment process to speed drug development towards an optimal energetic interaction
profile while retaining good pharmacological properties. Practical thermodynamic
approaches, such as enthalpic optimization, thermodynamic optimization plots and
the enthalpic efficiency index, have now been developed and have proven their utility in
the design process. Improved throughput in calorimetric methods remains essential for
even greater integration of thermodynamics into drug design.
7.1 Introduction
Thermodynamics has found increasing adoption in the drug design and development
process in both academic and commercial endeavours and is increasingly prevalent
alongside longer standing structure- and molecular modelling-based approaches. The
integration of thermodynamic measurements has grown with a better understand-
ing of energetic data, the increasing demonstration of the utility and application of
these measurements, and the availability of ever-improving instrumentation. How-
ever, there is still much that is not understood about the basis of binding interactions
and how these can be interpreted from thermodynamic data. Advances in instrumentation
have increased throughput and reduced sample demands, but still offer only moderate
throughput for a drug discovery effort that demands much more. Despite
these limitations, useful practical approaches have been developed and advances
are being made that present a bright future for thermodynamics in drug design and
development.
Historically, rational drug design has been based upon seeking structural com-
plementarity and optimizing binding contacts between an engineered drug and a
target binding site to generate lead compounds [1]. Of course, drug design is part
of a bigger picture involving consideration and optimization of solubility, selectiv-
ity, ADMET (absorption, distribution, metabolism, excretion and toxicology) and
pharmacokinetic/pharmacodynamic properties, but rational design and engineering
of ligands for molecular recognition of a given target is the core of the process. In
the past, drug design used structural information on the target site determined by
X-ray crystallography and NMR, alongside molecular modelling of drug–target
interactions. Drug development was driven by the goal of optimizing molecular
recognition, seeking high affinity compounds that were considered to
possess optimal binding interactions. However, a purely structure-based approach is
incomplete, and it is essential to incorporate complementary approaches to under-
stand the driving forces underlying the molecular interactions of the binding process
[2]. Approaches based solely on structural data often pursue binding affinity
optimization, which gives an oversimplified picture of molecular interactions: for
isostructural complexes, similar binding affinities can hide disparate binding
thermodynamics and reveal only one part of the binding picture.
Water molecules can be of considerable importance for the binding and selectivity of a
substrate to its receptor, for instance water-mediated hydrogen bonds between protein
and ligand. The bacterial oligopeptide-binding protein A of Salmonella typhimurium
(OppA) is a well-studied example for which water molecules have a profound effect
on ligand binding. OppA binds with small peptides of 3–5 residues regardless of their
amino acid sequence. Whereas other proteins need water molecules to establish high
selectivity in the ligand binding, OppA relies on water molecules to accommodate a
broad range of ligands with diverse physicochemical properties. This lack of
specificity arises because most of the interactions between OppA and peptide ligands
are mediated by water, which stabilizes the positive and negative charges or dipole
moments of the ligand side chains. For instance, the crystal structure of the charged
tripeptide Lys-Glu-Lys (KEK) in complex with OppA (PDB code 1JEU) showed that the
ligand is buried in
the active site, and that most of the interactions between KEK and OppA are mediated
by nine water molecules. For different tripeptides, diverse water configurations have
been observed in the active site, as well as dissimilar numbers of water molecules.
The challenges associated with simulating highly flexible peptidic ligands,
combined with the presence of water molecule networks in the active site pocket,
are addressed by constructing thermodynamic cycles for three different
peptides binding to OppA.
Abstract Current functional genomics relies on known and characterised genes, but
despite significant efforts in the field of genome annotation, accurate identification
and elucidation of protein coding gene structures remains challenging. Methods
are limited to computational predictions and transcript-level experimental evidence;
hence translation cannot be verified. Proteomic mass spectrometry enables the
sequencing of gene product fragments, which allows the validation and refinement
of existing gene annotation as well as the elucidation of novel protein coding regions.
However, the application of proteomics data to genome annotation is hindered by the
lack of suitable tools and methods to achieve automatic data processing and genome
mapping at high accuracy and throughput.
8.1 Introduction
Mass spectrometry (MS) has become the method of choice for protein identification
and quantification [1, 2]. The main reasons for this success include the availabil-
ity of high-throughput technology coupled with high sensitivity, specificity and a
good dynamic range [3]. These advantages are achieved by various separation tech-
niques coupled with high performance MS instrumentation. In a modern bottom-up
LC-MS/MS proteomics experiment [4], a complex protein mixture is often sepa-
rated via gel electrophoresis first to simplify the sample [5]. Subsequently, proteins
are digested with a specific enzyme such as trypsin, generating peptides that are
amenable to subsequent MS analysis. To further reduce sample complexity, peptides
are separated by liquid chromatographic (LC) systems [6], allowing direct
analysis without the need for further fractionation: eluents are ionised, separated by
their mass over charge ratios and subsequently registered by the detector. In a tan-
dem MS experiment (MS/MS), low energy collision-induced dissociation is used to
fragment the precursor ions, usually along the peptide bonds. Product fragments are
measured as mass over charge ratios, which commonly reflect the primary structure
of the peptide ion [7]. This simplified process is illustrated in Fig. 8.1.
Fig. 8.1 Schematic of a generic bottom-up proteomics MS experiment. a Sample preparation and
fractionation, b protein separation via gel-electrophoresis, c protein extraction, d enzymatic protein
digestion, e separation of peptides in one or multiple steps of liquid chromatography, followed by
ionisation of eluents and f tandem mass spectrometry analysis
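The digestion and mass measurement steps described above can be sketched in silico. The example below is a minimal illustration, not the workflow of any particular tool: it assumes the commonly used trypsin rule (cleavage C-terminal to lysine or arginine unless the next residue is proline), no missed cleavages, no modifications, and standard monoisotopic masses. The function names are hypothetical.

```python
import re

# Standard monoisotopic residue masses in Da.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # mass of H2O added to the residue sum of a free peptide
PROTON = 1.00728   # mass of one charge-carrying proton


def tryptic_digest(protein):
    """Split after K or R, except when the next residue is P (no missed cleavages)."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", protein) if p]


def peptide_mz(peptide, charge=1):
    """Mass over charge ratio of a peptide ion carrying `charge` protons."""
    neutral_mass = sum(RESIDUE_MASS[aa] for aa in peptide) + WATER
    return (neutral_mass + charge * PROTON) / charge


# Hypothetical toy sequence, chosen only to exercise the cleavage rule.
for peptide in tryptic_digest("MKRPGAKLR"):
    print(peptide, round(peptide_mz(peptide, charge=2), 4))
```

The zero-width split pattern requires Python 3.7 or later. Real search engines additionally consider missed cleavages and modifications, which this sketch leaves out.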
Today this technology allows researchers to identify complex protein mixtures
and enables them to build protein expression landscapes of any biological material
[8]. However, protein sequence coverage varies largely [3, 9] while protein inference
can be challenging if identified sequences are shared between different proteins
[10, 11]. The alternative top-down MS approach allows intact proteins to be identified
and sequenced directly and does not limit the analysis to the fraction of detectable
enzyme digests [12, 14]. However, this method is currently not applicable to complex
protein samples in a high-throughput fashion. Firstly, efficient whole-protein
separation techniques are lacking, and secondly, commercially available
MS instruments are limited either in fragmentation efficiency or by molecular weight
restrictions of the analytes [13].
Many computational tools have been developed to support high throughput peptide
and protein identification by automatically assigning sequences to tandem MS
spectra [15] (Fig. 8.1). Three types of approaches are used: (a) de novo
sequencing, (b) database searching and (c) hybrid approaches.
8.3 De Novo and Hybrid Algorithms
De novo algorithms infer the primary sequence directly from the MS/MS spectrum by
matching the mass differences between peaks to the masses of corresponding amino
acids [16]. These algorithms do not need a priori sequence information and hence
can potentially identify protein sequences that are not available in a protein database.
However, de novo implementations do not yet reach the overall performance of
database search algorithms and often only a part of the whole peptide sequence
is reliably identified [17–19]. High accuracy mass spectrometry circumvents many
sequence ambiguities, and de novo methods can reach new levels of performance
[20]. Hybrid algorithms are also becoming more important. They build upon de
novo algorithms, but compare the generated lists of potential peptides [21] or short
sequence tags [22] with available protein sequence databases to limit and refine the
search results. With the constant advances in instrument technology and improved
algorithms, de novo and hybrid methods may have a more important role in the
future; however, database searching remains the most widely used method for peptide
identification.
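The core idea of de novo sequencing, matching mass differences between peaks to amino acid masses, can be sketched as follows. This is a deliberately simplified illustration and not one of the cited algorithms: it assumes a clean, complete ladder of singly charged fragment peaks of one ion series, with no noise or missing peaks, and uses standard monoisotopic residue masses. The function name and the tolerance value are assumptions.

```python
# Standard monoisotopic residue masses in Da.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def read_ladder(peaks, tolerance=0.01):
    """Assign a residue to each gap between consecutive fragment peaks.

    Returns one entry per gap: the matching residue, several residues
    joined by '/' when they cannot be told apart, or '?' for no match.
    """
    ordered = sorted(peaks)
    calls = []
    for lower, upper in zip(ordered, ordered[1:]):
        delta = upper - lower
        matches = [aa for aa, mass in RESIDUE_MASS.items()
                   if abs(mass - delta) <= tolerance]
        calls.append("/".join(matches) if matches else "?")
    return calls


# Hypothetical ladder of prefix-fragment peaks (m/z, charge 1).
example_peaks = [58.02874, 129.06585, 242.14991, 370.24487]
print(read_ladder(example_peaks))
```

Leucine and isoleucine have identical masses, so the second gap is reported as ambiguous. With a looser tolerance, lysine and glutamine (about 0.036 Da apart) would also become indistinguishable, which illustrates why high mass accuracy removes many sequence ambiguities.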
Fig. 8.2 Concept of sequence database searching resembles a generic bottom-up MS experiment,
as for each stage of the experiment, an in silico equivalent component is available
measures and need to deal with post-processing software that converts search scores
into meaningful statistical measures. Therefore, the following sections are focussed
on scoring and assessment of database search results, providing a brief overview of
common methods, their advantages and disadvantages.
8.6 Peptide-Spectrum Match Scores and Common Thresholds
Sequest [24] was the first sequence database search algorithm for tandem MS data
and is today, together with Mascot [26], one of the most widely used tools for
peptide and protein identification. These are representative of the numerous database
search algorithms that report, for every peptide-spectrum match (PSM), a score that
reflects the quality of the cross correlation between the experimental and the computed
theoretical peptide spectrum. Although Sequest and Mascot scores are fundamentally
different in their
calculation, they facilitate good relative PSM ranking: all peptide candidates that
were matched against an experimental spectrum are ranked according to the PSM
score and only the best matches are reported. Often only the top hit is considered
for further investigation and some search engines [27] exclusively report that very
best match. However, not all these identifications are correct. Sorting all top hit
PSMs (absolute ranking) according to their score enables the selective investigation
of the very best matched PSMs. This approach was initially used to aid manual
interpretation and validation. As the field of MS-based proteomics moved towards
high-throughput methods, researchers started to define empirical score thresholds.
PSMs scoring above these thresholds were accepted and assumed to be correct, while
anything else was classified as incorrect. Depending on how well the underlying PSM
score discriminates, the score distributions of correct and incorrect matches can
overlap significantly (Fig. 8.3), and therefore thresholding is always a trade-off
between sensitivity (fraction of true positive identifications) and the acceptable
error rate (fraction of incorrect identifications). A low score threshold accepts more
PSMs at the cost of a higher error rate, whereas a high score threshold reduces the
error rate at the cost of sensitivity.
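The trade-off can be illustrated with a toy calculation. The sketch below assumes a small set of top-hit PSM scores whose correctness is known; in a real experiment these labels are not available, so the scores and labels here are invented purely for illustration, and the function name is hypothetical.

```python
def evaluate_threshold(psms, threshold):
    """Return (sensitivity, error_rate) when accepting PSMs with score >= threshold.

    psms: list of (score, is_correct) pairs.
    sensitivity: accepted correct PSMs / all correct PSMs.
    error_rate: accepted incorrect PSMs / all accepted PSMs.
    """
    accepted = [is_correct for score, is_correct in psms if score >= threshold]
    total_correct = sum(1 for _, is_correct in psms if is_correct)
    accepted_correct = sum(accepted)
    sensitivity = accepted_correct / total_correct if total_correct else 0.0
    error_rate = (len(accepted) - accepted_correct) / len(accepted) if accepted else 0.0
    return sensitivity, error_rate


# Invented scores with known labels (True = correct match).
toy_psms = [
    (4.1, True), (3.6, True), (3.2, True), (2.9, False),
    (2.5, True), (2.1, False), (1.8, True), (1.2, False),
]

for threshold in (3.0, 2.0, 1.0):
    sensitivity, error_rate = evaluate_threshold(toy_psms, threshold)
    print(f"threshold {threshold}: sensitivity {sensitivity:.2f}, "
          f"error rate {error_rate:.2f}")
```

Lowering the threshold from 3.0 to 1.0 raises sensitivity in this toy set while the share of incorrect matches among the accepted PSMs grows.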
Many groups also apply heuristic rules that combine the score threshold with
other validation properties, such as charge state or the difference in score to the
second-best hit. The problem with these methods is that the actual
error rate remains unknown and the decision to accept assignments is based only
on the judgement of an expert. Moreover, results between laboratories or even between
experiments cannot be reliably compared, since different search algorithms, pro-
tein databases, search parameters, instrumentation and sample complexity require
adaptation of acceptance criteria. A recent HUPO study [28] investigated the repro-
ducibility between laboratories. Amongst the 18 laboratories, each had their own
criteria of what was considered a high and low confidence protein identification,
which were mostly based on simple heuristic rules and score thresholds [28]. It was
found that the number of high confidence assignments between two different
laboratories could vary by as much as 50%, despite being based on the same data. As
a result, many proteomic journals require the validation and assessment of score
thresholds, ideally with statistical significance measures.
The genomic sequence encodes the blueprint of an organism. The instruction sets are
encoded in protein coding and non-coding genes, which are defined stretches of DNA
sequence that contain the information required to construct proteins and functional
RNA molecules respectively. The realisation of genes is initiated by transcription,
whereby genomic DNA is transcribed into RNA.
8 Genomics and Proteomics Using Computational Biology
Fig. 8.3 Illustration of gene transcription and translation according to the standard model
Sequencing efforts in the last decade generated a large amount of raw genomic DNA
sequence data. To date, 118 complete eukaryotic genomes have been sequenced [34],
and more sophisticated sequencing technologies will further speed up this data
collection process. A project to sequence 10,000 vertebrate species has just been
proposed, even though the technology is not yet up to it [35]. Genomes can be large:
for example, the human genome comprises approximately 3.2 × 10⁹ base pairs, yet only
about 1–2% of its DNA codes for proteins [36].
Genome annotation can be defined as augmenting these raw DNA sequences with
additional layers of information [37, 38]. Structural and functional annotation
can be distinguished. The former is the process of identifying important genomic
elements such as genes, the precise localisation of genes within the genome and
the elucidation of exon/intron structures, while the latter deals with the biological
function, regulation and expression analysis of these elements. For clarification,
when the term “genome annotation” is used in the remainder of this work, it refers
to structural annotation only. The task of accurately annotating the complete set
of protein coding genes and their alternative splice forms is considered one of the
hardest and yet most important steps towards understanding a genome, since proteins
are central to virtually every biological process in a cell. However, the difficulty of
gene identification and gene structure elucidation is determined by the complexity
of the underlying genome: for example, identification of ORFs in bacteria, which are
not discussed in this work, is relatively easy due to the lack of alternative splicing
and a compact genome; simpler eukaryotes, such as yeast with limited splicing and
short intronic regions are much easier to annotate than vertebrates, since extensive
alternative splicing, long introns and intergenic regions further complicate sensitive
and specific annotation.
Fig. 8.4 Overview of the different gene-finding strategies. Figure was adapted from Harrow et al.
2009
8.11 Proteogenomics
The automatic Ensembl pipeline and the HAVANA manual curation pipeline incor-
porate protein data from the UniProtKB database [39], where more than 99% of
the protein sequences are derived from genomic translations and cDNA sequences,
but only 13% are supported by protein level evidence such as mass spectrometry
identification (UniProt release notes 15.11,
http://www.uniprot.org/news/2009/11/24/release). Yates et al. [40] demonstrated the
concept of searching MS/MS data directly against a six-frame translation of the
genome, but it was later work [41–43] that applied this approach to eukaryotic genomes
with the purpose of validating and refining gene annotation as well as identifying
novel genes. In these studies, a six-frame translation was used as a search database;
however, in higher eukaryotes this is problematic: only 1–2% of the human genome
encodes proteins [30, 36], and therefore most of the six-frame translation is
essentially random sequence. The Peptide Atlas project [44, 45], the first large-scale
proteogenomic pipeline and repository for MS/MS peak lists and raw data, employs the
standard International Protein Index (IPI) database as an alternative to six-frame
translation. IPI provides a minimally redundant yet maximally complete set of protein
sequences from Ensembl,
Vega, RefSeq and UniProtKB. Later versions of Peptide Atlas complement the IPI
database with protein isoforms from Ensembl. Peptide Atlas comprises an analysis
pipeline that processes MS data with Sequest and PeptideProphet and provides access
to these peptide identifications, which are persisted in a comprehensive relational
database.
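The six-frame translation used as a search database in these studies can be sketched in a few lines. This is a minimal illustration assuming the standard genetic code and an unambiguous DNA alphabet; it is not the implementation used by any of the cited pipelines, and the function names are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def reverse_complement(dna):
    """Reverse complement of an upper-case DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate complete codons from the start of dna; '*' is stop, 'X' unknown."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def six_frame_translation(dna):
    """Return the translations of the three forward and three reverse frames."""
    dna = dna.upper()
    frames = {}
    for strand, sequence in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(sequence[offset:])
    return frames


# Hypothetical toy sequence.
for frame, protein in six_frame_translation("ATGGCCAAGTAA").items():
    print(frame, protein)
```

Every stretch of genomic sequence yields six candidate protein strings, of which at most one corresponds to a real coding region. This is why, for a genome in which only 1–2% of the DNA is coding, the resulting search database consists mostly of sequence that is never translated.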
References
40. J.R. Yates III, J.K. Eng, A.L. McCormack, Mining genomes: correlating tandem mass spectra
of modified and unmodified peptides to sequences in nucleotide databases. Anal. Chem. 67,
3202–3210 (1995)
41. J.S. Andersen, M. Mann, Mass spectrometry allows direct identification of proteins in large
genomes. Proteomics 1, 641–650 (2001)
42. J.S. Choudhary, W.P. Blackstock, D.M. Creasy, J.S. Cottrell, Matching peptide mass spectra
to EST and genomic DNA databases. Trends Biotechnol. 19, 17–22 (2001)
43. J.S. Choudhary, W.P. Blackstock, D.M. Creasy, J.S. Cottrell, Interrogating the human genome
using uninterpreted mass spectrometry data. Proteomics 1, 651–667 (2001)
44. F. Desiere, E.W. Deutsch, A.I. Nesvizhskii, P. Mallick, N.L. King, J.K. Eng, A. Aderem, R.
Boyle, E. Brunner, S. Donohoe, Integration with the human genome of peptide sequences
obtained by high-throughput mass spectrometry. Genome Biol. 6, R9 (2004)
45. F. Desiere, E.W. Deutsch, N.L. King, A.I. Nesvizhskii, P. Mallick, J. Eng, S. Chen, J. Eddes,
S.N. Loevenich, R. Aebersold, The PeptideAtlas project. Nucleic Acids Res. 34, D655–D658
(2006)
46. S.F. Altschul, W. Gish, W. Miller, E.W. Myers, D.J. Lipman, Basic local alignment search tool.
J. Mol. Biol. 215, 403–410 (1990)