
Combinatorial Chemistry & High Throughput Screening, 2011, 14, 889-897

Integration of Virtual and High Throughput Screening in Lead Discovery Settings

Tímea Polgár1 and György M. Keserű*,1,2

1Budapest University of Technology and Economics, Szt. Gellért tér 4, 1111 Budapest, Hungary
2Gedeon Richter Plc, P.O. Box 27, H-1475 Budapest, Hungary

Abstract: In the last decade mass screening strategies became the main source of leads in drug discovery settings.
Although high throughput screening (HTS) and virtual screening (VS) realize the same concept, the different nature of
these lead discovery strategies (experimental vs theoretical) means that they are typically applied separately. The
majority of drug leads are still identified by hit-to-lead optimization of screening hits. Structural information on the
target as well as on bound ligands, however, makes structure-based and ligand-based virtual screening available for the
identification of alternative chemical starting points. Although the two techniques have rarely been used together on the
same target, here we review the existing prominent studies on their true integration. Various approaches have been shown
to combine HTS and VS and to make better use of them in lead generation. Although several attempts at their integration
have only been considered at a conceptual level, numerous applications underline its relevance and suggest that
early-stage pharmaceutical drug research could benefit from a combined approach.
Keywords: Virtual screening, high throughput screening, integration, parallel screening, focused screening, sequential
screening.

*Address correspondence to this author at the Gedeon Richter Plc, P.O. Box 27, H-1475 Budapest, Hungary; Tel: +36-1-4314605; Fax: +36-1-4326002; E-mail: gy.keseru@richter.hu

1386-2073/11 $58.00+.00 © 2011 Bentham Science Publishers

INTRODUCTION

Identification of suitable chemical starting points and viable leads is one of the key tasks of early phase drug discovery. Historically, lead discovery was mainly supported by in vivo experiments, usually resulting in compounds with acceptable efficacy and a suitable pharmacokinetic profile in animal models. However, market expectations of pharma investors facilitated the application of other approaches such as in vitro screening or computer-aided drug discovery (CADD) methods [1].

According to a recent estimate, experimental screening methods together provide 65% of drug leads and account for approximately 14% of the preclinical budget [2]. In vitro assays serve as the basis of the structure-activity relationships that drive medicinal chemistry during active-to-hit and hit-to-lead processes. Dramatic developments in molecular biology, detection methods, computer technology, automation and miniaturization made HTS a characteristic tool of lead discovery. Today it is clear that library design and well-designed in vitro assays lead to higher hit rates, reflecting a current trend to ‘re-rationalize’ drug discovery research. Random screening should therefore be replaced by methodologies that combine the capacity of HTS approaches and the rational basis of CADD techniques [3].

Application of the HTS technology requires the selection of the target, the development of the assay and the availability of the screening library in a suitable format. In this respect virtual screening techniques can be considered as in silico analogues of in vitro HTS technologies. After defining the target, VS also requires the development and optimization of the protocols used for large scale prediction of biological affinity. In addition, HTS and VS share another conceptual framework: both methods have limited accuracy that is compensated by the number of compounds investigated. VS and HTS can therefore be considered as classification techniques that separate actives from inactives rather than identifying validated hits. Actives picked up by HTS or VS should therefore be experimentally validated before starting the hit-to-lead process. Consequently, HTS and VS should not be considered competing disciplines. In reality, however, HTS and VS methods are still less integrated than might be expected. In addition to organizational and disciplinary aspects, one of the reasons may be cultural, rooted in the different viewpoints of experimentalists and theoreticians.

During the last decade, various in silico methods were applied for the analysis of large screening datasets and for knowledge extraction. Moreover, attempts were made to build predictive computational models on the extracted knowledge, aiming at using the two complementary methods in practice. In addition to integrated VS and HTS solutions, these methods, whose intended use is to improve hit rates and accelerate pharmaceutical research, are discussed in this review.

VIRTUAL SCREENING

Virtual screening (VS) has emerged as an inexpensive and straightforward method for identifying primary actives or even lead molecules. It can be realized either as a structure-based screening approach (also called docking) or as a ligand-based one that considers the chemical structure similarity of small molecules during database screening. Here we restrict ourselves to discussing only the most relevant methods in virtual screening (Fig. 1) and point to recent reviews as references.
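The ligand-based route, which ranks database molecules by their structural similarity to a known active, can be illustrated with a minimal sketch. It assumes that fingerprints are already available as sets of "on" bit positions and that the Tanimoto coefficient is used as the similarity measure; this is a common choice but not one prescribed by this review. The compound names and fingerprints are invented for illustration only.

```python
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical encoding: a fingerprint is the set of bit positions that are set to 1.
Fingerprint = FrozenSet[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Shared on-bits divided by on-bits present in either fingerprint."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def similarity_search(
    query: Fingerprint, database: Dict[str, Fingerprint], top_n: int = 3
) -> List[Tuple[str, float]]:
    """Score every database member against the query and return the best top_n."""
    scored = [(name, tanimoto(query, fp)) for name, fp in database.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_n]


# Invented toy data, not taken from any real screening library.
query_fp = frozenset({1, 4, 7, 9})
toy_database = {
    "cmpd_A": frozenset({1, 4, 7, 9}),  # identical to the query
    "cmpd_B": frozenset({1, 4, 8}),     # shares two of five distinct bits
    "cmpd_C": frozenset({2, 3, 5}),     # shares no bits with the query
}
ranked = similarity_search(query_fp, toy_database, top_n=2)
```

In practice the top-ranked compounds would be the ones forwarded to experimental testing; the cut-off `top_n` is a hypothetical parameter standing in for whatever selection rule a campaign applies.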


890 Combinatorial Chemistry & High Throughput Screening, 2011, Vol. 14, No. 10 Polgár and Keser

[Figure 1: schematic of a ligand in 2D and 3D representation and a binding site with labelled residues, with panels for docking, 2D fingerprint, substructure, 3D pharmacophore, shape similarity and pharmacophore fingerprint searches.]

Fig. (1). Schematic representation of Virtual Screening Approaches.
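As a concrete illustration of the physicochemical database filtering that usually precedes the approaches in Fig. 1, the sketch below applies the commonly quoted rule-of-five thresholds (molecular weight at most 500, logP at most 5, at most 5 hydrogen-bond donors and at most 10 hydrogen-bond acceptors) to precomputed properties. The property values, the compound names and the tolerance of one violation are assumptions made for illustration; they are not taken from this review.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Properties:
    """Precomputed descriptors for one compound (hypothetical record layout)."""
    name: str
    mol_weight: float
    logp: float
    h_donors: int
    h_acceptors: int


def rule_of_five_violations(p: Properties) -> int:
    """Count how many of the four rule-of-five thresholds are exceeded."""
    checks = [
        p.mol_weight > 500,
        p.logp > 5,
        p.h_donors > 5,
        p.h_acceptors > 10,
    ]
    return sum(checks)


def passes_filter(p: Properties, max_violations: int = 1) -> bool:
    """Keep a compound if it exceeds at most max_violations thresholds (assumed tolerance)."""
    return rule_of_five_violations(p) <= max_violations


# Invented example library; the values are illustrative only.
library: List[Properties] = [
    Properties("cmpd_A", 320.4, 2.1, 2, 5),  # no violations
    Properties("cmpd_B", 612.7, 6.3, 3, 8),  # weight and logP both too high
    Properties("cmpd_C", 540.0, 4.2, 1, 9),  # weight too high only
]
kept = [p.name for p in library if passes_filter(p)]
```

Functional group filters for reactive or toxic compounds would be applied in the same pass, but they need substructure matching and are left out of this sketch.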

Database filtering. One of the best known early physicochemical filters, the rule of five (ROF), was introduced by Lipinski et al. [4]. It is based on the distributions of easily accessible and interpretable physicochemical properties of known oral drugs. These properties define the so-called drug-like chemistry space where compounds face no serious ADME (absorption, distribution, metabolism, and excretion) problems. Later, Hann [5] and Oprea [6] introduced the concept of lead-likeness after analysing lead molecules and the corresponding drugs. Comparative studies on leads and drugs revealed the lower complexity of lead molecules. Nevertheless, it should be emphasized that the limits applied for both drug-like and lead-like compounds are based on statistical analyses of known examples. Although these simple filters demonstrated significant classification accuracy when discriminating drugs from non-drugs and leads from drugs, their nature is mainly empirical. Unstable, reactive, toxic or otherwise unsuitable compounds can simply be removed using functional group filters [7, 8].

Ligand-based screening [9]. Similarity searching is one of the simplest methodologies of ligand-based virtual screening, in which databases are screened against a query using chemical similarity principles. During a similarity search the query molecule is compared to the members of a database, and a measure is calculated that quantifies the similarity between the query and each molecule of the database [10]. For 2D similarity searches the query structure and the members of the screened database are typically represented by molecular fingerprints that encode molecular structure and properties in binary format (Fig. 1). A given number of bits in the bit string detect the presence or absence of molecular fragments or connectivity patterns and pharmacophores, and can encode descriptor value ranges or binary transformed descriptors. Analysis of molecular similarity is based on the quantitative determination of the overlap between the fingerprints of the query structure and all database members. Since the descriptors of a given molecule can be considered as a vector of real or binary attributes, most similarity measures are derived as vectorial distances.

If we consider small molecules as spatial objects, virtual screening can also be performed using 3D queries. 3D similarity searches utilize 3D information including shape, steric and electrostatic properties obtained for the query molecule and the database screened. High throughput molecular alignment techniques of this kind superpose all database entries onto the query molecule. FlexS, one of the most popular approaches, keeps the query rigid and treats the test molecules as flexible, offering several alternative superpositions that are each scored and ranked by similarity [11, 12]. ROCS and ChemAxon’s 3D method perform a shape-based superposition of the molecular volume, combining 2D and 3D similarity scoring [13, 14].

A pharmacophore is an arrangement of steric and electrostatic features in three-dimensional space that are crucial for the biological action. Structure-based pharmacophores are typically explored by analyzing the binding site interactions formed between ligands and protein. Without structural information, a ligand-based pharmacophore can be developed using a set of active compounds. Pharmacophore searches use at least one conformation of each database compound that is expected to be similar to the bioactive conformation.

Structure-based virtual screening [15, 16]. Functions of drug molecules and protein targets are regulated by the

principles of molecular recognition. Rational drug design requires an understanding of molecular recognition in terms of structure and energetics [17], which can be realized straightforwardly by structure-based virtual screening tools. Structure-based virtual screening starts with obtaining the coordinates of a protein. X-ray crystallographic or NMR 3D protein structures are used, but homology models can also be applied [18]. Within the protein structure the active site has to be defined, because scanning the entire surface of the protein would hardly be feasible with most of the currently used docking algorithms. Another factor that should be considered is the conformational flexibility of the target, as commonly used docking methods are only able to consider flexibility to a limited extent. The selection of the target structure is therefore crucial; binding sites should be as representative as possible, mimicking a real and general binding conformation [19]. Moreover, ligand flexibility is also important and is handled by one of two generally applied strategies: whole molecule and fragment-based approaches. Historically, the DOCK algorithm addressed rigid body docking using a geometric matching algorithm to superimpose ligand conformations onto a negative image of the binding pocket. Fragment-based methods provide an alternative solution for the docking problem: molecules are dissected into fragments that can be docked individually, either separately or incrementally [20].

The interaction between protein and ligand is typically a reversible equilibrium reaction, usually characterised by the free energy of binding. Binding free energy values of reasonable accuracy can only be calculated with large computational resources. Evaluation of thousands of ligands requires a computationally cheap estimate of the binding free energy. This goal can only be achieved by scoring schemes that involve force field-based, empirical and knowledge-based methods [20]. All fast scoring functions share multiple deficiencies. First of all, most of them are fitted to or derived from some kind of experimental data. Consequently, the functions reflect the accuracy of these measurements. Molecular size can influence the scores as well: generally, the larger the molecule the better the score, although biological measurements do not support this trend. Since scoring functions are derived from X-ray structures, only the favourable interactions are rewarded; unfavourable interactions are not penalized because information on them cannot be obtained from the crystal structures. Uncertainties in the protonation states and the involvement of water in ligand binding complicate scoring further. In some cases the performance of scoring functions can be improved using consensus scoring schemes. Combining scores with different advantages and disadvantages may yield a scheme that is able to describe the main characteristics of protein-ligand binding [21].

HIGH THROUGHPUT SCREENING

High throughput screening is a key link in the chain comprising the industrialized drug discovery paradigm. Today, many pharmaceutical companies generally screen hundreds of thousands or more compounds per screen to produce typically several hundreds to thousands of hits. On average, some of these might be developed into lead compound series. In terms of definition, high throughput screening can be considered the process in which batches of compounds are tested for binding affinity or functional activity against the target. Test compounds act as inhibitors of target enzymes, as competitors for binding of a natural ligand to its receptor, as agonists, antagonists or allosteric modulators for receptor-mediated intracellular processes, and so forth. High throughput screening seeks to screen large numbers of compounds rapidly and in a massively parallel setup [22].

Primary positive high throughput screening results are usually called actives. The activity, purity and identity of the primary actives should first be confirmed, and the resulting hits are then subjected to detailed characterization. After this second level of triage, hits are optimized in terms of in vitro potency and ADME profile to become lead compounds [23].

The pharmaceutical industry currently has a pressing need for improvement in high throughput screening strategy. Although the industry has a seemingly insatiable appetite for new lead compounds, it is also under continuing pressure to reduce the costs of discovery and development.

High throughput screening instruments, assays, and services have emerged as a significant growth market. From the field’s origins with home-brew tests and generic research instrumentation, high throughput screening has become an increasingly sophisticated and important element in the armamentarium of the drug discoverer. At the beginning of this century the discovery process seemed to be on a steep growth curve with respect to both the number of targets and compounds to be screened and the complexity of the assays required. Increased numbers of targets and compounds called for greater parallelism and/or increased throughput in screens. Furthermore, the expense and scarcity of targets and compounds have driven the screening technology toward smaller assay volumes through miniaturization [24]. At the end of the first decade, however, an opposite trend became evident. Analyzing lead generation strategies, one can conclude that traditional literature-based medicinal chemistry approaches and fragment screening are replacing resource intensive random HTS campaigns. In addition to the decreasing role of large scale screening in lead generation, recent HTS strategies focus on screening smaller compound collections with optimized chemical information content. The quality of the screening deck is of primary importance for generating high quality data. Furthermore, focused library approaches are used more and more extensively to improve the hit rate of screening campaigns. It is important to note that most of the limitations of large scale random screening could be overcome by integrating virtual screening approaches. The demonstrated success of virtual screening technologies in maximizing the chemical information content of compound libraries, in reducing the false positive and false negative rates of experimental screening, and finally in the design of target family focused libraries all support the integration of virtual and experimental screening.

INTEGRATION OF VIRTUAL AND HIGH THROUGHPUT SCREENING

Integration of virtual chemistry and screening realises how a continuous information exchange between different areas could lead to chemical libraries of probes with more

desirable chemical and biological properties (Fig. 2). The current trend in the pharmaceutical industry is to integrate computational and experimental technologies early in the drug discovery process. For instance, it is considered that a better and earlier utilization of information, whether chemical or biological, would lead to chemical libraries with more desirable chemical and biological properties. This can be done by integrating information from different areas that include: (i) analysis of the gene or protein family for target selection; (ii) gene-structure-function studies by structural biology approaches on the target; (iii) generation of virtual and physical libraries by medicinal chemistry; (iv) absorption, distribution, metabolism, excretion and toxicity evaluation (ADME/Tox) of virtual and physical libraries; (v) VS of virtual and physical libraries and experimental screening of virtual actives to identify chemical probes.

[Figure 2: diagram linking target selection, structural biology, ADMETox, small molecules, medicinal chemistry and virtual screening to the chemical probe.]

Fig. (2). General strategy for the integration. Abbreviation: ADMETox: absorption, distribution, metabolism, excretion and toxicity.

The primary objective of both HTS and VS is to derive actives, by experimental and computational approaches, respectively. The computational strategy applied in VS suggests that it could be more effective in terms of time, resources, cost and hit rates when only experimentally tested compounds are considered. Moreover, it does not require the physical availability of the screening library and could therefore explore a significantly larger part of the drug-like chemistry space. In spite of these advantages, comparative studies demonstrated that these approaches are complementary rather than competitive. On the one hand, VS techniques could be useful when designing target specific screening libraries. On the other hand, the distinct structural classes identified by VS and HTS clearly suggest complementarities between VS and HTS.

COMPLEMENTARITIES IN DATA ANALYSIS

The complementary nature of HTS and VS is well exploited during HTS data analysis, where experimental knowledge is extracted and used for predictive model building. These models can then be applied further for data mining. Many of the promising actives can never be validated; these are false positives. Actives identified by HTS or VS therefore have to be experimentally validated before starting the hit-to-lead process. Both high throughput technologies surely miss some active compounds, which are called false negatives. Although reasonable efforts are made to minimize the rate of false positive and false negative molecules, both are still tolerated in HTS and VS settings [1, 25].

Current computational approaches can manage the ever increasing volumes of data. Recursive partitioning (RP) is one of the most popular methods for analysing large data sets [26]. RP separates data sets and builds decision trees. Along the decision tree the separated clusters are enriched with active molecules. There are several ways to assess the enrichment; one method of choice is calculating the average value of biological activity at each node. By the end of the procedure, biologically active molecules are characterized by particular descriptor sets. These descriptor settings can be applied to search databases for active compounds [27, 28]. The recently introduced phylogenetic-like tree algorithm [29], a clustering method that combines elements of neural nets, genetic algorithms and substructure analysis, addresses these issues by iterative classification of compounds and subclustering. In this case, active compounds can occur in more than one final cluster and can be associated with several descriptors, thereby providing diversified structure-activity relationships (SARs).

Labute [30] and coworkers developed a statistical approach to extract knowledge from HTS experiments, which is called binary QSAR (bQSAR). bQSAR uses Bayes' theorem to build models on structural descriptors and molecular properties to predict biological activity. Xue et al. achieved 90% accuracy using this method for the prediction of oestrogen receptor antagonists [31]. Harper et al. [32] introduced a binary kernel discrimination approach and investigated its performance on two datasets. The first is a set of 1650 monoamine oxidase inhibitors, and the second a set of 101437 compounds from an in-house enzyme assay. They compared the performance of binary kernel discrimination with a simple procedure which they called “merged similarity search”, and also with a feed forward neural network. Binary kernel discrimination was shown to perform robustly with varying quantities of training data and also in the presence of noisy data. They highlighted the importance of the judicious use of general pattern recognition techniques for compound selection [33].

Depending on the size of the data set, rules explaining biological data can be determined interactively. The method scales linearly with the number of descriptors, so hundreds of thousands of structures can be analyzed utilizing thousands to millions of molecular descriptors. There are currently no methods to deal with statistical analysis problems of this size. An important aspect of this analysis is the ability to deal with mixed datasets. In contrast to most current quantitative structure/activity relationship (QSAR) methods, which require compounds with similar binding modes, it could identify SAR rules for classes of compounds in the same data set that might be binding in different ways [33].

Support vector machines (SVMs) are extensively used for the interpretation of screening datasets [34]. Zernov et al. [33] applied SVM to real-life large-scale drug discovery problems, specifically, the creation of drug- and agro-

likeness filters for screening large compound collections. One particular objective was to compare the performance of SVM and artificial neural network (ANN) [35] classifiers on the same data. It was possible not only to define the main characteristics of the derived models but also to reveal a peculiarity relating to the usage of descriptors and the definition of the border between active and inactive samples. This peculiarity is critical for deriving adequate classification models.

COMPLEMENTARITIES IN SCREENING EFFORTS

Considering the primary objective of both virtual and high throughput screening, there are at least three different strategies of integration: parallel, sequential and focused screening.

Parallel screening. The parallel approach realizes the simultaneous application of both VS and HTS tools on the very same screening library (Fig. 3).

[Figure 3: flow diagram in which a screening library is processed by VS and HTS in parallel, giving VS hits and HTS hits that are analysed together to yield hits.]

Fig. (3). Parallel screening strategy.

A parallel HTS and structure-based VS study on inhibitors of tyrosine phosphatase-1B (PTP1B), a potential type II diabetes target, identified separate sets of hits with low to mid activity ranges [36]. A corporate sublibrary of approximately 400000 compounds was screened by HTS for compounds that inhibited PTP1B. Concurrently, molecular docking was used to screen approximately 235000 commercially available compounds against the X-ray crystallographic structure of PTP1B, and 365 high-scoring molecules were tested as inhibitors of the enzyme. Of approximately 400000 molecules tested in the high-throughput experimental assay, 85 (0.021%) inhibited the enzyme with IC50 values less than 100 μM; the most active had an IC50 value of 4.2 μM. Of the 365 molecules suggested by molecular docking, 127 (34.8%) inhibited PTP1B with IC50 values less than 100 μM; the most active of these had an IC50 of 1.7 μM. Considering only wet tested compounds, structure-based docking therefore enriched the hit rate approximately 1700-fold over random screening (34.8% versus 0.021%). The hits from both the high throughput and virtual screens were dissimilar from phosphotyrosine, the canonical substrate group for PTP1B, and furthermore the two hit lists were very different from each other. Surprisingly, the docking hits were judged to be more druglike than the HTS hits. The diversity of both hit lists and their dissimilarity from each other underline that docking and HTS may be complementary techniques for lead discovery.

Analysis of chemical clusters identified in VS and HTS campaigns against glycogen synthase kinase-3β (GSK-3β) also underpinned the complementarity of the methods [37]. GSK-3β is a serine/threonine kinase that has recently emerged as a key target for neurodegenerative diseases and diabetes. As an initial step a virtual screen was developed to discriminate known GSK-3β inhibitors and inactive compounds using FlexX, FlexX-Pharm, and FlexE. The maximal enrichment factor (EF = 28) suggested that the protocol identified potential GSK-3β inhibitors effectively from large compound collections. The effectiveness of the screening protocol was further investigated by comparative HTS and VS performed for the same subset of our corporate library. Enrichment factors, the significantly higher hit rate of virtual screening (12.9%) than that of HTS (0.55%), and also the comparison of active clusters suggest that the virtual screening protocol is an effective tool in GSK-3β-based library focusing. Head-to-head comparison of true/false positives and negatives revealed the two approaches to be complementary rather than competitive.

Quantitative HTS (qHTS) and structure-based docking of over 70563 compounds were performed against the antibiotic resistance target β-lactamase, which has been extensively characterized enzymologically, crystallographically, and for some classes of false-positives in screening. The mechanism of every qHTS active was established, making it possible to determine which mechanisms were most responsible for artefact hits and how often putatively reactive functional groups were problematic. Of the 1274 initial inhibitors, 95% were detergent-sensitive and were classified as aggregators. Among the 70 remaining were 25 potent, covalent-acting β-lactams. Mass spectra, counter-screens, and crystallography identified 12 as promiscuous covalent inhibitors. The remaining 33 were either aggregators or irreproducible. No specific reversible inhibitors were found. Molecular docking was applied afterwards to prioritize molecules from the same library for testing at higher concentrations. Of 16 tested, 2 were modest inhibitors. Subsequent X-ray structures corresponded to the docking prediction. These results suggest that when attempting to identify small molecule modulators for a currently “undruggable” or “genomic” target for which structural information is available, it may be useful to screen and dock in parallel, using one to help interpret and guide the other [38].

Shoichet et al. undertook a parallel docking and HTS screen of 197861 compounds against cruzain, a thiol protease target for Chagas disease, looking for reversible, competitive inhibitors. On workup, 99% of the hits were eliminated as false positives, yielding 146 well-behaved, competitive ligands. These fell into five chemotypes: two were prioritized by scoring among the top 0.1% of the docking-ranked library, two were prioritized by behaviour in the HTS and by clustering, and one chemotype was

prioritized by both approaches. Determination of an inhibitor/cruzain crystal structure and comparison of the high-scoring docking hits to experiment illuminated the origins of docking false-negatives and false-positives. Prioritizing molecules that are both predicted by docking and HTS-active yields well-behaved molecules, relatively unobscured by the false-positives to which both techniques are individually prone [39].

Focused screening. In general, CADD approaches are sufficient to filter out molecules from screening databases which are incompatible with a particular active site or which are not similar enough to known active compounds. Moreover, VS strategies are also capable of prefiltering subsets of screening databases that are more likely to bind to a given active site. Altogether, the false and true hits provided by VS lead to the identification of subsets of large databases at a manageable size. Focused screening strategies enable the use of complex low-throughput screening assays like cell-based high content screening (HCS) assays. However, in practice a complex compound registration and management system is needed to execute a focused screening approach because hits are selected and identified individually from plates. Focused screening strategies were shown to be the most frequently used combined methods over the past few years (Fig. 4).

[Figure 4: flow diagram in which VS reduces a screening library to a focused library that is tested by HTS to give hits, with hypothesis generation feeding the hits back into VS.]

Fig. (4). Focused and sequential screening.

Most of the success stories reported in the virtual screening literature [1] follow this strategy, screening hundreds of thousands of compounds virtually before testing the top few dozen experimentally. Since this is the most popular and extensively reviewed integration strategy, we discuss here only two case studies: one for ligand-based and another for structure-based focusing.

In one example of focusing, ligand-based 3D pharmacophore searching has been used to identify VLA-4 integrin antagonists with submicromolar potency [40]. The antigen α4β1 (very late antigen-4, VLA-4) plays an important role in the migration of white blood cells to sites of inflammation. It has been implicated in the pathology of a variety of diseases including asthma, multiple sclerosis, and rheumatoid arthritis. A series of potent inhibitors of α4β1 was discovered using computational screening for replacements of the peptide region of an existing tetrapeptide-based α4β1 inhibitor (4-[N‘-(2-methylphenyl)ureido]phenylacetyl-Leu-Asp-Val) derived from fibronectin. The search query was constructed using a model based on the X-ray conformation of the related integrin-binding region of vascular cell adhesion molecule-1 (VCAM-1). The 3D search query consisted of the N-terminal cap and the carboxyl side chain of the ligand because, on the basis of existing structure-activity data on this series, these were known to be critical for high-affinity binding to α4β1. The computational screen identified 12 compounds from a virtual library of 8624 molecules as satisfying the model and synthetic filters. All of the synthesized compounds tested inhibit α4β1 association with VCAM-1, with the most potent compound having an IC50 of 1 nM, comparable to the query compound. A 3D QSAR was generated that rationalizes the variation in activities of these α4β1 antagonists. These results demonstrated that it is possible to rapidly identify nonpeptidic replacements of peptide antagonists. This scaffold hopping approach could be useful in the identification of nonpeptidic inhibitors with improved pharmacokinetic properties relative to their peptidic counterparts.

A structure-based virtual screening was conducted on a ligand-supported homology model of the human histamine H4 receptor (hH4R). More than 8.7 million 3D structures derived from different vendor databases were investigated by docking to the hH4R binding site using FlexX. A total of 255 selected compounds were tested by radioligand binding assay and 16 of them possessed significant [3H]histamine displacement. Several novel scaffolds were identified that can be used to develop selective H4 ligands in the future. The average hit rate of the hH4R in vitro screening was 6.3%. The hit rates for compounds selected from the “top_2000”, “top_45000”, and “top_45000_analog” sets were 4.6, 7.2, and 4.3%, respectively. It is interesting to note that the “top_45000” selection yielded the highest hit rate. Moreover, the two most potent compounds were also found by this selection method. These results underline the impact of hit selection approaches in virtual screening. The “top_45000_analog” selection, which included structural analogues of the compounds selected from the top 45000 compounds, resulted in the smallest hit rate, as expected. Although the structural cores were identical to those of the original hits, altered substitution patterns are probably responsible for the differential binding characteristics [41]. This study also illustrates the importance of the integration of these methods.

Sequential screening. The combination of VS-based or other filter-based subset selection and HTS in an iterative way is called sequential screening, and it represents a truly integrated screening strategy (Fig. 4). Different sequential strategies can be devised depending on the information available. When active molecules are known, similarity search methods can be applied to enrich screening library subsets with molecules having the desired biological activity. If the resulting subsets contain hit molecules with acceptable biological activity, the subsequent steps will provide molecules with better biological activity at a higher probability. Having several active molecules available as starting points for the subsequent searching steps can decrease the size of the screening library, because frequency values can be assigned to candidate molecules based on their occurrence in the different search steps. Alternatively, screening databases can be clustered including or excluding the available information about the hits. When known ligands are used, those clusters are tested in which active molecules are found. New hits can then be re-used as queries in the next clustering step. If there are no known
ligands representative molecules from each cluster can be
selected for biological testing. If these clusters include novel hits, these can be re-used as query molecules, and the process is repeated as long as necessary. This iterative approach can identify analogues at the very early steps and provide an initial structure-activity relationship that can facilitate medicinal chemistry efforts. Based on the sequential screening strategy, cluster analysis and data management IT systems were developed [42]. Sequential screening is successful if the applied computational filter enriches the database with active molecules, leading to an increased hit rate in experimental screening. Sequential screening has already been applied successfully on many occasions. Sequential screening based on recursive partitioning (RP) was successfully used for a set of 14 G-protein-coupled receptors. RP was used by Hertzog et al. [43] to help uncover and understand structure-activity relations and to help biology and chemistry experts make better decisions on which compounds to screen next and to characterize further. Fig. (5) shows the sequential screening process as applied to 14 G-protein-coupled receptors. The early generation of SAR rules by sequential screening can greatly improve the efficiency of the screening process.

Engels and Venkatarangan used sequential screening in two iterations that resulted in hit rates of up to 40% [44]. Retrospective studies indicate that, by screening from 10 to 20% of a collection in several assays, one can find from 50 to 80% of the active compounds [45].

Two-stage sequential screening experiments were carried out based on HTS data for 18 cancer cell lines of the NCI [46]. In the first step, hierarchical clustering was applied to analyze the screened compound data set consisting of approximately 32000 compounds, and a diversity-based selection was carried out, producing a representative subset containing 3000 molecules. Testing of these compounds already produced relatively high hit rates between 1% and 2% (dependent on the cell line), thus at least 10-fold higher than expected from random HTS. In the second round, hierarchical clustering was used to expand the obtained hits in the remainder of the screening set and activity-based selections were carried out, yielding between 250 and 500 new candidate compounds per cell line. Assays of these compounds resulted in hit rates between 33% and 43%, thus achieving an overall very significant improvement over random HTS.

In an effort to identify novel Janus kinase 3 (JAK3) inhibitors, Chen et al. adopted a sequential focused screening approach to search an in-house chemical database [47]. Sequential screening is unique in that it conducts screening in an iterative fashion. Instead of attempting to find all the desirable hits in a single round of screening, it runs screening in multiple rounds and selects the screening candidates of each round based on the new structure-activity data coming out of the previous round. Chen et al. applied this strategy in a focused screening project to increase the structural novelty of screening hits. By biologically testing only 79 selected compounds, they identified 19 compounds showing IC50 < 20 μM, with four of them in the nanomolar range. They thus not only identified a couple of novel scaffolds for JAK3 inhibition but also did so with a high level of screening efficiency. With its advantages of efficiency and flexibility, this approach may be utilized to identify leads for other therapeutic targets. As a general strategy for compound screening, it can be applied to any chemical database, including virtual and non-proprietary databases, greatly reducing potential acquisition and synthesis costs. It can be used in association with various computational and biological screening techniques, making it a practical tool for current lead identification processes.

COMPLEMENTARITIES AT THE VS-HTS INTERFACE

Having discussed a variety of methodological aspects, this section describes different types of applications at the interface of VS and HTS (other than those already mentioned in the context of specific methods). Hit expansion and progression to leads is well illustrated by a study combining several modelling approaches [48]. Starting from a hit displaying micromolar activity against the μ-opiate receptor, two complementary concepts were applied by Poulain and coworkers for analog design and the results were combined to produce a lead with sub-nanomolar potency for this receptor [49]. The first method generated analogs of the original hit as combinations of topologically similar groups representing

Fig. (5). Sequential screening strategy applied by Hertzog et al. [43]. [Schematic; recoverable labels: compound collection, initial compound set, screen, RP SCAP analysis, SAR rules, cherry picking, compound screen with the hypothesis, compound evaluation.]
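
As an illustration of the iterative loop described in this section (test a diverse first batch, feed the new hits back into the selection, test again), the following is a minimal sketch and not the protocol of any cited study. All names in it (`library`, `assay`, `sequential_screen`, the feature sets, the batch size) are hypothetical, the "assay" is a toy rule, and nearest-neighbour ranking by Tanimoto similarity stands in for the recursive partitioning or clustering used in the studies discussed above. The enrichment value is simply the hit rate of a round divided by the hit rate expected from testing the whole library.

```python
from typing import Callable, Dict, FrozenSet, List, Sequence

Fingerprint = FrozenSet[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto similarity of two feature sets (0.0 if both are empty)."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def enrichment(round_hit_rate: float, baseline_hit_rate: float) -> float:
    """Hit rate of a selection relative to the baseline (baseline must be > 0)."""
    return round_hit_rate / baseline_hit_rate


def sequential_screen(
    library: Dict[str, Fingerprint],
    assay: Callable[[str], bool],
    seeds: Sequence[str],
    batch_size: int,
    max_rounds: int = 5,
) -> List[dict]:
    """Screen in rounds; each round after the first is steered by known hits."""
    tested: set = set()
    known_hits: List[str] = []
    rounds: List[dict] = []
    batch = list(seeds)  # round 1: a diverse, hand-picked seed set
    for _ in range(max_rounds):
        if not batch:
            break
        new_hits = [cid for cid in batch if assay(cid)]
        tested.update(batch)
        known_hits.extend(new_hits)
        rounds.append({"picked": batch, "hits": new_hits,
                       "hit_rate": len(new_hits) / len(batch)})
        if not new_hits:
            break  # no new activity data to steer another round

        def score(cid: str) -> float:
            # Similarity to the closest hit found so far.
            return max(tanimoto(library[cid], library[h]) for h in known_hits)

        untested = [cid for cid in library if cid not in tested]
        # Highest similarity first; ties are broken by compound id.
        batch = sorted(untested, key=lambda cid: (-score(cid), cid))[:batch_size]
    return rounds


# Hypothetical 12-compound library; each compound is a set of feature ids.
library: Dict[str, Fingerprint] = {
    "c01": frozenset({1, 2, 3}),    "c02": frozenset({1, 2, 4}),
    "c03": frozenset({1, 2, 5}),    "c04": frozenset({1, 6, 23}),
    "c05": frozenset({7, 8, 9}),    "c06": frozenset({7, 8, 10}),
    "c07": frozenset({7, 9, 11}),   "c08": frozenset({12, 13, 14}),
    "c09": frozenset({12, 13, 15}), "c10": frozenset({16, 17, 18}),
    "c11": frozenset({16, 17, 19}), "c12": frozenset({20, 21, 22}),
}


def assay(cid: str) -> bool:
    """Toy stand-in for an experiment: active if features 1 and 2 are present."""
    return {1, 2} <= library[cid]


actives = {cid for cid in library if assay(cid)}
baseline = len(actives) / len(library)  # hit rate of testing everything
rounds = sequential_screen(library, assay,
                           seeds=["c01", "c05", "c08", "c10"], batch_size=2)
found = {cid for r in rounds for cid in r["hits"]}

for i, r in enumerate(rounds, start=1):
    print(i, r["picked"], r["hit_rate"], enrichment(r["hit_rate"], baseline))
```

In this toy setting the loop stops after the first round that yields no new hits, having tested 8 of the 12 compounds. A real campaign would replace the feature sets with computed descriptors or fingerprints and the toy rule with assay results.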


different pharmacophores that were present in this hit, thus preserving the overall “skeleton” of the compound. The second strategy aimed at identifying analogues displaying pharmacophore patterns similar to the active compound without preserving molecular topology. These parallel approaches proved to be complementary in SAR elucidation and hit expansion.

As part of a smart screening campaign to target gene families, screening candidates for kinase and GPCR targets were selected using neural network simulations and BCUT descriptors [50]. Neural network simulations identified over 80% of compounds suitable to target each gene family. In addition, this strategy allowed the identification of compounds that were active against a kinase but structurally dissimilar to the kinase ligands used in the neural network training sets, thus demonstrating that these VS methods were capable of detecting compounds having different scaffolds but similar activity. Another neural network application involved a ligand-based combinatorial design strategy to generate selective purinergic receptor (A2A) antagonists based on a topological pharmacophore similarity metric using self-organizing maps [51]. On average, the ligands designed in this way displayed a three-fold improvement in their binding constants and 3.5-fold higher selectivity than the ones in the initial library, thereby illustrating the value of focusing on small activity-enriched libraries for screening applications.

An information theory-oriented strategy was established to interface screening and library design and applied to search for cyclin-dependent kinase inhibitors [52]. Following this approach, each molecule is considered a “question” and encoded as a “molecular signature”, a bit string recording the presence or absence of particular pre-defined pharmacophore features, and the assay result is the “answer”. Informative design selects molecules in order to maximize the difference between present and absent chemical features that are responsible for important compound characteristics based on assay data. In this study, informative design using 3D pharmacophores outperformed compound design using 2D substructures, with an enrichment factor of 7.7 for pharmacophore-based informative design versus 1.5 for substructure-based selection.

In an anti-bacterial screen of a combinatorial library containing more than 10000 compounds, a total of 212 active molecules were detected [53]. However, application of hierarchical clustering revealed that these actives populated only seven distinct structural classes, thus providing a more realistic measure of true hit rates from HTS, which are usually expected to be in the range between 0.1% and 1%. Subsequently, 11 different nearest neighbour analyses were carried out (in descriptor spaces representing database compounds) based on the anti-bacterial hits and, in each case, between 35 and 66 compounds were selected for follow-up testing. The assays based on these different nearest neighbour selections produced hit rates between 3% and 36%, with an average of 14% [54].

PRESENT APPLICATIONS AND FUTURE POSITION

By now it has become obvious that HTS-based approaches by themselves will not provide as many new chemical entities as was expected. The combination of the existing technologies, however, might contribute to more sophisticated and hopefully more successful strategies that will provide better solutions to decrease high attrition rates during drug discovery. This contribution underpins the view that VS and HTS have this potential if integrated. Several examples collected in this review demonstrate that these are not competitive but complementary approaches to lead discovery, and that their integration is likely to prove fruitful in providing viable chemical leads.

Chemoinformatics-driven methodologies can also be applied in a similar way to reduce experimental effort. Although theoretical approaches have limitations and include several approximations, the success stories mentioned here demonstrate that analysis by VS can influence the success rate of experimental screening. At present this is best supported by the results of sequential screening, though there are only a limited number of relevant studies. It is expected that the continuous development of VS and HTS technologies will open several ways to develop focused and iterative screening methodologies. Integrated and focused approaches may contribute more to lead discovery programs in the near future. Bearing this in mind, developing functional interfaces between theoretical and experimental screening technologies appears to be well justified.

REFERENCES

[1] Polgár, T.; Keserű, G.M. Virtual Screening. Encyclopedia of Pharmaceutical Technology, Informaworld Plc, 2006.
[2] Bajorath, J. Integration of virtual and high-throughput screening. Nat. Rev. Drug Discov., 2002, 1, 882-894.
[3] Polgár, T.; Keserű, G.M. Structure-based virtual screening. Frontiers in Drug Design & Discovery: Structure-Based Drug Design in the 21st Century, 2007, 3(1), 477-502.
[4] Lipinski, C.A.; Lombardo, F.; Dominy, B.W.; Feeney, P.J. Experimental and computational approaches to estimate solubility and permeability in drug discovery and development settings. Adv. Drug Deliv. Rev., 1997, 23(1-3), 3-25.
[5] Hann, M.M.; Leach, A.R.; Harper, G. Molecular complexity and its impact on the probability of finding leads for drug discovery. J. Chem. Inf. Comput. Sci., 2001, 41(3), 856-864.
[6] Teague, S.J.; Davis, A.M.; Leeson, P.D.; Oprea, T.I. The design of leadlike combinatorial libraries. Angew. Chem. Int. Ed., 1999, 38(24), 3743-3748.
[7] Walters, P.; Stahl, M.T.; Murcko, M.A. Virtual screening - an overview. Drug Discov. Today, 1998, 3(4), 160-178.
[8] Hann, M.; Hudson, B.; Lewell, X.; Lifely, R.; Miller, L.; Ramsden, N. Strategic pooling of compounds for high-throughput screening. J. Chem. Inf. Comput. Sci., 1999, 39(5), 897-902.
[9] Willett, P. Similarity-based virtual screening using 2D fingerprints. Drug Discov. Today, 2006, 11(23-24), 1046-1053.
[10] Willett, P.; Barnard, J.M.; Downs, G.M. Chemical similarity searching. J. Chem. Inf. Comput. Sci., 1998, 38(6), 983-996.
[11] Cross, S. S. J. Improved FlexX docking using FlexS-determined base fragment placement. J. Chem. Inf. Model., 2005, 45(4), 993-1001.
[12] Schellhammer, I.; Rarey, M. FlexX-Scan: Fast structure-based virtual screening. Proteins: Struct. Funct. Genet., 2004, 57(3), 504-517.
[13] ROCS, © 1997-2010 OpenEye Scientific Software.
[14] 3D alignment, © 1999-2010 ChemAxon Ltd.
[15] Lyne, P.D. Structure-based virtual screening: an overview. Drug Discov. Today, 2002, 7(20), 1047-1055.
[16] Ghosh, S.; Nie, A.; An, J.; Huang, Z. Combinatorial chemistry and molecular diversity, Structure-based virtual screening of chemical libraries for drug discovery. Curr. Opin. Chem. Biol., 2006, 10(3), 194-202.
[17] Olah, M.M.; Bologa, C.G.; Oprea, T.I. Strategies for compound selection. Curr. Drug Discov. Technol., 2004, 1(3), 211-220.
[18] Charifson, P.S.; Walters, W.P. Filtering databases and chemical libraries. Mol. Divers., 2000, 5(4), 185-197.
[19] Hann, M.; Hudson, B.; Lewell, X.; Lifely, R.; Miller, L.; Ramsden, N. Strategic pooling of compounds for high-throughput screening. J. Chem. Inf. Comput. Sci., 1999, 39(5), 897-902.
[20] Andrews, P.R.; Craik, D.J.; Martin, J.L. Functional group contributions to drug-receptor interactions. J. Med. Chem., 1984, 27(12), 1648-1657.
[21] Muegge, I.; Heald, S.L.; Brittelli, D. Simple selection criteria for drug-like chemical matter. J. Med. Chem., 2001, 44(12), 1841-1846.
[22] Bleicher, K.H.; Böhm, H.J.; Müller, K.; Alanine, A.I. A guide to drug discovery: Hit and lead generation: beyond high-throughput screening. Nat. Rev. Drug Discov., 2003, 2, 369-378.
[23] White, R.E. High-throughput screening in drug metabolism and pharmacokinetic support of drug discovery. Ann. Rev. Pharmacol. Toxicol., 2000, 40, 133-157.
[24] Mayr, M.M. The future of high-throughput screening. J. Biomol. Screen., 2008, 13(6), 443-448.
[25] Stahura, F.L.; Bajorath, J. Virtual screening methods that complement HTS. Comb. Chem. High Throughput Screen., 2004, 7(4), 259-269.
[26] Heping, Z.; Burton, H.S. Recursive Partitioning and Applications. 2nd ed., Springer, 2010.
[27] Chen, X.; Rusinko, A.; Young, S.S. Recursive partitioning analysis of a large structure-activity data set using three-dimensional descriptors. J. Chem. Inf. Comput. Sci., 1998, 38(6), 1054-1062.
[28] Rusinko, A.; Farmen, M.W.; Lambert, C.G.; Brown, P.L.; Young, S.S. Analysis of a large structure/biological activity data set using recursive partitioning. J. Chem. Inf. Comput. Sci., 1999, 39(6), 1017-1026.
[29] Tamura, S.Y.; Bacha, P.A.; Gruver, H.S.; Nutt, R.F. Data analysis of high-throughput screening results: Application of multidomain clustering to the NCI anti-HIV data set. J. Med. Chem., 2002, 45(14), 3082-3093.
[30] Labute, P. Binary QSAR: a new method for the determination of quantitative structure activity relationships. Pac. Symp. Biocomput., 1999, 4, 444-455.
[31] Xue, L.; Godden, J.; Gao, H.; Bajorath, J. Identification of a preferred set of molecular descriptors for compound classification based on principal component analysis. J. Chem. Inf. Comput. Sci., 1999, 39(4), 699-704.
[32] Harper, G.; Bradshaw, J.; Gittins, J.C.; Green, D.V.S.; Leach, A.R. Prediction of biological activity for high-throughput screening using binary kernel discrimination. J. Chem. Inf. Comput. Sci., 2001, 41(5), 1295-1300.
[33] Zernov, V.Z.; Balakin, K.V.; Ivaschenko, A.A.; Savchuk, N.P.; Pletnev, I.P. Drug discovery using support vector machines. The case studies of drug-likeness, agrochemical-likeness, and enzyme inhibition predictions. J. Chem. Inf. Comput. Sci., 2003, 43(6), 2048-2056.
[34] Ivanciuc, O. Applications of support vector machines in chemistry. Rev. Comput. Chem., 2007, 23, 291-400.
[35] Cheng, B.; Titterington, D.M. Neural networks: A review from a statistical perspective. Stat. Sci., 1994, 9(1), 2-30.
[36] Doman, T.N.; McGovern, S.L.; Witherbee, B.J.; Kasten, T.P.; Kurumbail, R.; Stallings, W.C.; Connolly, D.T.; Shoichet, B.K. Molecular docking and high-throughput screening for novel inhibitors of protein tyrosine phosphatase-1B. J. Med. Chem., 2002, 45(11), 2213-2221.
[37] Polgár, T.; Baki, A.; Szendrei, G.I.; Keserű, G.M. Comparative virtual and experimental high-throughput screening for glycogen synthase kinase-3 inhibitors. J. Med. Chem., 2005, 48(25), 7946-7959.
[38] Babaoglu, K.; Simeonov, A.; Irwin, J.J.; Nelson, M.E.; Feng, B.; Thomas, C.J.; Cancian, L.; Costi, M.P.; Maltby, D.A.; Jadhav, A.; Inglese, J.; Austin, C.P.; Shoichet, B.K. Comprehensive mechanistic analysis of hits from high-throughput and docking screens against β-lactamase. J. Med. Chem., 2008, 51(8), 2502-2511.
[39] Ferreira, R.S.; Simeonov, A.; Jadhav, A.; Eidam, O.; Mott, B.T.; Keiser, M.J.; McKerrow, J.H.; Maloney, D.J.; Irwin, J.J.; Shoichet, B.K. Complementarity between a docking and a high-throughput screen in discovering new cruzain inhibitors. J. Med. Chem., 2010, 53(13), 4891-4905.
[40] Singh, J.; van Vlijmen, H.; Liao, Y.; Lee, W-C.; Cornebise, M.; Harris, M.; Shu, I.; Gill, A.; Cuervo, J.H.; Abraham, W.M.; Adams, S.P. Identification of potent and novel α4β1 antagonists using in silico screening. J. Med. Chem., 2002, 45(14), 2988-2993.
[41] Kiss, R.; Kiss, B.; Könczöl, Á.; Szalai, F.; Jelinek, I.; László, V.; Noszál, B.; Falus, A.; Keserű, G.M. Discovery of novel human histamine H4 receptor ligands by large-scale structure-based virtual screening. J. Med. Chem., 2008, 51(11), 4891-4905.
[42] Stahura, F.L.; Xue, L.; Godden, J.W.; Bajorath, J. Methods for compound selection focused on hits and application in drug discovery. J. Mol. Graph. Model., 2002, 20(6), 439-446.
[43] Jones-Hertzog, D.K.; Mukhopadhyay, P.; Keefer, C.E.; Young, S.S. Use of recursive partitioning in the sequential screening of G-protein-coupled receptors. J. Pharmacol. Toxicol. Methods, 2008, 51(2), 3145-3153.
[44] Engels, M.F.M.; Venkatarangan, P. Smart screening: Approaches to efficient HTS. Curr. Opin. Drug Discov. Devel., 2001, 4, 275-283.
[45] Lewell, X.Q.; Judd, D.B.; Watson, S.P.; Hann, M.M. RECAP-Retrosynthetic combinatorial analysis procedure: a powerful new technique for identifying privileged molecular fragments with useful applications in combinatorial chemistry. J. Chem. Inf. Comput. Sci., 1998, 38(3), 511-522.
[46] Stanton, D.; Morris, T.W.; Roychoudhury, S.; Parker, C.N. Application of nearest-neighbour and cluster analyses in pharmaceutical lead discovery. J. Chem. Inf. Comput. Sci., 1999, 39(1), 21-27.
[47] Chen, X.; Wilson, L.J.; Malaviya, R.; Argentieri, R.L.; Yang, S.M. Virtual screening to successfully identify novel Janus kinase 3 inhibitors: A sequential focused screening approach. J. Med. Chem., 2008, 51, 7015-7019.
[48] Good, A.C.; Krystek, S.R.; Mason, J.S. High-throughput and virtual screening: core lead discovery technologies move towards integration. Drug Discov. Today, 2000, 5(12), 61-69.
[49] Poulain, R.F.; Horvath, D.; Bonnet, B.; Eckhoff, C.; Chapelain, B.; Bodinier, M-C.; Deprez, B. From hit to lead. Analyzing structure-profile relationships. J. Med. Chem., 2001, 44(21), 3391-3401.
[50] Manallack, D.T.; Pitt, W.R.; Gancia, E.; Montana, J.G.; Livingstone, D.J.; Ford, M.G.; Whitley, D.C. Selecting screening candidates for kinase and G protein-coupled receptor targets using neural networks. J. Chem. Inf. Comput. Sci., 2002, 42(5), 1256-1262.
[51] Schneider, G.; Nettekoven, M. Ligand-based combinatorial design of selective purinergic receptor (A2A) antagonists using self-organizing maps. J. Comb. Chem., 2003, 5(3), 233-237.
[52] Bradley, E.K.; Miller, J.L.; Saiah, E.; Grootenhuis, D.J. Informative library design as an efficient strategy to identify and optimize leads: Application to cyclin-dependent kinase 2 antagonists. J. Med. Chem., 2003, 46(20), 4360-4364.
[53] Grüneberg, S.; Stubbs, M.T.; Klebe, G. Successful virtual screening for novel inhibitors of human carbonic anhydrase: Strategy and experimental confirmation. J. Med. Chem., 2002, 45(17), 3588-3602.
[54] Spencer, R.W. Diversity analysis in high throughput screening. J. Biomol. Screen., 1997, 2(2), 69-70.

Received: September 3, 2010 Revised: February 4, 2011 Accepted: June 11, 2011
