The Power of Deep Learning To Ligand-Based Novel Drug Discovery
Igor I. Baskin
To cite this article: Igor I. Baskin (2020): The power of deep learning to ligand-based novel drug
discovery, Expert Opinion on Drug Discovery, DOI: 10.1080/17460441.2020.1745183
PERSPECTIVE
CONTACT Igor I. Baskin igbaskin@gmail.com Faculty of Physics, M.V. Lomonosov Moscow State University, Moscow, Russia
© 2020 Informa UK Limited, trading as Taylor & Francis Group
Figure 1. A deep multilayer neural network applied to the prediction of three target properties of chemical compounds.
models for rodent tissues with a large amount of experimental data.

The advantage of combining multitask learning with deep learning is the possibility of forming a deep hierarchy of general internal representations (descriptors), which provides more opportunities for ‘induction transfer’ between tasks, thereby increasing the predictive performance. In one of the first publications on the use of deep learning in the field of drug discovery, Ma et al. applied multitask learning to build QSAR models and found that the increase in predictive performance is due to ‘borrowing’ information on related properties from similar compounds in other datasets [35]. This allowed the authors to develop a strategy for constructing multitask deep neural networks in which cooperation between different tasks would lead to an improvement in predictive ability. Recent advances in the use of multitask training of deep neural networks in chemoinformatics and drug discovery have been surveyed by Sosnin et al. [36].

2.3. Convolutional neural networks (CNNs)

The deep multilayer neural networks discussed above use their hidden layers to transform a fixed number of originally defined global molecular descriptors into their non-linear combinations, more suitable for predicting the target properties of chemical compounds. In this case, global molecular descriptors are understood as numbers characterizing the entire chemical structure as a whole, independent of the numbering of atoms in the molecule, as well as of the movement and rotation of the molecule in space. The success of building models with such descriptors is largely determined by how brilliant the scientists who developed such descriptors were and how successfully a set of descriptors was selected to build a specific structure–property model. CNNs go much further and eliminate the subjective human factor by forming global molecular descriptors directly from a ‘raw’ description of chemical structures, such as molecular graphs; see below. In numerical form, such a ‘raw’ description can be represented by either a variable or a fixed number of local descriptors (e.g. indicators of atom types or local charges on atoms), the values of which depend on the permutations of atoms in the molecules or on the movement and rotation of the molecule in space. CNNs transform such local descriptors into global ones using special convolutional and pooling layers of hidden neurons (Figure 2). In contrast to the dense hidden layers considered above, neurons of a convolutional layer receive signals only from a small portion of the previous layer, called the receptive field, by analogy with the mechanism of vision in biology, and the weights of the connections going to them (called filters, by analogy with image processing) are common to (shared among) all neurons of the convolutional layer. The purpose of the convolutional layer is to transform local descriptors to take into account the influence of the nearest neighborhood (for example, the influence of the nearest atoms in a molecule) in order to increase the predictive performance of the neural network. The pooling layers consist of neurons with fixed weights that generalize the local descriptors formed in the convolutional layers and turn them into a set of a fixed number of global descriptors. Further, the global molecular descriptors formed in the last pooling layer can enter the input of a DNN with dense hidden layers. Thus, due to the automatic formation of local and then global molecular descriptors in hidden layers, CNNs can build direct correlations between ‘raw’ descriptions of chemical structures and their target properties. For the first time, the ability to perform direct structure–property correlations using similar principles was demonstrated in the 1990s using ‘a neural device for searching direct correlations between structures and properties of chemical compounds’ [37].

Figure 2. A graph convolutional neural network applied to the prediction of three target properties of chemical compounds.

There exist several types of CNNs depending on the type of ‘raw’ representation of molecules. The most popular among them are 1D-, 2D-, 3D-, and graph-CNNs. 1D-CNNs use SMILES [38] as a ‘raw’ representation and ‘one-hot’ encoding of the character in each position of the string as initial local descriptors. A ‘one-hot’ encoded character is a vector containing 1 in the position corresponding to its type and 0 elsewhere. As an example, this kind of network was used by Kwon and Yoon in the DeepCCI method to predict chemical–chemical interactions directly from SMILES strings [39]. 2D-CNNs use graphic images of structural formulas as a ‘raw’ representation of chemical structures and color codes of individual pixels as initial local descriptors. So, one can produce images of the structural formulas of all chemical compounds in the dataset and use powerful image recognition software to build predictive models for target properties. This methodology has been successfully applied in several publications [40,41] and led to the discovery of CDK4 inhibitors in ligand-based virtual screening [42]. 3D-CNNs use molecular fields (both ‘real’ physical fields, such as the electrostatic potential, and pseudo-fields indicating the presence of certain atom types at a given location in a fuzzy manner) as a ‘raw’ representation of chemical structures and the values of such molecular fields computed for individual voxels (3D analogs of pixels) as initial local descriptors. Although most of the work on the use of this type of neural network involves the consideration of protein–ligand complexes, this network architecture can successfully be applied to build 3D-QSAR models [43,44].
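The ‘one-hot’ initial local descriptors that 1D-CNNs consume can be sketched in a few lines. The nine-character vocabulary below is a toy assumption; real models collect it from every SMILES string in the training set:

```python
# Sketch: 'one-hot' encoding of SMILES characters as initial local
# descriptors for a 1D-CNN. The vocabulary here is a toy assumption;
# in practice it is built from all SMILES strings in the dataset.
VOCAB = ["C", "c", "O", "N", "(", ")", "=", "1", "2"]
CHAR_TO_INDEX = {ch: i for i, ch in enumerate(VOCAB)}

def one_hot_encode(smiles: str) -> list:
    """Return a len(smiles) x len(VOCAB) matrix with a single 1 per row."""
    matrix = []
    for ch in smiles:
        row = [0] * len(VOCAB)
        row[CHAR_TO_INDEX[ch]] = 1  # 1 in the position of the character type
        matrix.append(row)
    return matrix

encoded = one_hot_encode("CC(=O)N")  # an acetamide fragment as a toy example
```

Stacking such rows gives the string-length × vocabulary matrix that the first convolutional layer consumes.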
Graph-CNNs use molecular graphs as a ‘raw’ representation of chemical structures, properties of atoms as initial local descriptors, and the close neighborhood of atoms in molecules as ‘receptive fields’ for neurons in convolutional layers. Currently, deep neural networks with such an architecture are actively used to automatically generate global descriptors for predicting various properties of chemical compounds [45–49]. In addition, graph-CNNs are currently used as blocks in neural networks with more complex architectures, including mechanisms that allow one to focus ‘attention’ on individual molecules or their parts [50]. In particular, they are part of a special architecture of neural networks which allows building classification models for predicting the type of biological activity in the presence of a very small number of examples within the framework of the so-called ‘one-shot learning’ [51]; see discussion in Ref. [52].

RNNs work with molecules represented as sequences, e.g. sequences of characters in SMILES strings for small molecules (Figure 3). RNNs iteratively, starting from the first element, apply the same recurrent procedure to all elements of the sequence, while forming in their hidden layers descriptors for the first t elements, dt, by combining already computed descriptors for the first t−1 elements, dt−1, with initial local descriptors for the t-th element, it. This is provided by special recurrent neurons, in which the outputs change with discrete time t, and a special loop with a time delay connects the neuron at time t−1 with itself at time t. In order to solve some problems associated with the functioning of RNNs, instead of simple recurrent neurons it is customary to use special microarchitectures, particularly LSTM (Long Short-Term Memory) [53] and GRU (Gated Recurrent Units) [54].

In the case of SMILES, the initial local descriptor for the t-th element is a ‘one-hot’ encoded type of the character at the t-th position of the string. After passing through the whole SMILES string, a vector of global molecular descriptors is formed for the corresponding chemical structure. In this case, an RNN can be trained to predict each t-th character from the descriptors dt−1 computed for the previous substring, using a large set of correct SMILES. Descriptor vectors computed for whole SMILES strings can be used for building models to predict chemical properties, as in the SMILES2vec approach [55].

The ability of a trained RNN to predict the next character in a partially formed SMILES can efficiently be used to generate new chemical structures. This can be performed because, in addition to the input and hidden layers, RNNs also contain an output layer with the softmax activation function that calculates the probability of each type for the next symbol. So, in the generator mode, this probability distribution can be sampled, and the ‘one-hot’ encoded sampled character can be used as an input to the network in order to generate one more character, and so on. The growth of the current SMILES string can be terminated upon the generation of a special stop symbol. In this way, new chemical structures that can further be used in virtual screening can be generated [56]. In order to direct the neural network to generate structures with desired properties, several approaches have been developed. One of them is based on so-called transfer learning, in which a network is first trained on a large data set to generate correct SMILES and then fine-tuned on specific ligand subsets to generate active structures for specific biological targets [57]. Another approach to generating structures with desired properties is to use reinforcement learning (RL), a general framework in machine learning dealing with how software ‘agents’ learn to take actions in an ‘environment’ so as to maximize reward (or minimize penalty) [58]. RL is used in AI to solve dynamic decision problems, so as to take such actions that would lead to a desirable outcome. RL powered by deep neural networks implementing its functions is called deep RL. In application to the generation of chemical structures, RL ‘rewards’ for generating structures with desired target properties and ‘punishes’ for undesired properties, as in the method of performing de novo design developed by Popova et al. [59]. In the case of generating SMILES by an RNN, the network is rewarded during the training whenever a completely generated SMILES corresponds to a correct chemical structure possessing desirable properties. So, due to the use of deep RL, RNNs learn how to produce each next character in SMILES in order to be able to generate chemical structures with desired properties. Deep RL is currently very actively used in drug discovery in conjunction with various approaches to generate chemical structures [58–64]. Some of them will be considered below.

Figure 3. A recurrent neural network applied to the prediction (in training mode) or the generation (in generator mode) of the SMILES string for ethanol.

2.6. Autoencoders

An autoencoder is a neural network containing two subnetworks, an encoder and a decoder [28]; see Figure 4.

Figure 4. An autoencoder transforming a chemical structure to a code vector in the autoencoder latent space, followed by the reconstruction of the chemical structure from it.
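A minimal sketch of this two-subnetwork interface, with toy truncation stand-ins for the trained encoder and decoder and hypothetical descriptor vectors:

```python
import math

# Sketch of the autoencoder interface. The truncating 'encoder' and
# zero-padding 'decoder' are toy stand-ins for trained subnetworks;
# CODE_DIM is an assumed latent-space dimensionality.
CODE_DIM = 2

def encode(descriptors: list) -> list:
    """Toy encoder: compress a descriptor vector to a CODE_DIM code vector."""
    return descriptors[:CODE_DIM]

def decode(code: list, full_dim: int) -> list:
    """Toy decoder: reconstruct the full vector from the code vector."""
    return code + [0.0] * (full_dim - len(code))

def reconstruction_error(descriptors: list) -> float:
    """Euclidean distance between a vector and its round-trip
    reconstruction through the latent space."""
    rec = decode(encode(descriptors), len(descriptors))
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(descriptors, rec)))

low = reconstruction_error([0.8, 0.1, 0.0, 0.0])   # fits the latent space
high = reconstruction_error([0.8, 0.1, 0.5, 0.5])  # does not fit as well
```

A compound whose information does not fit the latent space reconstructs poorly, which is what makes the reconstruction error usable as a screening score for an activity class.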
The encoder transforms an initial representation of a molecule into a set of descriptors, called a code vector, from which the decoder reconstructs the initial representation. The code vectors form the latent space of the autoencoder. The training of the autoencoder is aimed at reducing the reconstruction error with the smallest possible number of code descriptors. If such an autoencoder is trained using a set of compounds with a certain type of biological activity, then the reconstruction error can be used as a scoring function in virtual screening to discover novel drugs belonging to the same activity class [65,66]. The use of deep learning enables autoencoders to work with ‘raw’ representations of chemical structures. In particular, chemical structures represented using SMILES strings can be processed using a sequence-to-sequence architecture in which RNNs are used both for encoding the SMILES strings to code vectors and for decoding them back to the SMILES strings [67]. In this case, the goal of the training is to correctly restore SMILES strings from their code vectors. Xu et al. have shown that such code vectors, called ‘seq2seq fingerprints’, could be useful for drug discovery, because chemical structures can be reconstructed from them using the decoder [68].

Several approaches to the generation of chemical structures with desired properties using autoencoder code vectors have been proposed. All of them are based on (1) approximation of the statistical distribution of molecules in the latent space defined by code vectors, (2) sampling of new points from this distribution, and (3) restoration of chemical structures from them using the decoder. Sattarov et al. approximated the molecule distribution using the GTM (Generative Topographic Mapping) machine learning method, used the ‘activity landscape’ methodology to select zones enriched with molecules with the desired activity, and generated new molecules by feeding the points sampled in the latent space from such zones to the decoder [69].

Several published approaches to the generation of chemical structures with desired properties are based on the use of variational autoencoders (VAEs), a special modification of autoencoders in which a molecule is mapped by the encoder not to a single code vector but to a Gaussian distribution over the latent space containing all possible code vectors [70]. During the training, the decoder learns to reconstruct the original molecule from a random code vector sampled from this distribution. Since any point close to the original distribution of molecules in the latent space can be sampled, a VAE creates its latent space with much fewer ‘holes’ (zones in the latent space from which correct molecules cannot be reconstructed) in comparison with ordinary autoencoders. This allows drawing continuous trajectories in the latent space, from each point of which molecules can be reconstructed. This allowed Gomez-Bombarelli et al. to propose a method for optimizing chemical structures in order to improve the desired properties [71]. For this purpose, an additional model can be built to predict the target properties from the values of latent variables (code vectors), so optimization of the property in the latent space results in the optimization of discrete chemical structures. An important modification of VAEs are conditional VAEs (CVAEs), where vectors of target properties are injected into the autoencoder, so it is possible to generate chemical structures with a set of desired target properties without the need to perform optimization in the latent space [72]. Another way to perform similar conditional molecular design of compounds with desired target properties is to use a semi-supervised modification of VAEs (SSVAEs) trained on a set of partially annotated chemical compounds, so new molecules with desired properties can be generated by sampling from the conditional distribution without any extra optimization [73].

Zhavoronkov et al. used a VAE with a rich prior distribution in the latent space in combination with tensor decomposition and RL based on reward functions approximated by self-organizing maps (SOMs) to discover novel potent inhibitors of discoidin domain receptor 1 in 21 days [74]; see Figure 5.

Figure 5. Novel potent inhibitors of human DDR1 kinase found by DL-driven de novo drug design.

2.7. Adversarial learning

The disadvantage of VAEs is their tight attachment to the Gaussian distribution used as a prior, while the distribution of chemical compounds in databases is often far from Gaussian and has many modes. Such a mismatch can lead to the appearance of large ‘holes’ in some zones of the latent space, which may make it poorly suitable for optimizing chemical structures. An effective approach to solving this problem is to use adversarial learning, which underlies generative adversarial nets (GANs) [75]. GANs consist of two networks: a generator, which learns to generate new objects, and a discriminator, which learns to discriminate between real and generated objects. Adversarial learning is like a game in which the generator learns to ‘fool’ the discriminator, which also learns not to be ‘fooled’; as a result, the generator becomes so ‘smart’ that the generated objects become indistinguishable from the real ones.
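The two-player game can be made concrete with the standard GAN losses evaluated on scalar toys. The logistic discriminator and one-parameter generator below are hypothetical stand-ins; real GANs use deep networks over molecular representations:

```python
import math

# Sketch of the adversarial objectives from [75] on scalar toys.
# The discriminator weights and the generator parameterization are
# assumptions made for illustration only.

def discriminator(x: float, w: float = 2.0, b: float = 0.0) -> float:
    """Probability that x is a real object (logistic model, assumed weights)."""
    return 1.0 / (1.0 + math.exp(-(w * x + b)))

def generator(z: float, theta: float) -> float:
    """Toy generator: shift the latent noise z by the parameter theta."""
    return z + theta

def discriminator_loss(x_real: float, z: float, theta: float) -> float:
    """D is trained to score real objects high and generated objects low."""
    fake = generator(z, theta)
    return -(math.log(discriminator(x_real))
             + math.log(1.0 - discriminator(fake)))

def generator_loss(z: float, theta: float) -> float:
    """G is trained to make D score generated objects high ('fooling' D)."""
    return -math.log(discriminator(generator(z, theta)))
```

With real samples concentrated at positive x, a generator that shifts noise toward positive values fools this discriminator and obtains a lower loss, which is exactly the pressure that drives adversarial training.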
Adversarial autoencoders (AAEs) combine autoencoders with adversarial learning. In this case, an additional discriminator neural network learns to discriminate between the code vectors produced by the encoder and random vectors sampled from a user-defined prior distribution, while the encoder learns to ‘fool’ the discriminator and generate code vectors in accordance with the specified prior distribution. For the first time, AAEs were applied by Kadurin et al. to generate molecular fingerprints, which can further be used for discovering new drugs through virtual screening [76]. Blaschke et al. have applied AAEs to produce a latent space for the optimization of chemical structures, performing de novo molecular design using the Bayesian optimization algorithm [77]. They have demonstrated the advantage of using AAEs with a uniform prior distribution over standard VAEs. The above-mentioned Bayesian optimization algorithm, applied to the optimization of molecules, works by maximizing the probability of being active against a specific target predicted by Gaussian process models [78]. Such models are first trained with a set of initial points in the autoencoder latent space; then, in each iteration, a new point selected by maximizing a certain probabilistic criterion based on the current model is added to the set, the model is rebuilt, and so on. Prykhodko et al. have developed the LatentGAN architecture, which also combines an autoencoder with a GAN for de novo molecular design [79]. Guimaraes et al. have combined SMILES-based structure generation with both adversarial learning and RL in the ORGAN (Objective-Reinforced GANs) network, in which the generator is trained to maximize two rewards: the first one improves the activity of molecules, and the second one learns to mimic real molecules by ‘fooling’ the discriminator [80].

2.8. Graph-based generation of chemical structures

The above-discussed SMILES-based generation of chemical structures has, however, serious limitations. First, the generator should know, or learn from data, the grammar of SMILES in order to keep all parentheses balanced and all ring closures paired. Second, multiple SMILES can be assigned to the same chemical structure, while rather dissimilar canonical SMILES can correspond to very similar molecules, leading to bad neighborhood behavior [81]. To address these problems, it has been suggested to use deep learning algorithms to generate chemical structures as molecular graphs directly, without the use of SMILES. In this case, the VAE, GAN, and RL frameworks can be combined with graph-CNNs. Only a few examples from this fast-growing research domain are given here. Simonovsky et al. [82], Jin et al. [81], and Samanta et al. [83] have developed special graph-based VAEs, whereas De Cao and Kipf [84] have developed a graph-based GAN. Popova et al. have developed the autoregressive MolecularRNN, a graph recurrent generative model for generating chemical structures with desirable properties, which is trained using RL [85].

A common drawback of graph-based methods for generating chemical structures is significantly lower computational efficiency and significantly greater memory requirements compared to the SMILES-based methods. This, in turn, imposes certain restrictions on the size of molecular graphs. An effective solution to this and several other problems has recently been proposed using a very promising approach: graph normalizing flows. A normalizing flow is a transformation of a simple probability distribution (like a Gaussian distribution in a latent space) into a probability distribution of any complexity (like the distribution of molecules in chemical space) by a sequence of invertible and differentiable mappings [86,87]. Unlike VAEs and GANs, normalizing flows can approximate data distributions exactly and provide very efficient sampling from them due to the invertibility of the transformations.
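The invertibility that makes this possible can be illustrated with a single affine flow step. This is a deliberately minimal one-dimensional example with fixed parameters; real graph flows such as GraphNVP stack many learned coupling layers:

```python
import math

# Sketch: one affine normalizing-flow step y = a*x + b. Because the map
# is invertible and differentiable, the density of y follows exactly from
# the base density of x via the change-of-variables formula. The
# parameters are fixed here; in a real flow they are learned.
A, B = 2.0, 1.0

def forward(x: float) -> float:
    return A * x + B

def inverse(y: float) -> float:
    return (y - B) / A

def log_density_y(y: float) -> float:
    """Exact log-density of y when x is standard normal:
    log p_y(y) = log p_x(f^{-1}(y)) - log|df/dx|."""
    x = inverse(y)
    log_px = -0.5 * (x * x + math.log(2.0 * math.pi))
    return log_px - math.log(abs(A))
```

Sampling is just `forward` applied to base-distribution draws, and `inverse` maps any data point back to the base space, so likelihoods are exact rather than bounded or implicit as in VAEs and GANs.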
Since normalizing flows can operate both in continuous and discrete domains, they have been adapted to generate graphs. This has recently led to the development of three novel methods for generating molecular graphs: GraphNVP [88], GRevNets [89], and GraphAF [90].

2.9. Generation of chemical structures using alternative string representations

Despite the above issues, using SMILES to generate chemical structures has some clear advantages, including simplicity, versatility, and the ability to use a rich arsenal of methodologies and software tools designed for text mining. Several approaches have been proposed to overcome the drawbacks of using SMILES while still preserving the benefits of using strings. First, it has been shown that the problem of poor neighborhood behavior for canonical SMILES can be overcome by using sets of different non-canonical SMILES strings [91]. Second, several alternative string representations addressing the grammar-related problem have been suggested. Kusner et al. have developed the GrammarVAE approach, based on representing SMILES as a grammar parse tree, so all SMILES strings generated using VAEs (see above) are guaranteed to have valid syntax [92]. To achieve the same goal, O’Boyle and Dalke have developed DeepSMILES, a modification of SMILES that avoids the problems of unbalanced parentheses and paired ring-closure symbols [93]. Krenn et al. have developed the SELFIES string representation, based on a Chomsky type-2 grammar, that can be used to represent graphs so that random modifications in them would lead to correct semantically constrained graphs [94].

2.10. Mutation-based generation of chemical structures

A reasonable alternative to the neural-network generation of new chemical structures in de novo drug design is a more traditional approach, which consists in modifying molecules using chemical ‘mutations’. In the past year, there has been a trend toward combining mutation-based structure generation with deep learning. For example, Zhou et al. have combined optimization of molecules using a small set of chemical ‘mutations’ with deep RL in order to generate new molecules with desired properties [64]. Nigam et al. have combined a genetic algorithm (GA) based on mutations defined on SELFIES strings [94] with a DNN-based discriminator model and have shown that this approach ‘outperforms other generative models in optimization tasks’ [95].

2.11. Benchmarking generation of chemical structures

In order to compare different generative models for de novo molecular design, two benchmarking platforms have been developed: GuacaMol by Brown et al. [96] and MOSES (Molecular SEtS) by Polykovskiy et al. [97]. Both contain standard molecular benchmark datasets, implementations of several algorithms for producing generative models and generating chemical structures, and several metrics for comparing the performance of different generative models in terms of diversity, different kinds of similarity, Fréchet ChemNet Distance [98], synthetic accessibility, drug-likeness, etc.

3. Conclusions

Deep learning provides very powerful tools that can be used in ligand-based novel drug discovery, both to conduct virtual screening of already prepared libraries of chemical compounds and to generate new compounds with desired properties. The deep architecture of neural networks with multiple hidden layers makes the internal representation of molecules more suitable for predicting target properties, as well as for transferring knowledge between different tasks. CNNs can work directly with chemical structures without the use of descriptor sets previously invented by humans. Using RNNs, one can apply text processing methods to chemical structures represented using SMILES strings. Chemical structures can be generated either through SMILES strings or by working directly with molecular graphs. Autoencoders can be used to produce ‘invertible’ descriptors from which chemical structures can be recovered. This allows one to effectively design new compounds, both by generating new chemical structures for a given property and by optimizing chemical structures in the latent space of autoencoders. Different types of VAEs, GANs, and RL are powerful tools that can be very useful for ligand-based drug discovery, and several success stories have already been reported in the literature.

4. Expert opinion

Despite the very promising results obtained by using deep learning in ligand-based drug discovery, it is necessary to note some still challenging problems. The first one is the issue of interpretability, explainability, and integration with deep scientific knowledge. The discovery of drugs is a very long and expensive process, in which many specialists from various fields are involved, who should well understand the decisions made by colleagues. The process of drug design powered by AI based on deep learning algorithms should not be perceived as a sort of magic that can only be blindly believed but not completely trusted. Unfortunately, this is not facilitated by the ‘black box’ nature of deep neural networks, whose work is still not fully understood by mathematicians and even their creators. In the field of drug discovery, human–AI interaction is very important, and the abbreviation AI should mean augmented intelligence rather than artificial intelligence. So, AI should be a partner of medicinal chemists rather than their replacement. The interpretation of a neural network model allows a better understanding of the nature of the phenomenon described by it [99]. Interactive visualization tools, like those based on GTM landscapes, are useful for involving humans in the generation of chemical structures by neural networks [69]. A positive trend is the explicit consideration of the spatial structure of molecules in 3D space in the process of generating them, because this is an important factor determining the biological activity of drugs [100–103]. Empowering a neural network with the ability to correctly consider the fundamental relationships between different target properties [104], as well as the correct symmetry properties [103,105], is also important for taking into account the laws of the physical world.
Another very important challenging problem is the synthetic accessibility of chemical compounds in de novo drug design. Ertl and Schuffenhauer have developed a synthetic accessibility score based on molecular complexity and fragment contributions [106], and it is a common practice to use it as one of the objectives in the process of chemical structure generation and optimization. Coley et al. have developed for the same purpose a synthetic complexity score, SCScore, learned from a huge chemical reaction database [107]. Although the use of such scores leads to the effective rejection of difficult-to-synthesize compounds, this problem cannot be solved without a detailed retrosynthetic analysis with an exact indication of the structures of the reactants and reagents, along with the conditions (temperature, solvent, catalysts, etc.) and experimental procedure for each stage of the synthesis. The task of planning and conducting organic synthesis is usually the responsibility of a synthetic chemist, which is not acceptable when working with a large number of compounds, as well as when conducting synthesis using autonomous robotic synthetic devices. It is important to note that in recent years, AI methods based on deep learning have also taken a decisive role in the computer planning of organic synthesis [108–111], so there is a fundamental possibility of embedding retrosynthetic analysis in the generation of chemical structures with desired properties. The first example of the implementation of such an approach, which allows for the generation of synthetically accessible molecules along with a complete specification of their synthesis, is the DINGOS system recently developed by Button et al. [112], which combines the ligand-similarity-based generation of chemical structures with a machine learning model trained on successful synthetic routes.

Funding

I. Baskin is funded by the Ministry of Education, Youth and Sports of the Czech Republic (agreement MSMT-5727/2018-2) and the Ministry of Higher Education and Science of the Russian Federation (project number: RFMEFI58718X0049).

Declaration of Interest

The authors have no other relevant affiliations or financial involvement with any organization or entity with a financial interest in or financial conflict with the subject matter or materials discussed in the manuscript apart from those disclosed.

Reviewer Disclosures

Peer reviewers on this manuscript have no relevant financial or other relationships to disclose.

References

Papers of special note have been highlighted as either of interest (•) or of considerable interest (••) to readers.
1. Gasteiger J, Zupan J. Neural networks in chemistry. Angew Chem Int Ed Engl. 1993;105(4):503–527.
2. Halberstam NM, Baskin II, Palyulin VA, et al. Neural networks as
3. Baskin II, Palyulin VA, Zefirov NS. Neural networks in building QSAR models. Methods Mol Biol. 2008;458:137–158.
4. Baskin II, Winkler D, Tetko IV. A renaissance of neural networks in drug discovery. Expert Opin Drug Discov. 2016;11(8):785–795.
5. LeCun Y, Bengio Y, Hinton G. Deep learning. Nature. 2015;521(7553):436–444.
•• The most cited review on deep learning
6. Varnek A, Baskin I. Machine learning methods for property prediction in chemoinformatics: quo vadis? J Chem Inf Mod. 2012;52(6):1413–1437.
• A comprehensive review on the use of machine learning for property prediction in chemoinformatics
7. Gawehn E, Hiss JA, Schneider G. Deep learning in drug discovery. Mol Inf. 2016;35(1):3–14.
8. Ekins S. The next era: deep learning in pharmaceutical research. Pharmaceut Res. 2016;33(11):2594–2603.
9. Carpenter KA, Cohen DS, Jarrell JT, et al. Deep learning and virtual drug screening. Future Med Chem. 2018;10(21):2557–2567.
10. Chen H, Engkvist O, Wang Y, et al. The rise of deep learning in drug discovery. Drug Discov Today. 2018;23(6):1241–1250.
11. Jørgensen PB, Schmidt MN, Winther O. Deep generative models for molecular science. Mol Inf. 2018;37(1–2):1700133.
12. Bajorath J. Data analytics and deep learning in medicinal chemistry. Future Med Chem. 2018;10(13):1541–1543.
13. Tang W, Chen J, Wang Z, et al. Deep learning for predicting toxicity of chemicals: a mini review. J Environ Sci Health Part C. 2018;36(4):252–271.
14. Sanchez-Lengeling B, Aspuru-Guzik A. Inverse molecular design using machine learning: generative models for matter engineering. Science. 2018;361(6400):360–365.
15. Ghasemi F, Mehridehnavi A, Pérez-Garrido A, et al. Neural network and deep-learning algorithms used in QSAR studies: merits and drawbacks. Drug Discov Today. 2018;23(10):1784–1790.
16. Jing Y, Bian Y, Hu Z, et al. Deep learning for drug design: an artificial intelligence paradigm for drug discovery in the big data era. AAPS J. 2018;20(3):58.
17. Rifaioglu AS, Atas H, Martin MJ, et al. Recent applications of deep learning and machine intelligence on in silico drug discovery: methods, tools and databases. Brief Bioinform. 2018;20(5):1878–1912.
18. Sellwood MA, Ahmed M, Segler MH, et al. Artificial intelligence in drug discovery. Future Med Chem. 2018;10(17):2025–2028.
19. Zhavoronkov A. Artificial intelligence for drug discovery, biomarker development, and generation of novel chemistry. Mol Pharm. 2018;15(10):4311–4313.
20. Elton DC, Boukouvalas Z, Fuge MD, et al. Deep learning for molecular design—a review of the state of the art. Mol Syst Design Eng. 2019;4(4):828–849.
21. Schneider P, Walters WP, Plowright AT, et al. Rethinking drug design in the artificial intelligence era. Nat Rev Drug Discovery. 2019. DOI:10.1038/s41573-019-0050-3.
22. Xue D, Gong Y, Yang Z, et al. Advances and challenges in deep generative models for de novo molecule generation. Wiley Interdiscip Rev Comput Mol Sci. 2019;9(3):e1395.
23. Xu Y, Lin K, Wang S, et al. Deep learning for molecular generation. Future Med Chem. 2019;11(6):567–597.
24. Lake F. Artificial intelligence in drug discovery: what is new, and what is next? Fut Drug Discov. 2019;1(2):FDD19.
25. Walters WP, Stahl MT, Murcko MA. Virtual screening - an overview. Drug Discov Today. 1998;3(4):160–178.
26. Green DV. Virtual screening of virtual libraries. Prog Med Chem. 2003;41:61–97.
27. Ripphausen P, Nisius B, Bajorath J. State-of-the-art in ligand-based virtual screening. Drug Discov Today. 2011;16(9–10):372–376.
28. Bengio Y, Courville A, Vincent P. Representation learning: a review and new perspectives. IEEE Trans Pattern Analys Mach Inte. 2013;35(8):1798–1828.
29. Dahl GE, Jaitly N, Salakhutdinov R. Multi-task neural networks for QSAR predictions. arXiv preprint arXiv:1406.1231. 2014.
a method for elucidating structure-property relationships for 30. Mayr A, Klambauer G, Unterthiner T, et al. DeepTox: toxicity prediction using
organic compounds. Russ Chem Rev. 2003;72(7):629–649. deep learning. Front Environ Sci. 2016;3(80). DOI:10.3389/fenvs.2015.00080.
31. Lenselink EB, Ten Dijke N, Bongers B, et al. Beyond the hype: deep neural networks outperform established methods using a ChEMBL bioactivity benchmark set. J Cheminf. 2017;9(1):45.
32. Mayr A, Klambauer G, Unterthiner T, et al. Large-scale comparison of machine learning methods for drug target prediction on ChEMBL. Chem Sci. 2018;9(24):5441–5451.
33. Caruana R. Multitask learning. Mach Learn. 1997;28(1):41–75.
34. Varnek A, Gaudin C, Marcou G, et al. Inductive transfer of knowledge: application of multi-task learning and feature net approaches to model tissue-air partition coefficients. J Chem Inf Mod. 2009;49(1):133–144.
• The first use of multitask learning in chemoinformatics
35. Xu Y, Ma J, Liaw A, et al. Demystifying multitask deep neural networks for quantitative structure–activity relationships. J Chem Inf Mod. 2017;57(10):2490–2504.
36. Sosnin S, Vashurina M, Withnall M, et al. A survey of multi-task learning methods in chemoinformatics. Mol Inf. 2019;38(4):1800108.
37. Baskin II, Palyulin VA, Zefirov NS. A neural device for searching direct correlations between structures and properties of chemical compounds. J Chem Inf Comput Sci. 1997;37(4):715–721.
• The first neural network with convolutional architecture for searching direct correlations between structures and properties of chemical compounds
38. Weininger D. SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules. J Chem Inf Comput Sci. 1988;28(1):31–36.
•• The most popular string representation of chemical structures
39. Kwon S, Yoon S. End-to-end representation learning for chemical-chemical interaction prediction. IEEE/ACM Trans Comput Biol Bioinform. 2019;16(5):1436–1447.
40. Goh GB, Siegel C, Vishnu A, et al. Chemception: a deep neural network with minimal chemistry knowledge matches the performance of expert-developed QSAR/QSPR models. ArXiv Preprint. 2017;arXiv:1706.06689.
41. Fernandez M, Ban F, Woo G, et al. Toxic colors: the use of deep learning for predicting toxicity of compounds merely from their graphic images. J Chem Inf Mod. 2018;58(8):1533–1543.
42. Xu Y, Chen P, Lin X, et al. Discovery of CDK4 inhibitors by convolutional neural networks. Future Med Chem. 2019;11(3):165–177.
43. Sosnin S, Misin M, Palmer DS, et al. 3D matters! 3D-RISM and 3D convolutional neural network for accurate bioaccumulation prediction. J Phys Condens Matter. 2018;30(32):32LT03.
44. Kuzminykh D, Polykovskiy D, Kadurin A, et al. 3D molecular representations based on the wave transform for convolutional neural networks. Mol Pharm. 2018;15(10):4378–4385.
45. Duvenaud DK, Maclaurin D, Iparraguirre J, et al. Convolutional networks on graphs for learning molecular fingerprints. Advances in Neural Information Processing Systems 28 (NIPS 2015); Montreal, Canada; 2015. p. 2215–2223.
46. Kearnes S, McCloskey K, Berndl M, et al. Molecular graph convolutions: moving beyond fingerprints. J Comput-Aided Mol Des. 2016;30(8):595–608.
47. Coley CW, Barzilay R, Green WH, et al. Convolutional embedding of attributed molecular graphs for physical property prediction. J Chem Inf Mod. 2017;57(8):1757–1772.
48. Xu Y, Pei J, Lai L. Deep learning based regression and multiclass models for acute oral toxicity prediction with automatic chemical feature extraction. J Chem Inf Mod. 2017;57(11):2672–2685.
49. Ståhl N, Falkman G, Karlsson A, et al. Deep convolutional neural networks for the prediction of molecular properties: challenges and opportunities connected to the data. J Integr Bioinform. 2018;16(1):20180065.
50. Withnall M, Lindelöf E, Engkvist O, et al. Building attention and edge convolution neural networks for bioactivity and physical-chemical property prediction; 2019.
51. Altae-Tran H, Ramsundar B, Pappu AS, et al. Low data drug discovery with one-shot learning. ACS Cent Sci. 2017;3(4):283–293.
52. Baskin II. Is one-shot learning a viable option in drug discovery? Expert Opin Drug Discov. 2019;14(7):601–603.
53. Hochreiter S, Schmidhuber J. Long short-term memory. Neural Comput. 1997;9(8):1735–1780.
• The most popular architecture of recurrent neural networks
54. Chung J, Gülçehre Ç, Cho K, et al. Empirical evaluation of gated recurrent neural networks on sequence modeling. ArXiv Preprint. 2014;arXiv:1412.3555.
55. Goh GB, Hodas NO, Siegel C, et al. Smiles2vec: an interpretable general-purpose deep neural network for predicting chemical properties. ArXiv Preprint. 2017;arXiv:1712.02034.
56. Ertl P, Lewis R, Martin E, et al. In silico generation of novel, drug-like chemical matter using the LSTM neural network. ArXiv Preprint. 2017;arXiv:1712.07449.
57. Gupta A, Müller AT, Huisman BJH, et al. Generative recurrent networks for de novo drug design. Mol Inf. 2018;37(1–2):1700111.
58. Sutton RS, Barto AG. Reinforcement learning: an introduction. Cambridge: MIT Press; 1998.
59. Popova M, Isayev O, Tropsha A. Deep reinforcement learning for de novo drug design. Sci Adv. 2018;4(7):eaap7885.
60. Olivecrona M, Blaschke T, Engkvist O, et al. Molecular de-novo design through deep reinforcement learning. J Cheminf. 2017;9(1):48.
61. Popova M, Isayev O, Tropsha A. Deep reinforcement learning for de-novo drug design; 2017.
62. Neil D, Segler M, Guasch L, et al. Exploring deep recurrent models with reinforcement learning for molecule design; 2018.
63. Putin E, Asadulaev A, Ivanenkov Y, et al. Reinforced adversarial neural computer for de novo molecular design. J Chem Inf Mod. 2018;58(6):1194–1204.
64. Zhou Z, Kearnes S, Li L, et al. Optimization of molecules via deep reinforcement learning. Sci Rep. 2019;9(1):10752.
65. Karpov PV, Osolodkin DI, Baskin II, et al. One-class classification as a novel method of ligand-based virtual screening: the case of glycogen synthase kinase 3β inhibitors. Bioorg Med Chem Lett. 2011;21(22):6728–6731.
66. Zhokhova NI, Baskin II. Energy-based neural networks as a tool for harmony-based virtual screening. Mol Inf. 2017;36(11):1700054.
67. Sutskever I, Vinyals O, Le QV. Sequence to sequence learning with neural networks. Proceedings of the 27th International Conference on Neural Information Processing Systems - Volume 2; Montreal, Canada. MIT Press; 2014. p. 3104–3112.
68. Xu Z, Wang S, Zhu F, et al. Seq2seq fingerprint: an unsupervised deep molecular embedding for drug discovery. Proceedings of the 8th ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics; Boston, Massachusetts, USA. ACM; 2017. p. 285–294.
69. Sattarov B, Baskin II, Horvath D, et al. De novo molecular design by combining deep autoencoder recurrent neural networks with generative topographic mapping. J Chem Inf Mod. 2019;59(3):1182–1196.
70. Kingma DP, Welling M. Auto-encoding variational Bayes. ArXiv Preprint. 2014;arXiv:1312.6114.
• The first publication on variational autoencoders
71. Gómez-Bombarelli R, Wei JN, Duvenaud D, et al. Automatic chemical design using a data-driven continuous representation of molecules. ACS Cent Sci. 2018;4(2):268–276.
72. Lim J, Ryu S, Kim JW, et al. Molecular generative model based on conditional variational autoencoder for de novo molecular design. J Cheminf. 2018;10(1):31.
73. Kang S, Cho K. Conditional molecular design with deep generative models. J Chem Inf Mod. 2019;59(1):43–52.
74. Zhavoronkov A, Ivanenkov YA, Aliper A, et al. Deep learning enables rapid identification of potent DDR1 kinase inhibitors. Nat Biotechnol. 2019;37(9):1038–1040.
75. Goodfellow IJ, Pouget-Abadie J, Mirza M, et al. Generative adversarial nets. ArXiv Preprint. 2014;arXiv:1406.2661.
•• The first publication of generative adversarial nets
76. Kadurin A, Aliper A, Kazennov A, et al. The cornucopia of meaningful leads: applying deep adversarial autoencoders for new molecule development in oncology. Oncotarget. 2017;8(7):10883–10890.
77. Blaschke T, Olivecrona M, Engkvist O, et al. Application of generative autoencoder in de novo molecular design. Mol Inf. 2017;36:1700123.
78. Rasmussen CE, Williams CKI. Gaussian processes for machine learning. Cambridge, Massachusetts: The MIT Press; 2006 (Dietterich T, editor).
79. Prykhodko O, Johansson SV, Kotsias P-C, et al. A de novo molecular generation method using latent vector based generative adversarial network. J Cheminf. 2019;11(1):74.
80. Guimaraes GL, Sanchez-Lengeling B, Farias PLC, et al. Objective-reinforced generative adversarial networks (ORGAN) for sequence generation models. ArXiv Preprint. 2017;arXiv:1705.10843.
81. Jin W, Barzilay R, Jaakkola T. Junction tree variational autoencoder for molecular graph generation. ArXiv Preprint. 2018;arXiv:1802.04364.
82. Simonovsky M, Komodakis N. GraphVAE: towards generation of small graphs using variational autoencoders. ArXiv Preprint. 2018;arXiv:1802.03480.
83. Samanta B, De A, Jana G, et al. NeVAE: a deep generative model for molecular graphs. Proceedings of the AAAI Conference on Artificial Intelligence. 2019;33(1):1110–1117.
84. De Cao N, Kipf T. MolGAN: an implicit generative model for small molecular graphs. ArXiv Preprint. 2018;arXiv:1805.11973.
85. Popova M, Shvets M, Oliva J, et al. MolecularRNN: generating realistic molecular graphs with optimized properties. ArXiv Preprint. 2019;arXiv:1905.13372.
86. Kobyzev I, Prince S, Brubaker MA. Normalizing flows: introduction and ideas. ArXiv Preprint. 2019;arXiv:1908.09257.
87. Papamakarios G, Nalisnick E, Rezende DJ, et al. Normalizing flows for probabilistic modeling and inference. ArXiv Preprint. 2019;arXiv:1912.02762.
88. Madhawa K, Ishiguro K, Nakago K, et al. An invertible flow model for generating molecular graphs. ArXiv Preprint. 2019;arXiv:1905.11600.
89. Liu J, Kumar A, Ba J, et al. Graph normalizing flows. Advances in Neural Information Processing Systems 32 (NIPS 2019). Curran Associates, Inc.; 2019. p. 13578–13588.
90. Shi C, Xu M, Zhu Z, et al. GraphAF: a flow-based autoregressive model for molecular graph generation. ArXiv Preprint. 2020;arXiv:2001.09382.
91. Bjerrum E, Sattarov B. Improving chemical autoencoder latent space and molecular de novo generation diversity with heteroencoders. Biomolecules. 2018;8(4):131.
92. Kusner MJ, Paige B, Hernández-Lobato JM. Grammar variational autoencoder. Proceedings of the 34th International Conference on Machine Learning - Volume 70; Sydney, NSW, Australia. JMLR.org; 2017. p. 1945–1954.
93. O'Boyle NM, Dalke A. DeepSMILES: an adaptation of SMILES for use in machine-learning of chemical structures; 2018.
94. Krenn M, Häse F, Nigam A, et al. SELFIES: a robust representation of semantically constrained graphs with an example application in chemistry. ArXiv Preprint. 2019;arXiv:1905.13741.
95. Nigam A, Friederich P, Krenn M, et al. Augmenting genetic algorithms with deep neural networks for exploring the chemical space. ArXiv Preprint. 2019;arXiv:1909.11655.
96. Brown N, Fiscato M, Segler MHS, et al. GuacaMol: benchmarking models for de novo molecular design. J Chem Inf Mod. 2019;59(3):1096–1108.
97. Polykovskiy D, Zhebrak A, Sanchez-Lengeling B, et al. Molecular sets (MOSES): a benchmarking platform for molecular generation models. ArXiv Preprint. 2018;arXiv:1811.12823.
98. Preuer K, Renz P, Unterthiner T, et al. Fréchet ChemNet distance: a metric for generative models for molecules in drug discovery. J Chem Inf Mod. 2018;58(9):1736–1741.
99. Baskin II, Ait AO, Halberstam NM, et al. An approach to the interpretation of backpropagation neural network models in QSAR studies. SAR QSAR Environ Res. 2002;13(1):35–41.
100. Imrie F, Bradley AR, van der Schaar M, et al. Deep generative models for 3D compound design. BioRxiv Preprint. 2019;bioRxiv:830497.
101. Grow C, Gao K, Nguyen DD, et al. Generative network complex (GNC) for drug discovery. ArXiv Preprint. 2019;arXiv:1910.14650.
102. Gebauer N, Gastegger M, Schütt KT. Generating equilibrium molecules with deep neural networks. ArXiv Preprint. 2018;arXiv:1810.11347.
103. Gebauer N, Gastegger M, Schütt KT. Symmetry-adapted generation of 3D point sets for the targeted discovery of molecules. ArXiv Preprint. 2019;arXiv:1906.00957.
104. Zankov DV, Madzhidov TI, Rakhimbekova A, et al. Conjugated quantitative structure–property relationship models: application to simultaneous prediction of tautomeric equilibrium constants and acidity of molecules. J Chem Inf Mod. 2019;59(11):4569–4576.
105. Baskin II, Halberstam NM, Mukhina TV, et al. The learned symmetry concept in revealing quantitative structure-activity relationships with artificial neural networks. SAR QSAR Environ Res. 2001;12(4):401–416.
106. Ertl P, Schuffenhauer A. Estimation of synthetic accessibility score of drug-like molecules based on molecular complexity and fragment contributions. J Cheminf. 2009;1(1):8.
107. Coley CW, Rogers L, Green WH, et al. SCScore: synthetic complexity learned from a reaction corpus. J Chem Inf Mod. 2018;58(2):252–261.
108. Baskin II, Madzhidov TI, Antipin IS, et al. Artificial intelligence in synthetic chemistry: achievements and prospects. Russ Chem Rev. 2017;86(11):1127–1156.
• A comprehensive review on the use of artificial intelligence in synthetic chemistry
109. Liu B, Ramsundar B, Kawthekar P, et al. Retrosynthetic reaction prediction using neural sequence-to-sequence models. ACS Cent Sci. 2017;3(10):1103–1113.
110. Segler MHS, Preuss M, Waller MP. Planning chemical syntheses with deep neural networks and symbolic AI. Nature. 2018;555:604.
• An important publication on the use of deep learning in planning organic synthesis
111. Coley CW, Green WH, Jensen KF. Machine learning in computer-aided synthesis planning. Acc Chem Res. 2018;51(5):1281–1289.
112. Button A, Merk D, Hiss JA, et al. Automated de novo molecular design by hybrid machine intelligence and rule-driven chemical synthesis. Nat Mach Intell. 2019;1(7):307–315.
• An important publication on combining de novo molecular design with synthesis planning