
Expert Opinion on Drug Discovery

ISSN: 1746-0441 (Print) 1746-045X (Online) Journal homepage: https://www.tandfonline.com/loi/iedc20

The power of deep learning to ligand-based novel drug discovery

Igor I. Baskin

To cite this article: Igor I. Baskin (2020): The power of deep learning to ligand-based novel drug
discovery, Expert Opinion on Drug Discovery, DOI: 10.1080/17460441.2020.1745183

To link to this article: https://doi.org/10.1080/17460441.2020.1745183

Published online: 31 Mar 2020.


PERSPECTIVE

The power of deep learning to ligand-based novel drug discovery


Igor I. Baskin (a,b)
(a) Faculty of Physics, M.V. Lomonosov Moscow State University, Moscow, Russia; (b) Butlerov Institute of Chemistry, Kazan Federal University, Kazan, Russia

ABSTRACT

Introduction: Deep discriminative and generative neural-network models are becoming an integral part of the modern approach to ligand-based novel drug discovery. The variety of different architectures of neural networks, the methods of their training, and the procedures of generating new molecules require expert knowledge to choose the most suitable approach.

Areas covered: Three different approaches to deep learning use in ligand-based drug discovery are considered: virtual screening, neural generative models, and mutation-based structure generation. Several architectures of neural networks for building either discriminative or generative models are considered in this paper, including deep multilayer neural networks, different kinds of convolutional neural networks, recurrent neural networks, and several types of autoencoders. Several kinds of learning frameworks are also considered, including adversarial learning and reinforcement learning. Different types of representations for generating molecules, including SMILES, graphs, and several alternative string representations, are also considered.

Expert opinion: Two kinds of problem should be solved in order to make the models built using deep neural networks, especially generative models, a valuable option in ligand-based drug discovery: the issue of interpretability and explainability of deep-learning models and the issue of synthetic accessibility of novel compounds designed by deep-learning algorithms.

ARTICLE HISTORY
Received 24 December 2019
Accepted 17 March 2020

KEYWORDS
Neural networks; deep learning; drug discovery; generative models; artificial intelligence

1. Introduction

Artificial neural networks have long been at the center of attention in chemoinformatics and computational medicinal chemistry [1–4]. In 2012, a review article on the prospects of using machine learning methods in chemoinformatics pointed to a newly emerged area called deep learning [5] as a promising methodology that would in the future allow the de novo design of novel drugs using generative neural-network models [6]. Indeed, after a few years, a sharp rise in interest in the use of neural networks in drug discovery began, and at present we are witnessing an unprecedented pace of development of this scientific direction, as can be seen from the numerous reviews published over the past few years [4,7–23]. Such rapid development has become possible not only due to the accumulation of large amounts of data in chemistry and pharmacology and a significant increase in computing power thanks to GPUs; undoubtedly, the decisive factor was the tremendous success of deep learning in processing text, images, and voice, in playing games, and in creating autonomous robots and cars, which achieved human and even superhuman performance and thereby revolutionized the field of artificial intelligence (AI). The most effective and universal approaches developed for these purposes were transferred, after appropriate modifications, to the field of drug discovery. This led to the formation of the concept of AI in drug discovery, which often also includes the use of other machine learning methods [16,18,19,21,24].

This review focuses on the opportunities and benefits that deep learning provides for ligand-based novel drug discovery, both for virtual screening and for the directed generation of new chemical structures with desired properties using either neural-network generative models or mutation-based procedures.

CONTACT Igor I. Baskin igbaskin@gmail.com Faculty of Physics, M.V. Lomonosov Moscow State University, Moscow, Russia
© 2020 Informa UK Limited, trading as Taylor & Francis Group

2. Ways to use deep learning in ligand-based novel drug discovery

Deep learning can be used in ligand-based novel drug discovery (i) during virtual screening of existing libraries of chemical structures using discriminative models, i.e. models that discriminate between active and inactive compounds or predict activity levels; (ii) in the directed generation of new chemical structures with desired properties using generative models, i.e. probabilistic models from which new molecules can be sampled; and (iii) in mutation-based molecule generation procedures.

2.1. Deep multilayer neural networks (DNNs)

Virtual screening allows a computer to filter predefined libraries of chemical compounds, discarding compounds with unfavorable toxicological properties, or keeping compounds with the necessary spectrum of activities and favorable pharmacokinetic properties [25–27]. In this case, deep learning is usually used to create filters based on classification or regression models that allow predicting biological activity based on

the values of molecular descriptors [9]. For this, multilayer neural networks are used, consisting of an input layer, each neuron of which corresponds to a molecular descriptor; an output layer, each neuron of which corresponds to one of the simultaneously predicted properties; and one or several layers of hidden neurons that serve to learn internal representations of molecules [28], see Figure 1. Such neural networks with a single hidden layer are called 'shallow', while networks with multiple hidden layers are called 'deep'. All hidden layers, along with the output layer, in these networks are 'dense', i.e. each neuron of the next layer receives signals from all neurons of the previous layer. The outputs of the neurons of the first hidden layer, closest to the input layer, can be considered as a new set of descriptors formed by combining the original molecular descriptors on the hidden neurons to increase the predictive performance of the network. The outputs of the neurons of each subsequent hidden layer in a DNN are formed by combining the descriptors formed on the previous hidden layer to make them more suitable for property prediction. So, when moving from the input to the output layer in a DNN, increasingly higher-level descriptors are formed on the hidden layers, which are more suitable for predicting the target property. Thus, a unique feature of deep learning in DNNs is the gradual nonlinear transformation of the descriptor description of the molecules, which leads to a simpler character of the dependence of the target properties on them and thereby to a higher predictive ability. This is fundamentally different from standard machine learning methods, where the possibility of transforming the descriptor space of molecules is confined to descriptor selection and the formation of their linear combinations (as in the partial least squares method).

Article Highlights
● The power of deep learning to ligand-based drug discovery stems from the ability to create neural-network filters for virtual screening, to build and run generative neural-network models, and to control the mutation-based generation of chemical structures with desired target properties.
● Deep multilayer neural networks combine and transform initial descriptors to make them more suitable for predicting target properties.
● Deep convolutional neural networks can derive descriptors directly from raw descriptions of chemical structures and use them to predict the target properties of chemical compounds.
● Deep recurrent neural networks can be used to analyze and generate chemical structures represented as SMILES strings.
● Deep autoencoders produce revertible descriptors from which chemical structures can be reconstructed. In conjunction with recurrent neural networks, this allows for designing chemical compounds with desired properties.
● Generative models for performing novel drug discovery can be built using a variety of deep architectures of neural networks in combination with adversarial and reinforcement learning.
This box summarizes key points contained in the article.

The performance of deep learning in building structure-activity models useful for drug discovery has been compared with that of other state-of-the-art machine learning methods in several benchmark studies, starting from the pioneering publication of Dahl et al. reporting their participation in the Kaggle competition on the Merck molecular activity challenge [29]. Mayr et al. made such a comparison for toxicity prediction using the Tox21 Data Challenge and demonstrated that deep learning outperformed other machine learning methods, including naïve Bayes, support vector machines, and random forests [30]. Moreover, they demonstrated in the same publication that the above-discussed hierarchy of chemical features (descriptors, internal representations) is indeed constructed in the hidden layers of DNNs. Lenselink et al. [31] and Mayr et al. [32] made large-scale comparisons of deep learning with other machine learning methods for drug-target prediction on ChEMBL, and the supremacy of deep learning was again demonstrated. It was even claimed that 'the predictive performance of deep learning is in many cases comparable to that of tests performed in wet labs (i.e., in vitro assays)' [32].

2.2. Multitask learning

If there are several neurons in the output layer, each of them is usually responsible for predicting a separate property/activity. Since building a model for predicting each of them can be considered as a separate task, the training of such a neural network is called multitask learning [33]. If several properties are related to each other, then it can be expected that a common model that provides their simultaneous prediction should have a higher predictive performance compared to individual models due to the 'inductive transfer of knowledge' between the tasks. This is achieved through the formation of common descriptors in hidden neurons. In the field of chemoinformatics and drug discovery, the advantage of performing multitask learning was first demonstrated by Varnek et al. on the example of predicting several tissue/air partitioning coefficients using shallow neural networks [34]. This made it possible to significantly increase the predictive ability of models built for human tissues with a small amount of experimental data due to the simultaneous construction of related

Figure 1. A deep multilayer neural network applied to the prediction of three target properties of chemical compounds.
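As an illustration, the forward pass through a network like the one in Figure 1 can be sketched in a few lines of pure Python. The weights below are random stand-ins rather than trained parameters, and the layer sizes are arbitrary; a real model would be trained with a framework such as PyTorch or TensorFlow.

```python
import random

random.seed(0)

def dense_layer(inputs, weights, biases, activation=None):
    """One fully connected ('dense') layer: every neuron combines
    signals from all neurons of the previous layer."""
    outputs = []
    for w_row, b in zip(weights, biases):
        s = sum(w * x for w, x in zip(w_row, inputs)) + b
        outputs.append(max(0.0, s) if activation == "relu" else s)
    return outputs

def make_layer(n_in, n_out):
    """Random stand-in weights; a trained DNN would learn these."""
    weights = [[random.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases

# 8 molecular descriptors -> two hidden layers -> 3 predicted properties
descriptors = [0.3, 1.2, 0.0, 0.7, 0.5, 0.1, 0.9, 0.4]
w1, b1 = make_layer(8, 6)  # hidden layer 1: combined descriptors
w2, b2 = make_layer(6, 6)  # hidden layer 2: higher-level descriptors
w3, b3 = make_layer(6, 3)  # output layer: one neuron per property

h1 = dense_layer(descriptors, w1, b1, activation="relu")
h2 = dense_layer(h1, w2, b2, activation="relu")
predictions = dense_layer(h2, w3, b3)  # three simultaneously predicted properties
print(predictions)
```

Because the output layer has several neurons, the same forward pass also illustrates the multitask setting discussed in Section 2.2: all three predictions share the hidden-layer descriptors.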

models for rodent tissues with a large amount of experimental data.

The advantage of combining multitask learning with deep learning is the possibility of forming a deep hierarchy of general internal representations (descriptors), which provides more opportunities for 'induction transfer' between tasks, thereby increasing the predictive performance. In one of the first publications on the use of deep learning in the field of drug discovery, Ma et al. applied multitask learning to build QSAR models and found that the increase in predictive performance is due to 'borrowing' information on related properties from similar compounds in other datasets [35]. This allowed the authors to develop a strategy for constructing multitask deep neural networks in which cooperation between different tasks leads to an improvement in predictive ability. Recent advances in the use of multitask training of deep neural networks in chemoinformatics and drug discovery have been surveyed by Sosnin et al. [36].

2.3. Convolutional neural networks (CNNs)

The deep multilayer neural networks discussed above use their hidden layers to transform a fixed number of originally defined global molecular descriptors into their non-linear combinations, more suitable for predicting the target properties of chemical compounds. In this case, global molecular descriptors are understood as numbers characterizing the entire chemical structure as a whole, independent of the numbering of atoms in the molecule as well as of the movement and rotation of the molecule in space. The success of building models with such descriptors is largely determined by how brilliant the scientists who developed them were and how successfully a set of descriptors was selected to build a specific structure-property model. CNNs go much further and eliminate this subjective human factor by forming global molecular descriptors directly from a 'raw' description of chemical structures, such as molecular graphs, see below. In numerical form, such a 'raw' description can be represented by either a variable or a fixed number of local descriptors (e.g. indicators of atom types or local charges on atoms), the values of which depend on the permutations of atoms in the molecule or on the movement and rotation of the molecule in space. CNNs transform such local descriptors into global ones using special convolutional and pooling layers of hidden neurons (Figure 2). In contrast to the dense hidden layers considered above, neurons of a convolutional layer receive signals only from a small portion of the previous layer, called the receptive field, by analogy with the mechanism of vision in biology, and the weights of the connections going to them (called filters, by analogy with image processing) are common to (shared among) all neurons of the convolutional layer. The purpose of the convolutional layer is to transform local descriptors to take into account the influence of the nearest neighborhood (for example, the influence of the nearest atoms in a molecule) and thereby increase the predictive performance of the neural network. The pooling layers consist of neurons with fixed weights that generalize the local descriptors formed in the convolutional layers and turn them into a set of a fixed number of global descriptors. Further, the global molecular descriptors formed in the last pooling layer can enter the input of a DNN with dense hidden layers. Thus, due to the automatic formation of local and then global molecular descriptors in hidden layers, CNNs can build direct correlations between 'raw' descriptions of chemical structures and their target properties. For the first time, the ability to perform direct structure-property correlations using similar principles was demonstrated in the 1990s using 'a neural device for searching direct correlations between structures and properties of chemical compounds' [37].

There exist several types of CNNs depending on the type of 'raw' representation of molecules. The most popular among them are 1D-, 2D-, 3D-, and graph-CNNs. 1D-CNNs use SMILES [38] as a 'raw' representation and 'one-hot' encoding of the character in each position of the string as initial local descriptors. A 'one-hot' encoded character is a vector containing 1 in the position corresponding to its type and 0 elsewhere. As an example, this kind of network was used by Kwon and Yoon in the DeepCCI method to predict chemical-chemical interactions directly from SMILES strings [39]. 2D-CNNs use graphic images of structural formulas as a 'raw' representation of chemical structures and the color codes of individual pixels as initial local descriptors. So, one can produce images of the structural formulas of all chemical compounds in the dataset and use powerful image-recognition software to build predictive models for target properties. This methodology has been successfully applied in several publications [40,41] and led to the discovery of CDK4 inhibitors in ligand-based virtual screening [42]. 3D-CNNs use molecular fields (both 'real' physical fields, such as the electrostatic potential, and pseudo-fields indicating in a fuzzy manner the presence of certain atom types at a given location) as a 'raw' representation of chemical structures and the values of such molecular fields computed for individual voxels (3D analogs of pixels) as initial local descriptors. Although most of the work on this type of neural network involves the consideration of protein-ligand complexes, this network architecture can successfully

Figure 2. A graph convolutional neural network applied to the prediction of three target properties of chemical compounds.
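A hypothetical minimal sketch of the graph-convolution idea behind Figure 2: each atom's local descriptor is updated from itself and its nearest neighbors using shared ('filter') weights, and a fixed sum-pooling step then collapses the atom-level vectors into a global molecular descriptor that is independent of atom numbering. The two filter weights here are hand-picked stand-ins; real graph-CNNs learn many such filters and stack several convolutional layers.

```python
# Heavy-atom graph of ethanol (C-C-O); local descriptors are one-hot atom types [C, O].
atoms = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
neighbors = {0: [1], 1: [0, 2], 2: [1]}

W_SELF, W_NEIGH = 0.7, 0.3  # shared 'filter' weights (stand-ins for learned ones)

def graph_convolution(features, neighbors):
    """Update each atom from itself and its nearest neighborhood
    ('receptive field'), using the same weights for every atom."""
    updated = []
    for i, feat in enumerate(features):
        agg = [W_SELF * x for x in feat]
        for j in neighbors[i]:
            agg = [a + W_NEIGH * x for a, x in zip(agg, features[j])]
        updated.append([max(0.0, a) for a in agg])  # ReLU nonlinearity
    return updated

def sum_pool(features):
    """Fixed-weight pooling: atom-level descriptors -> one global
    descriptor vector, independent of atom numbering."""
    return [sum(col) for col in zip(*features)]

hidden = graph_convolution(atoms, neighbors)
global_descriptor = sum_pool(hidden)  # would feed a dense DNN head
print(global_descriptor)  # -> [2.3, 1.0]
```

The pooled vector can then enter a dense network of the kind described in Section 2.1 to predict target properties.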

be applied to build 3D-QSAR models [43,44]. Graph-CNNs use molecular graphs as a 'raw' representation of chemical structures, properties of atoms as initial local descriptors, and the close neighborhood of atoms in molecules as 'receptive fields' for neurons in convolutional layers. Currently, deep neural networks with such an architecture are actively used to automatically generate global descriptors for predicting various properties of chemical compounds [45–49]. In addition, graph-CNNs are currently used as blocks in neural networks with more complex architectures, including mechanisms that allow one to focus 'attention' on individual molecules or their parts [50]. In particular, they are part of a special architecture of neural networks that allows building classification models for predicting the type of biological activity in the presence of a very small number of examples within the framework of so-called 'one-shot learning' [51]; see the discussion in Ref. [52].

2.4. Recurrent neural networks (RNNs)

RNNs work with molecules represented as sequences, e.g. sequences of characters in SMILES strings for small molecules (Figure 3). Starting from the first element, RNNs iteratively apply the same recurrent procedure to all elements of the sequence, forming in their hidden layers descriptors for the first t elements, d_t, by combining the already computed descriptors for the first t−1 elements, d_{t−1}, with the initial local descriptors for the t-th element, i_t. This is provided by special recurrent neurons, whose outputs change with discrete time t, and a special loop with time delay connects the neuron at time t−1 with itself at time t. In order to solve some problems associated with the functioning of RNNs, instead of simple recurrent neurons it is customary to use special microarchitectures, particularly LSTM (Long Short-Term Memory) [53] and GRU (Gated Recurrent Unit) [54] cells.

In the case of SMILES, the initial local descriptor for the t-th element is a 'one-hot' encoded type of the character at the t-th position of the string. After passing through the whole SMILES string, a vector of global molecular descriptors is formed for the corresponding chemical structure. In this case, an RNN can be trained to predict each t-th character from the descriptors d_{t−1} computed for the previous substring, using a large set of correct SMILES. Descriptor vectors computed for whole SMILES strings can be used for building models to predict chemical properties, as in the SMILES2vec approach [55].

The ability of a trained RNN to predict the next character in a partially formed SMILES can efficiently be used to generate new chemical structures. This is possible because, in addition to the input and hidden layers, RNNs also have an output layer with the softmax activation function that calculates the probability of each type of the next symbol. So, in generator mode, this probability distribution can be sampled, and the 'one-hot encoded' sampled character can be used as an input to the network in order to generate one more character, and so on. The growth of the current SMILES string can be terminated upon the generation of a special stop-symbol. In this way, new chemical structures that can further be used in virtual screening can be generated [56]. In order to direct the neural network to generate structures with desired properties, several approaches have been developed. One of them is based on so-called transfer learning, in which a network is first trained on a large data set to generate correct SMILES and then fine-tuned on specific ligand subsets to generate active structures for specific biological targets [57].

2.5. Deep reinforcement learning

Another approach to generating structures with desired properties is to use reinforcement learning (RL), a general framework in machine learning dealing with how software 'agents' learn to take actions in an 'environment' so as to maximize a reward (or minimize a penalty) [58]. RL is used in AI to solve dynamic decision problems, i.e. to take such actions as would lead to a desirable outcome. RL powered by deep neural networks implementing its functions is called deep RL. In application to the generation of chemical structures, RL 'rewards' the generation of structures with desired target properties and 'punishes' undesired properties, as in the de novo design method developed by Popova et al. [59]. In the case of generating SMILES by an RNN, the network is rewarded during training whenever a completely generated SMILES corresponds to a correct chemical structure possessing desirable properties. So, due to the use of deep RL, RNNs learn how to produce each next character in SMILES in order to be able to generate chemical structures with the desired properties. Deep RL is currently very actively used in drug discovery in conjunction with various approaches to generating chemical structures [58–64]. Some of them are considered below.

2.6. Autoencoders

An autoencoder is a neural network containing two subnetworks, an encoder and a decoder [28], see Figure 4. The encoder transforms

Figure 3. A recurrent neural network applied to the prediction (in training mode) or the generation (in generator mode) of the SMILES string for ethanol.
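The generator mode illustrated in Figure 3 can be sketched with a toy sampling loop. For brevity the 'network' here is a hypothetical fixed table of next-character probabilities over a two-character alphabet rather than a trained RNN with a hidden state, but the mechanics are the same: sample the next symbol from the softmax distribution, feed it back as input, and terminate on a special stop-symbol.

```python
import random

random.seed(42)

# Toy next-character distributions standing in for a trained RNN's
# softmax output layer; '^' is the start symbol, '$' the stop symbol.
NEXT_CHAR_PROBS = {
    "^": {"C": 0.8, "O": 0.2},
    "C": {"C": 0.5, "O": 0.3, "$": 0.2},
    "O": {"C": 0.3, "$": 0.7},
}

def sample_string(max_len=20):
    """Grow one string character by character, stopping on '$'."""
    current, out = "^", []
    for _ in range(max_len):
        dist = NEXT_CHAR_PROBS[current]
        chars, probs = zip(*dist.items())
        nxt = random.choices(chars, weights=probs)[0]  # sample the softmax
        if nxt == "$":                                 # stop-symbol ends growth
            break
        out.append(nxt)
        current = nxt                                  # feed the sample back in
    return "".join(out)

samples = [sample_string() for _ in range(5)]
print(samples)
```

In a real SMILES generator the distribution at each step depends on the whole prefix through the RNN's hidden state, and invalid strings are filtered out or penalized during training.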

Figure 4. An autoencoder transforming a chemical structure to a code vector in the autoencoder latent space followed by the reconstruction of the chemical
structure from it.
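A minimal numeric sketch of the encode/decode round trip of Figure 4, and of using the reconstruction error as a screening score as discussed in this section. The 'molecules' are short descriptor vectors and the encoder/decoder are fixed linear maps chosen by hand to compress a 4-dimensional input to a 2-dimensional code vector; nothing here is trained, whereas a real autoencoder would learn both maps from a set of active compounds.

```python
# Hand-picked linear encoder/decoder: 4-D descriptor -> 2-D code vector -> 4-D.
# This pair reconstructs vectors of the form [a, a, b, b] exactly, so such
# vectors play the role of 'in-distribution' compounds.
ENC = [[0.5, 0.5, 0.0, 0.0],
       [0.0, 0.0, 0.5, 0.5]]
DEC = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]

def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]

def reconstruction_error(v):
    """Encode to the latent space, decode back, and measure the error."""
    code = matvec(ENC, v)      # code vector in the latent space
    recon = matvec(DEC, code)  # reconstructed descriptor vector
    return sum((a - b) ** 2 for a, b in zip(v, recon))

in_distribution = [2.0, 2.0, 5.0, 5.0]  # matches what this autoencoder 'knows'
outlier = [9.0, 0.0, 1.0, 6.0]          # does not

err_in = reconstruction_error(in_distribution)
err_out = reconstruction_error(outlier)
print(err_in, err_out)  # the outlier scores a much larger error
```

Ranking library compounds by this error, lowest first, is the scoring scheme mentioned below for discovering compounds of the same activity class.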

an initial representation of a molecule to a set of descriptors, called the code vector, from which the decoder reconstructs the initial representation. The code vectors form the latent space of the autoencoder. The training of the autoencoder is aimed at reducing the reconstruction error with the smallest possible number of code descriptors. If such an autoencoder is trained using a set of compounds with a certain type of biological activity, then the reconstruction error can be used as a scoring function in virtual screening to discover novel drugs belonging to the same activity class [65,66]. The use of deep learning enables autoencoders to work with 'raw' representations of chemical structures. In particular, chemical structures represented using SMILES strings can be processed using a sequence-to-sequence architecture in which RNNs are used both for encoding the SMILES strings to code vectors and for decoding them back to SMILES strings [67]. In this case, the goal of the training is to correctly restore SMILES strings from their code vectors. Xu et al. have shown that such code vectors, called 'seq2seq fingerprints', could be useful for drug discovery, because chemical structures can be reconstructed from them using the decoder [68].

Several approaches to the generation of chemical structures with desired properties using autoencoder code vectors have been proposed. All of them are based on (1) approximation of the statistical distribution of molecules in the latent space defined by the code vectors, (2) sampling of new points from this distribution, and (3) restoration of chemical structures from them using the decoder. Sattarov et al. approximated the distribution of molecules using the GTM (Generative Topographic Mapping) machine learning method, used the 'activity landscape' methodology to select zones enriched with molecules with the desired activity, and generated new molecules by feeding the points sampled in the latent space from such zones to the decoder [69].

Several published approaches to the generation of chemical structures with desired properties are based on the use of variational autoencoders (VAEs), a special modification of autoencoders in which a molecule is mapped by the encoder not to a single code vector but to a Gaussian distribution over the latent space containing all possible code vectors [70]. During the training, the decoder learns to reconstruct the original molecule from a random code vector sampled from this distribution. Since any point close to the original distribution of molecules in the latent space can be sampled, a VAE creates a latent space with far fewer 'holes' (zones in the latent space from which correct molecules cannot be reconstructed) in comparison with ordinary autoencoders. This allows drawing continuous trajectories in the latent space, from each point of which molecules can be reconstructed. This allowed Gomez-Bombarelli et al. to propose a method for optimizing chemical structures in order to improve the desired properties [71]. For this purpose, an additional model can be built to predict the target properties from the values of the latent variables (code vectors), so that optimization of the property in the latent space results in the optimization of discrete chemical structures. An important modification of VAEs is the conditional VAE (CVAE), in which vectors of target properties are injected into the autoencoder, making it possible to generate chemical structures with a set of desired target properties without the need to perform optimization in the latent space [72]. Another way to perform similar conditional molecular design of compounds with desired target properties is to use a semi-supervised modification of VAEs (SSVAEs) trained on a set of partially annotated chemical compounds, so that new molecules with desired properties can be generated by sampling from the conditional distribution without any extra optimization [73].

Zhavoronkov et al. used a VAE with a rich prior distribution in the latent space in combination with tensor decomposition and RL based on reward functions approximated by self-organizing maps (SOMs) to discover novel potent inhibitors of discoidin domain receptor 1 in 21 days [74], see Figure 5.

2.7. Adversarial learning

The disadvantage of VAEs is their tight attachment to the Gaussian distribution used as a prior, while the distribution of chemical compounds in databases is often far from Gaussian and has many modes. Such a mismatch can lead to the appearance of large 'holes' in some zones of the latent space, which may make it poorly suitable for optimizing chemical structures. An effective approach to solving this problem is to use adversarial learning, which underlies generative adversarial nets (GANs) [75]. GANs consist of two networks: a generator, which learns to generate new objects, and a discriminator, which learns to discriminate between real and generated objects. Adversarial learning is like a game in which the generator learns to 'fool' the discriminator, which in turn learns not to be 'fooled'; as a result, the generator becomes so 'smart' that the generated objects become

Figure 5. Novel potent inhibitors of human DDR1 kinase found by DL-driven de novo drug design.
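A deliberately tiny illustration of the adversarial game described in this section, with scalar 'objects' standing in for molecules: the generator is a single shift parameter, the discriminator a logistic unit, and each iteration nudges one to fool the other. This is only a sketch of the two-player objective, not a usable molecular GAN, which would operate on fingerprints, SMILES strings, or graphs.

```python
import math
import random

random.seed(1)

def sigmoid(t):
    if t < -60.0:   # numerical guard against overflow
        return 0.0
    if t > 60.0:
        return 1.0
    return 1.0 / (1.0 + math.exp(-t))

REAL_MEAN = 5.0          # 'real' objects cluster here
theta = 0.0              # generator parameter: fake object = theta + noise
w, b = 0.1, 0.0          # discriminator: D(x) = sigmoid(w*x + b)
lr_d, lr_g = 0.05, 0.02  # discriminator / generator learning rates

for _ in range(3000):
    real = REAL_MEAN + random.gauss(0.0, 0.1)
    fake = theta + random.gauss(0.0, 0.1)
    d_real = sigmoid(w * real + b)
    d_fake = sigmoid(w * fake + b)

    # Discriminator step: ascend log D(real) + log(1 - D(fake))
    w += lr_d * ((1.0 - d_real) * real - d_fake * fake)
    b += lr_d * ((1.0 - d_real) - d_fake)

    # Generator step: ascend log D(fake), i.e. learn to 'fool' D
    d_fake = sigmoid(w * fake + b)
    theta += lr_g * (1.0 - d_fake) * w

print(round(theta, 2))  # theta has drifted from 0.0 toward the real data
```

At equilibrium the discriminator can no longer separate real from generated samples, which is the sense in which generated objects become 'indistinguishable from the real ones'.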

indistinguishable from the real ones. Adversarial autoencoders (AAEs) combine autoencoders with adversarial learning. In this case, an additional discriminator neural network learns to discriminate between the code vectors produced by the encoder and random vectors sampled from a user-defined prior distribution, while the encoder learns to 'fool' the discriminator and generate code vectors in accordance with the specified prior distribution. For the first time, AAEs were applied by Kadurin et al. to generate molecular fingerprints, which can further be used for discovering new drugs through virtual screening [76]. Blaschke et al. applied AAEs to produce a latent space for the optimization of chemical structures, performing de novo molecular design with the Bayesian optimization algorithm [77]. They demonstrated the advantage of using AAEs with a uniform prior distribution over standard VAEs. The above-mentioned Bayesian optimization algorithm, applied to the optimization of molecules, works by maximizing the probability of being active against a specific target as predicted by Gaussian process models [78]. Such models are first trained on a set of initial points in the autoencoder latent space; then, in each iteration, a new point selected by maximizing a certain probabilistic criterion based on the current model is added to the set, the model is rebuilt, and so on. Prykhodko et al. have developed the LatentGAN architecture, which also combines an autoencoder with a GAN for de novo molecular design [79]. Guimaraes et al. have combined SMILES-based structure generation with both adversarial and reinforcement learning in the ORGAN (Objective-Reinforced GAN) network, in which the generator is trained to maximize two rewards: the first improves the activity of the molecules, and the second learns to mimic real molecules by 'fooling' the discriminator [80].

2.8. Graph-based generation of chemical structures

The above-discussed SMILES-based generation of chemical structures has, however, serious limitations. First, the generator should know, or learn from data, the grammar of SMILES in order to keep all parentheses balanced and all ring closures paired. Second, multiple SMILES can be assigned to the same chemical structure, while rather dissimilar canonical SMILES can correspond to very similar molecules, leading to bad neighborhood behavior [81]. To address these problems, it has been suggested to use deep learning algorithms to generate chemical structures as molecular graphs directly, without the use of SMILES. In this case, VAE, GAN, and RL frameworks can be combined with graph-CNNs. Only a few examples from this fast-growing research domain are given here. Simonovsky et al. [82], Jin et al. [81], and Samanta et al. [83] have developed special graph-based VAEs, whereas De Cao and Kipf [84] have developed a graph-based GAN. Popova et al. have developed the autoregressive MolecularRNN, a graph recurrent generative model for generating chemical structures with desirable properties, which is trained using RL [85].

A common drawback of graph-based methods for generating chemical structures is their significantly lower computational efficiency and significantly greater memory requirements compared to SMILES-based methods. This, in turn, imposes certain restrictions on the size of molecular graphs. An effective solution to this and several other problems has recently been proposed using a very promising approach: graph normalizing flows. A normalizing flow is a transformation of a simple probability distribution (like a Gaussian distribution in a latent space) into a probability distribution of any complexity (like the distribution of molecules in chemical space) by a sequence of invertible and differentiable mappings [86,87]. Unlike VAEs and GANs, they can

Unlike VAEs and GANs, normalizing flows can approximate data distributions exactly and, owing to the invertibility of the transformations, allow very efficient sampling from them. Since normalizing flows can operate in both continuous and discrete domains, they have been adapted to generate graphs. This has recently led to the development of three novel methods for generating molecular graphs: GraphNVP [88], GRevNets [89], and GraphAF [90].

2.9. Generation of chemical structures using alternative string representations

Despite the above issues, using SMILES to generate chemical structures has some clear advantages, including simplicity, versatility, and the ability to use a rich arsenal of methodologies and software tools designed for text mining. Several approaches have been proposed to overcome the drawbacks of using SMILES while still preserving the benefits of using strings. First, it has been shown that the problem of poor neighborhood behavior for canonical SMILES can be overcome by using sets of different non-canonical SMILES strings [91]. Second, several alternative string representations addressing the grammar-related problem have been suggested. Kusner et al. have developed the GrammarVAE approach, based on representing SMILES as a grammar parse tree, so that all SMILES strings generated using VAEs (see above) are guaranteed to have valid syntax [92]. To achieve the same goal, O'Boyle and Dalke have developed DeepSMILES, a modification of SMILES that avoids the problems of unbalanced parentheses and paired ring-closure symbols [93]. Krenn et al. have developed the SELFIES string representation, based on a Chomsky type-2 grammar, which can be used to represent graphs in such a way that random modifications to the strings still lead to correct, semantically constrained graphs [94].

2.10. Mutation-based generation of chemical structures

A reasonable alternative to the neural-network generation of new chemical structures in de novo drug design is a more traditional approach, which consists in modifying molecules using chemical 'mutations'. In the past year, there has been a trend toward combining mutation-based structure generation with deep learning. For example, Zhou et al. have combined the optimization of molecules using a small set of chemical 'mutations' with deep RL in order to generate new molecules with desired properties [64]. Nigam et al. have combined a genetic algorithm (GA) whose mutations are defined on SELFIES strings [94] with a DNN-based discriminator model and have shown that this approach 'outperforms other generative models in optimization tasks' [95].

2.11. Benchmarking generation of chemical structures

In order to compare different generative models for de novo molecular design, two benchmarking platforms have been developed: GuacaMol by Brown et al. [96] and MOSES (Molecular SEtS) by Polykovskiy et al. [97]. Both contain standard molecular benchmark datasets, implementations of several algorithms for producing generative models and generating chemical structures, and several metrics for comparing the performance of different generative models in terms of diversity, different kinds of similarity, Fréchet ChemNet Distance [98], synthetic accessibility, drug-likeness, etc.

3. Conclusions

Deep learning provides very powerful tools that can be used in ligand-based novel drug discovery both to conduct virtual screening of already prepared libraries of chemical compounds and to generate new compounds with desired properties. The deep architecture of neural networks with multiple hidden layers makes the internal representation of molecules more suitable for predicting target properties, as well as for transferring knowledge between different tasks. CNNs can work directly with chemical structures without the use of descriptor sets previously invented by humans. Using RNNs, one can apply text-processing methods to chemical structures represented as SMILES strings. Chemical structures can be generated either through SMILES strings or by working directly with molecular graphs. Autoencoders can be used to produce 'invertible' descriptors from which chemical structures can be recovered. This allows one to effectively design new compounds, both by generating new chemical structures for a given property and by optimizing chemical structures in the latent space of autoencoders. Different types of VAEs, GANs, and RL are powerful tools that can be very useful for ligand-based drug discovery, and several success stories have already been reported in the literature.

4. Expert opinion

Despite the very promising results obtained by using deep learning in ligand-based drug discovery, it is necessary to note some still-challenging problems. The first one is the issue of interpretability, explainability, and integration with deep scientific knowledge. The discovery of drugs is a very long and expensive process in which many specialists from various fields are involved, and they should well understand the decisions made by their colleagues. The process of drug design powered by AI based on deep learning algorithms should not be perceived as a sort of magic that can only be blindly believed but not completely trusted. Unfortunately, this is not facilitated by the 'black box' nature of deep neural networks, whose work is still not fully understood by mathematicians and even by their creators. In the field of drug discovery, human-AI interaction is very important, and the abbreviation AI should mean augmented intelligence rather than artificial intelligence; AI should be a partner of medicinal chemists rather than their replacement. The interpretation of a neural network model allows a better understanding of the nature of the phenomenon described by it [99]. Interactive visualization tools, like those based on GTM landscapes, are useful for involving humans in the generation of chemical structures by neural networks [69]. A positive trend is the explicit consideration of the spatial structure of molecules in 3D space during generation, because this is an important factor determining the biological activity of drugs [100–103]. Empowering a neural network with the ability to correctly consider the fundamental relationships between different target properties [104], as well as the correct symmetry properties [103,105], is also important for taking into account the laws of the physical world.
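Whether driven by GA 'mutations' (Section 2.10), ORGAN-style mixed rewards, or penalty terms such as a synthetic accessibility score, most of the generation schemes surveyed above ultimately optimize a single scalarized objective. The following purely illustrative sketch (not code from any cited system; the alphabet, scorers, and weights are hypothetical stand-ins for real predictive models) shows mutation-based hill climbing under such a combined score:

```python
import random

random.seed(42)
ALPHABET = "CNOS"  # toy stand-in for SELFIES-like tokens

def toy_activity(s):
    """Hypothetical activity surrogate: fraction of 'N' tokens."""
    return s.count("N") / len(s)

def toy_diversity(s):
    """Hypothetical second objective: fraction of distinct tokens used."""
    return len(set(s)) / len(ALPHABET)

def combined_score(s, w_act=0.8, w_div=0.2):
    """Scalarized multi-objective reward, in the spirit of ORGAN-style mixing."""
    return w_act * toy_activity(s) + w_div * toy_diversity(s)

def mutate(s):
    """A chemical-'mutation'-style edit: replace one randomly chosen token."""
    i = random.randrange(len(s))
    return s[:i] + random.choice(ALPHABET) + s[i + 1:]

# Simple (1+1) hill climbing: accept a mutation only if the score improves.
best = "CCCCCCCC"
for _ in range(500):
    candidate = mutate(best)
    if combined_score(candidate) > combined_score(best):
        best = candidate
```

Replacing the toy scorers with real property predictors (and the greedy acceptance rule with a population-based GA or an RL policy update) recovers the overall structure of the published approaches.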
Another very important and challenging problem is the synthetic accessibility of chemical compounds in de novo drug design. Ertl and Schuffenhauer have developed a synthetic accessibility score based on molecular complexity and fragment contributions [106], and it is common practice to use it as one of the objectives in the process of chemical structure generation and optimization. Coley et al. have developed for the same purpose a synthetic complexity score, SCScore, learned from a huge chemical reaction database [107]. Although the use of such scores leads to the effective rejection of difficult-to-synthesize compounds, this problem cannot be solved without a detailed retrosynthetic analysis with an exact indication of the structures of the reactants and reagents, along with the conditions (temperature, solvent, catalysts, etc.) and the experimental procedure for each stage of the synthesis. The task of planning and conducting organic synthesis is usually the responsibility of a synthetic chemist, which is not acceptable when working with a large number of compounds, as well as when conducting synthesis using autonomous robotic synthetic devices. It is important to note that in recent years AI methods based on deep learning have also taken a decisive role in the computer planning of organic synthesis [108–111], so there is a fundamental possibility of embedding retrosynthetic analysis in the generation of chemical structures with desired properties. The first example of the implementation of such an approach, which allows for the generation of synthetically accessible molecules along with a complete specification of their synthesis, is the DINGOS system recently developed by Button et al. [112], which combines ligand-similarity-based generation of chemical structures with a machine learning model trained on successful synthetic routes.

Funding

I Baskin is funded by the Ministry of Education, Youth and Sports of the Czech Republic (agreement MSMT-5727/2018-2) and the Ministry of Higher Education and Science of the Russian Federation (project number: RFMEFI58718X0049).

Declaration of Interest

The authors have no other relevant affiliations or financial involvement with any organization or entity with a financial interest in or financial conflict with the subject matter or materials discussed in the manuscript apart from those disclosed.

Reviewer Disclosures

Peer reviewers on this manuscript have no relevant financial or other relationships to disclose.

References

Papers of special note have been highlighted as either of interest (•) or of considerable interest (••) to readers.

1. Gasteiger J, Zupan J. Neural networks in chemistry. Angew Chem Int Ed Engl. 1993;105(4):503–527.
2. Halberstam NM, Baskin II, Palyulin VA, et al. Neural networks as a method for elucidating structure-property relationships for organic compounds. Russ Chem Rev. 2003;72(7):629–649.
3. Baskin II, Palyulin VA, Zefirov NS. Neural networks in building QSAR models. Methods Mol Biol. 2008;458:137–158.
4. Baskin II, Winkler D, Tetko IV. A renaissance of neural networks in drug discovery. Expert Opin Drug Discov. 2016;11(8):785–795.
5. LeCun Y, Bengio Y, Hinton G. Deep learning. Nature. 2015;521(7553):436–444.
•• The most cited review on deep learning
6. Varnek A, Baskin I. Machine learning methods for property prediction in chemoinformatics: quo vadis? J Chem Inf Mod. 2012;52(6):1413–1437.
• A comprehensive review on the use of machine learning for property prediction in chemoinformatics
7. Gawehn E, Hiss JA, Schneider G. Deep learning in drug discovery. Mol Inf. 2016;35(1):3–14.
8. Ekins S. The next era: deep learning in pharmaceutical research. Pharmaceut Res. 2016;33(11):2594–2603.
9. Carpenter KA, Cohen DS, Jarrell JT, et al. Deep learning and virtual drug screening. Future Med Chem. 2018;10(21):2557–2567.
10. Chen H, Engkvist O, Wang Y, et al. The rise of deep learning in drug discovery. Drug Discov Today. 2018;23(6):1241–1250.
11. Jørgensen PB, Schmidt MN, Winther O. Deep generative models for molecular science. Mol Inf. 2018;37(1–2):1700133.
12. Bajorath J. Data analytics and deep learning in medicinal chemistry. Future Med Chem. 2018;10(13):1541–1543.
13. Tang W, Chen J, Wang Z, et al. Deep learning for predicting toxicity of chemicals: a mini review. J Environ Sci Health Part C. 2018;36(4):252–271.
14. Sanchez-Lengeling B, Aspuru-Guzik A. Inverse molecular design using machine learning: generative models for matter engineering. Science. 2018;361(6400):360–365.
15. Ghasemi F, Mehridehnavi A, Pérez-Garrido A, et al. Neural network and deep-learning algorithms used in QSAR studies: merits and drawbacks. Drug Discov Today. 2018;23(10):1784–1790.
16. Jing Y, Bian Y, Hu Z, et al. Deep learning for drug design: an artificial intelligence paradigm for drug discovery in the big data era. AAPS J. 2018;20(3):58.
17. Rifaioglu AS, Atas H, Martin MJ, et al. Recent applications of deep learning and machine intelligence on in silico drug discovery: methods, tools and databases. Brief Bioinform. 2018;20(5):1878–1912.
18. Sellwood MA, Ahmed M, Segler MH, et al. Artificial intelligence in drug discovery. Future Med Chem. 2018;10(17):2025–2028.
19. Zhavoronkov A. Artificial intelligence for drug discovery, biomarker development, and generation of novel chemistry. Mol Pharm. 2018;15(10):4311–4313.
20. Elton DC, Boukouvalas Z, Fuge MD, et al. Deep learning for molecular design—a review of the state of the art. Mol Syst Design Eng. 2019;4(4):828–849.
21. Schneider P, Walters WP, Plowright AT, et al. Rethinking drug design in the artificial intelligence era. Nat Rev Drug Discovery. 2019. DOI:10.1038/s41573-019-0050-3.
22. Xue D, Gong Y, Yang Z, et al. Advances and challenges in deep generative models for de novo molecule generation. Wiley Interdiscip Rev Comput Mol Sci. 2019;9(3):e1395.
23. Xu Y, Lin K, Wang S, et al. Deep learning for molecular generation. Future Med Chem. 2019;11(6):567–597.
24. Lake F. Artificial intelligence in drug discovery: what is new, and what is next? Fut Drug Discov. 2019;1(2):FDD19.
25. Walters WP, Stahl MT, Murcko MA. Virtual screening - an overview. Drug Discov Today. 1998;3(4):160–178.
26. Green DV. Virtual screening of virtual libraries. Prog Med Chem. 2003;41:61–97.
27. Ripphausen P, Nisius B, Bajorath J. State-of-the-art in ligand-based virtual screening. Drug Discov Today. 2011;16(9–10):372–376.
28. Bengio Y, Courville A, Vincent P. Representation learning: a review and new perspectives. IEEE Trans Pattern Anal Mach Intell. 2013;35(8):1798–1828.
29. Dahl GE, Jaitly N, Salakhutdinov R. Multi-task neural networks for QSAR predictions. ArXiv Preprint. 2014;arXiv:1406.1231.
30. Mayr A, Klambauer G, Unterthiner T, et al. DeepTox: toxicity prediction using deep learning. Front Environ Sci. 2016;3(80). DOI:10.3389/fenvs.2015.00080.
31. Lenselink EB, Ten Dijke N, Bongers B, et al. Beyond the hype: deep neural networks outperform established methods using a ChEMBL bioactivity benchmark set. J Cheminf. 2017;9(1):45.
32. Mayr A, Klambauer G, Unterthiner T, et al. Large-scale comparison of machine learning methods for drug target prediction on ChEMBL. Chem Sci. 2018;9(24):5441–5451.
33. Caruana R. Multitask learning. Mach Learn. 1997;28(1):41–75.
34. Varnek A, Gaudin C, Marcou G, et al. Inductive transfer of knowledge: application of multi-task learning and feature net approaches to model tissue-air partition coefficients. J Chem Inf Mod. 2009;49(1):133–144.
• The first use of multitask learning in chemoinformatics
35. Xu Y, Ma J, Liaw A, et al. Demystifying multitask deep neural networks for quantitative structure–activity relationships. J Chem Inf Mod. 2017;57(10):2490–2504.
36. Sosnin S, Vashurina M, Withnall M, et al. A survey of multi-task learning methods in chemoinformatics. Mol Inf. 2019;38(4):1800108.
37. Baskin II, Palyulin VA, Zefirov NS. A neural device for searching direct correlations between structures and properties of chemical compounds. J Chem Inf Comput Sci. 1997;37(4):715–721.
• The first neural network with a convolutional architecture for searching direct correlations between structures and properties of chemical compounds
38. Weininger D. SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules. J Chem Inf Comput Sci. 1988;28(1):31–36.
•• The most popular string representation of chemical structures
39. Kwon S, Yoon S. End-to-end representation learning for chemical-chemical interaction prediction. IEEE/ACM Trans Comput Biol Bioinform. 2019;16(5):1436–1447.
40. Goh GB, Siegel C, Vishnu A, et al. Chemception: a deep neural network with minimal chemistry knowledge matches the performance of expert-developed QSAR/QSPR models. ArXiv Preprint. 2017;arXiv:1706.06689.
41. Fernandez M, Ban F, Woo G, et al. Toxic colors: the use of deep learning for predicting toxicity of compounds merely from their graphic images. J Chem Inf Mod. 2018;58(8):1533–1543.
42. Xu Y, Chen P, Lin X, et al. Discovery of CDK4 inhibitors by convolutional neural networks. Future Med Chem. 2019;11(3):165–177.
43. Sosnin S, Misin M, Palmer DS, et al. 3D matters! 3D-RISM and 3D convolutional neural network for accurate bioaccumulation prediction. J Phys. 2018;30(32):32LT03.
44. Kuzminykh D, Polykovskiy D, Kadurin A, et al. 3D molecular representations based on the wave transform for convolutional neural networks. Mol Pharm. 2018;15(10):4378–4385.
45. Duvenaud DK, Maclaurin D, Iparraguirre J, et al. Convolutional networks on graphs for learning molecular fingerprints. Advances in Neural Information Processing Systems 28 (NIPS 2015); Montreal, Canada; 2015. p. 2215–2223.
46. Kearnes S, McCloskey K, Berndl M, et al. Molecular graph convolutions: moving beyond fingerprints. J Comput-Aided Mol Des. 2016;30(8):595–608.
47. Coley CW, Barzilay R, Green WH, et al. Convolutional embedding of attributed molecular graphs for physical property prediction. J Chem Inf Mod. 2017;57(8):1757–1772.
48. Xu Y, Pei J, Lai L. Deep learning based regression and multiclass models for acute oral toxicity prediction with automatic chemical feature extraction. J Chem Inf Mod. 2017;57(11):2672–2685.
49. Ståhl N, Falkman G, Karlsson A, et al. Deep convolutional neural networks for the prediction of molecular properties: challenges and opportunities connected to the data. J Integr Bioinform. 2018;16(1):20180065.
50. Withnall M, Lindelöf E, Engkvist O, et al. Building attention and edge convolution neural networks for bioactivity and physical-chemical property prediction; 2019.
51. Altae-Tran H, Ramsundar B, Pappu AS, et al. Low data drug discovery with one-shot learning. ACS Cent Sci. 2017;3(4):283–293.
52. Baskin II. Is one-shot learning a viable option in drug discovery? Expert Opin Drug Discov. 2019;14(7):601–603.
53. Hochreiter S, Schmidhuber J. Long short-term memory. Neural Comput. 1997;9(8):1735–1780.
• The most popular architecture of recurrent neural networks
54. Chung J, Gülçehre Ç, Cho K, et al. Empirical evaluation of gated recurrent neural networks on sequence modeling. ArXiv Preprint. 2014;arXiv:1412.3555.
55. Goh GB, Hodas NO, Siegel C, et al. Smiles2vec: an interpretable general-purpose deep neural network for predicting chemical properties. ArXiv Preprint. 2017;arXiv:1712.02034.
56. Ertl P, Lewis R, Martin E, et al. In silico generation of novel, drug-like chemical matter using the LSTM neural network. ArXiv Preprint. 2017;arXiv:1712.07449.
57. Gupta A, Müller AT, Huisman BJH, et al. Generative recurrent networks for de novo drug design. Mol Inf. 2018;37(1–2):1700111.
58. Sutton RS, Barto AG. Reinforcement learning: an introduction. Cambridge: MIT Press; 1998.
59. Popova M, Isayev O, Tropsha A. Deep reinforcement learning for de novo drug design. Sci Adv. 2018;4(7):eaap7885.
60. Olivecrona M, Blaschke T, Engkvist O, et al. Molecular de-novo design through deep reinforcement learning. J Cheminf. 2017;9(1):48.
61. Popova M, Isayev O, Tropsha A. Deep reinforcement learning for de-novo drug design; 2017.
62. Neil D, Segler M, Guasch L, et al. Exploring deep recurrent models with reinforcement learning for molecule design. 2018.
63. Putin E, Asadulaev A, Ivanenkov Y, et al. Reinforced adversarial neural computer for de novo molecular design. J Chem Inf Mod. 2018;58(6):1194–1204.
64. Zhou Z, Kearnes S, Li L, et al. Optimization of molecules via deep reinforcement learning. Sci Rep. 2019;9(1):10752.
65. Karpov PV, Osolodkin DI, Baskin II, et al. One-class classification as a novel method of ligand-based virtual screening: the case of glycogen synthase kinase 3β inhibitors. Bioorg Med Chem Lett. 2011;21(22):6728–6731.
66. Zhokhova NI, Baskin II. Energy-based neural networks as a tool for harmony-based virtual screening. Mol Inf. 2017;36(11):1700054.
67. Sutskever I, Vinyals O, Le QV. Sequence to sequence learning with neural networks. Proceedings of the 27th International Conference on Neural Information Processing Systems - Volume 2; Montreal, Canada. MIT Press; 2014. p. 3104–3112.
68. Xu Z, Wang S, Zhu F, et al. Seq2seq fingerprint: an unsupervised deep molecular embedding for drug discovery. Proceedings of the 8th ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics; Boston, Massachusetts, USA. ACM; 2017. p. 285–294.
69. Sattarov B, Baskin II, Horvath D, et al. De novo molecular design by combining deep autoencoder recurrent neural networks with generative topographic mapping. J Chem Inf Mod. 2019;59(3):1182–1196.
70. Kingma DP, Welling M. Auto-encoding variational bayes. ArXiv Preprint. 2014;arXiv:1312.6114.
• The first publication on variational autoencoders
71. Gómez-Bombarelli R, Wei JN, Duvenaud D, et al. Automatic chemical design using a data-driven continuous representation of molecules. ACS Cent Sci. 2018;4(2):268–276.
72. Lim J, Ryu S, Kim JW, et al. Molecular generative model based on conditional variational autoencoder for de novo molecular design. J Cheminf. 2018;10(1):31.
73. Kang S, Cho K. Conditional molecular design with deep generative models. J Chem Inf Mod. 2019;59(1):43–52.
74. Zhavoronkov A, Ivanenkov YA, Aliper A, et al. Deep learning enables rapid identification of potent DDR1 kinase inhibitors. Nat Biotechnol. 2019;37(9):1038–1040.
75. Goodfellow IJ, Pouget-Abadie J, Mirza M, et al. Generative adversarial nets. ArXiv Preprint. 2014;arXiv:1406.2661.
•• The first publication on generative adversarial nets
76. Kadurin A, Aliper A, Kazennov A, et al. The cornucopia of meaningful leads: applying deep adversarial autoencoders for new molecule development in oncology. Oncotarget. 2017;8(7):10883–10890.
77. Blaschke T, Olivecrona M, Engkvist O, et al. Application of generative autoencoder in de novo molecular design. Mol Inf. 2017;36:1700123.
78. Rasmussen CE, Williams CKI. Gaussian processes for machine learning. Cambridge, Massachusetts: The MIT Press; 2006. (Dietterich T, editor).
79. Prykhodko O, Johansson SV, Kotsias P-C, et al. A de novo molecular generation method using latent vector based generative adversarial network. J Cheminf. 2019;11(1):74.
80. Guimaraes GL, Sanchez-Lengeling B, Farias PLC, et al. Objective-reinforced generative adversarial networks (ORGAN) for sequence generation models. ArXiv Preprint. 2017;arXiv:1705.10843.
81. Jin W, Barzilay R, Jaakkola T. Junction tree variational autoencoder for molecular graph generation. ArXiv Preprint. 2018;arXiv:1802.04364.
82. Simonovsky M, Komodakis N. GraphVAE: towards generation of small graphs using variational autoencoders. ArXiv Preprint. 2018;arXiv:1802.03480.
83. Samanta B, De A, Jana G, et al. NeVAE: a deep generative model for molecular graphs. Proceedings of the AAAI Conference on Artificial Intelligence. 2019;33(1):1110–1117.
84. De Cao N, Kipf T. MolGAN: an implicit generative model for small molecular graphs. ArXiv Preprint. 2018;arXiv:1805.11973.
85. Popova M, Shvets M, Oliva J, et al. MolecularRNN: generating realistic molecular graphs with optimized properties. ArXiv Preprint. 2019;arXiv:1905.13372.
86. Kobyzev I, Prince S, Brubaker MA. Normalizing flows: introduction and ideas. ArXiv Preprint. 2019;arXiv:1908.09257.
87. Papamakarios G, Nalisnick E, Rezende DJ, et al. Normalizing flows for probabilistic modeling and inference. ArXiv Preprint. 2019;arXiv:1912.02762.
88. Madhawa K, Ishiguro K, Nakago K, et al. GraphNVP: an invertible flow model for generating molecular graphs. ArXiv Preprint. 2019;arXiv:1905.11600.
89. Liu J, Kumar A, Ba J, et al. Graph normalizing flows. Advances in Neural Information Processing Systems 32 (NIPS 2019). Curran Associates, Inc; 2019. p. 13578–13588.
90. Shi C, Xu M, Zhu Z, et al. GraphAF: a flow-based autoregressive model for molecular graph generation. ArXiv Preprint. 2020;arXiv:2001.09382.
91. Bjerrum E, Sattarov B. Improving chemical autoencoder latent space and molecular de novo generation diversity with heteroencoders. Biomolecules. 2018;8(4):131.
92. Kusner MJ, Paige B, Hernández-Lobato JM. Grammar variational autoencoder. Proceedings of the 34th International Conference on Machine Learning - Volume 70; Sydney, NSW, Australia. JMLR.org; 2017. p. 1945–1954.
93. O'Boyle N, Dalke A. DeepSMILES: an adaptation of SMILES for use in machine learning of chemical structures; 2018.
94. Krenn M, Häse F, Nigam A, et al. SELFIES: a robust representation of semantically constrained graphs with an example application in chemistry. ArXiv Preprint. 2019;arXiv:1905.13741.
95. Nigam A, Friederich P, Krenn M, et al. Augmenting genetic algorithms with deep neural networks for exploring the chemical space. ArXiv Preprint. 2019;arXiv:1909.11655.
96. Brown N, Fiscato M, Segler MHS, et al. GuacaMol: benchmarking models for de novo molecular design. J Chem Inf Mod. 2019;59(3):1096–1108.
97. Polykovskiy D, Zhebrak A, Sanchez-Lengeling B, et al. Molecular sets (MOSES): a benchmarking platform for molecular generation models. ArXiv Preprint. 2018;arXiv:1811.12823.
98. Preuer K, Renz P, Unterthiner T, et al. Fréchet ChemNet distance: a metric for generative models for molecules in drug discovery. J Chem Inf Mod. 2018;58(9):1736–1741.
99. Baskin II, Ait AO, Halberstam NM, et al. An approach to the interpretation of backpropagation neural network models in QSAR studies. SAR QSAR Environ Res. 2002;13(1):35–41.
100. Imrie F, Bradley AR, van der Schaar M, et al. Deep generative models for 3D compound design. BioRxiv Preprint. 2019;bioRxiv:830497.
101. Grow C, Gao K, Nguyen DD, et al. Generative network complex (GNC) for drug discovery. ArXiv Preprint. 2019;arXiv:1910.14650.
102. Gebauer N, Gastegger M, Schütt KT. Generating equilibrium molecules with deep neural networks. ArXiv Preprint. 2018;arXiv:1810.11347.
103. Gebauer N, Gastegger M, Schütt KT. Symmetry-adapted generation of 3D point sets for the targeted discovery of molecules. ArXiv Preprint. 2019;arXiv:1906.00957.
104. Zankov DV, Madzhidov TI, Rakhimbekova A, et al. Conjugated quantitative structure–property relationship models: application to simultaneous prediction of tautomeric equilibrium constants and acidity of molecules. J Chem Inf Mod. 2019;59(11):4569–4576.
105. Baskin II, Halberstam NM, Mukhina TV, et al. The learned symmetry concept in revealing quantitative structure-activity relationships with artificial neural networks. SAR QSAR Environ Res. 2001;12(4):401–416.
106. Ertl P, Schuffenhauer A. Estimation of synthetic accessibility score of drug-like molecules based on molecular complexity and fragment contributions. J Cheminf. 2009;1(1):8.
107. Coley CW, Rogers L, Green WH, et al. SCScore: synthetic complexity learned from a reaction corpus. J Chem Inf Mod. 2018;58(2):252–261.
108. Baskin II, Madzhidov TI, Antipin IS, et al. Artificial intelligence in synthetic chemistry: achievements and prospects. Russ Chem Rev. 2017;86(11):1127–1156.
• A comprehensive review on the use of artificial intelligence in synthetic chemistry
109. Liu B, Ramsundar B, Kawthekar P, et al. Retrosynthetic reaction prediction using neural sequence-to-sequence models. ACS Cent Sci. 2017;3(10):1103–1113.
110. Segler MHS, Preuss M, Waller MP. Planning chemical syntheses with deep neural networks and symbolic AI. Nature. 2018;555:604.
• An important publication on the use of deep learning in planning organic synthesis
111. Coley CW, Green WH, Jensen KF. Machine learning in computer-aided synthesis planning. Acc Chem Res. 2018;51(5):1281–1289.
112. Button A, Merk D, Hiss JA, et al. Automated de novo molecular design by hybrid machine intelligence and rule-driven chemical synthesis. Nat Mach Intell. 2019;1(7):307–315.
• An important publication on combining de novo molecular design with synthesis planning

You might also like