ứng dụng de novo trong điều trị SAR-CoV-2

Subscriber access provided by Bibliothèque de l'Université Paris-Sud
Machine Learning and Deep Learning

De novo drug design of targeted chemical libraries based on
artificial intelligence and pair based multi-objective optimization
Domenico Alberga, Nicola Gambacorta, Daniela Trisciuzzi,
Fulvio Ciriaco, Nicola Amoroso, and Orazio Nicolotti
J. Chem. Inf. Model., Just Accepted Manuscript • DOI: 10.1021/acs.jcim.0c00517 • Publication Date (Web): 26 Aug 2020
Downloaded from pubs.acs.org on August 29, 2020
Just Accepted
“Just Accepted” manuscripts have been peer-reviewed and accepted for publication. They are posted
online prior to technical editing, formatting for publication and author proofing. The American Chemical
Society provides “Just Accepted” as a service to the research community to expedite the dissemination
of scientific material as soon as possible after acceptance. “Just Accepted” manuscripts appear in
full in PDF format accompanied by an HTML abstract. “Just Accepted” manuscripts have been fully
peer reviewed, but should not be considered the official version of record. They are citable by the
Digital Object Identifier (DOI®). “Just Accepted” is an optional service offered to authors. Therefore,
the “Just Accepted” Web site may not include all articles that will be published in the journal. After
a manuscript is technically edited and formatted, it will be removed from the “Just Accepted” Web
site and published as an ASAP article. Note that technical editing may introduce minor changes
to the manuscript text and/or graphics which could affect content, and all legal disclaimers and
ethical guidelines that apply to the journal pertain. ACS cannot be held responsible for errors or
consequences arising from the use of information contained in these “Just Accepted” manuscripts.
is published by the American Chemical Society. 1155 Sixteenth Street N.W.,

Washington, DC 20036
Published by American Chemical Society. Copyright © American Chemical Society.
However, no copyright claim is made to original U.S. Government works, or works
produced by employees of any Commonwealth realm Crown government in the
course of their duties.
Page 1 of 34 Journal of Chemical Information and Modeling
1
2
3
4
5 De novo drug design of targeted chemical libraries based on
6
7 artificial intelligence and pair based multi-objective
8
9 optimization
10
11
12 Domenico Alberga1‡, Nicola Gambacorta1‡, Daniela Trisciuzzi1,2, Fulvio Ciriaco3, Nicola
13
14 Amoroso1 and Orazio Nicolotti1*
15
16
1 Dipartimento di Farmacia-Scienze del Farmaco, Università degli Studi di Bari “Aldo
17
18
Moro”, Via E. Orabona, 4, I-70126 Bari, Italy
19
20
21 2 Molecular Horizon srl, Via Montelino 32, 06084 Bettona, Italy
22
23
3 Dipartimento di Chimica, Università degli Studi di Bari “Aldo Moro”, Via E. Orabona, 4, I-
24
25
26 70126 Bari, Italy
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59 ‡ These authors contributed equally.
60 ** Author to whom correspondence should be addressed; e-mail: orazio.nicolotti@uniba.it;
telephone: +39-080-5442551; fax: +39-080-5442230.
ACS Paragon Plus Environment

Journal of Chemical Information and Modeling Page 2 of 34
1
2
3 Abstract
4
5
6 Artificial intelligence and multi-objective optimization represent promising solutions to
7
8 bridge chemical and biological landscape by addressing the automated de novo design of
9
compounds as a result of a human-like creative process. In the present study, we
10
11 conceived a novel pair based multi-objective approach implemented in an adapted
12
13 SMILES generative algorithm based on Recurrent Neural Networks for the automated de
14
15 novo design of new molecules whose overall features are optimized by finding the best
16 trade-offs among relevant physicochemical properties (MW, logP, HBA, HBD) and
17
18 additional similarity-based constraints biasing specific biological targets. In this respect, we
19
20 carried out the de novo design of chemical libraries targeting Neuraminidase,
21
22
Acetylcholinesterase and the main protease of Severe Acute Respiratory Syndrome
23 Coronavirus 2. Several quality metrics were employed to assess drug-likeness, chemical
24
25 feasibility, diversity content and validity. Molecular docking was finally carried out to better
26
27 evaluate the scoring and posing of the de novo generated molecules with respect to X-ray
28
cognate ligands of the corresponding molecular counterparts. Our results indicate that
29
30 artificial intelligence and multi-objective optimization allow to capture the latent links joining
31
32 chemical and biological aspects, thus providing easy-to-use options for customizable
33
34 design strategies, which are especially effective for both lead generation and lead
35 optimization. The algorithm is freely downloadable at https://github.com/alberdom88/moo-
36
37 denovo and all the data are available as Supporting Information.
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60

1
2
3 Introduction
4
5
Despite the recent progresses in high throughput screening,1 the chemical space is still
6
7 widely unexplored, in particular for prospective drug design.2 Importantly, the chemical
8
9 universe is estimated to encompass by far in excess more than 1060 molecules even
10
11 limiting its exploration only to drug-like space.3 Moreover, since the drug discovery process
12 is typically demanding, slow and expensive,4 academic and industrial researchers are
13
14 discouraged to bet on completely novel chemotypes5,6 by exploring a potentially awarding
15
16 off-patent chemical space, thus preferring to make me-too decorations of well-known
17
18
molecular bioactive structures biasing specific biological targets.7 Among others, a reason
19 behind the low rate of success of finding structurally new interesting drug-like molecules is
20
21 their inherently multi-objective nature.8,9 In particular, drug-like candidates should match a
22
23 large number of often conflicting features such as water solubility, logP, molecular weight,
24 hydrogen bond donor/acceptor groups, toxic alerts.10 Moreover, new conceived drug-like
25
26 molecules should be easy to prepare by chemical synthesis and, last but not least,
27
28 biologically active.11
29
30
31
Before the onset of computer aided molecular design, drug discovery was mostly
32 addressed by knowledge-based human intuition.12 A new era in this field was born with the
33
34 advent of the artificially intelligence methods and the recent wide-spread of deep learning
35
36 techniques, which are capable not only to uncover hidden patterns from large amounts of
37
data, thus enabling the creation of highly predictive structure-activity models,13–15 but can
38
39 serve as automated generative algorithms to design new compounds with desired
40
41 properties and selective towards specific biological targets.16 As a matter of fact, there has
42
43 been a mushrooming growth of novel de novo drug design machine learning algorithms
44 based on diverse techniques. A few examples are the variational autoencoders,17–19 the
45
46 generative adversarial networks20–22 and the recurrent neural networks.23–26 In general,
47
48 these algorithms are trained in two steps. The first step employs large high-quality
49
50
molecular databases (such as ChEMBL27 and ZINC28) to allow models to infer learning
51 rules concerning with chemical representation, usually SMILES23,26 or molecular graph,29,30
52
53 for the automated generation of novel molecules. In particular, the algorithm learns how
54
55 creating novel chemically valid molecules based on the probability of the next character in
56
SMILES sequences or the node and edge distribution in molecular graphs. As a second
57
58 step, reinforcement learning methods31 are used to speed-up the exploration of the
59
60 chemical space driving the generation of new samples towards unexplored regions with
desired chemical, physical or structural properties.5 In some cases, these algorithms

1
2
3 proved to generate drug-like molecules with promising activity towards a protein target.16
4
5 For instance, Merk et al. exploited recurrent neural network-based methods to design
6
7 novel retinoid X and peroxisome proliferator-activated receptor agonists;25,32 Polykovskiy
8
et al. through entangled conditional adversarial autoencoder discovered novel Janus
9
10 kinase 3 inhibitors;33 Zhavoronkov et al. via generative tensorial reinforcement learning
11
12 discovered potent inhibitors of discoidin domain receptor 1.34 However, published methods
13
14 are generally focused on the automated generation of novel compounds through single
15 objective or weighted sum optimization functions.29 In the present work, we employed a
16
17 multi-objective optimization algorithm, which is effective in those real-life problems
18
19 involving the simultaneous optimization of two or more objectives. Notably, multi-objective
20
21
optimization methods do not require any a priori calibration and are able to detect a family
22 of equivalent solutions per run instead than a single solution at a time. As shown in Figure
23
24 1, each equivalent solution falls on the Pareto frontier and is non-dominated because
25
26 another solution does not exist that is better considering all the objectives to optimize.
27
28
Only a few applications of multi-objective methods in conjunction with reinforcement
29
30 learning have been so far described for the de novo design. 35–37 In particular, we herein
31
32 propose a novel multi-objective optimization approach capable to generate de novo
33
34 targeted chemical libraries whose compounds represent ideal non-dominated solutions for
35 a range of simultaneously pair based optimized features. In particular, we adapted the
36
37 REINVENT code proposed by Olivecrona et al.23 in order to enable the pair based multi-
38
39 objective optimization of several molecular features based on Pareto dominance.38 By
40
41
employing a tailored fitness function, our reinforcement learning based method is able to
42 drive the automated generation of pair based non-dominated solutions representing
43
44 chemical structures whose molecular features are customized towards specific biological
45
46 targets and are constrained in drug-like ranges set by the user. The herein new proposed
47 method was thus tested to accomplish the de novo generation of targeted chemical
48
49 libraries.39 The code is freely available at GitHub.40 Performances were assessed
50
51 employing quality metrics specifically suitable for evaluating de novo designed
52
53
compounds.41,42 Finally, our approach was applied to three real-life case studies relative to
54 the de novo drug design of new therapeutically relevant enzymatic inhibitors targeting
55
56 neuraminidase (NA), acetylcholinesterase (AChE) and the main protease of Severe Acute
57
58 Respiratory Syndrome Coronavirus 2 (SARS-CoV-2), the latter responsible of the COVID-
59
19 global health emergency.43,44 We believe that our method could be of inspiration to
60
medicinal chemists, especially in early stages of the drug discovery process, by generating

1
2
3 new patentable chemotypes provided with a wider spectrum of desirable physicochemical
4
5 and biological properties.
6
7
8 Methods
9
10
The SMILES generative algorithm proposed in this work is built by adapting a method
11
12 originally developed by Olivecrona et al. (REINVENT)23, which can be briefly summarized
13
14 as follows. The model consists in a Recurrent Neural Network (RNN) composed of three
15
16 layers with 512 Gated Recurrent Units45 in each layer. The SMILES is tokenized at each
17 single character with the exceptions of Cl, Br and chars included in square brackets that
18
19 are considered as one token. The RNN is trained by maximum likelihood estimation of the
20
21 next token in the generated sequence given the prefix of the previous steps. The next
22
23
character is sampled from the predicted probability distribution with the aim to maximize
24 the likelihood assigned to the correct token. The RNN was trained on a subset of the
25
26 ChEMBL22 database27 built selecting molecules containing between 10 and 50 heavy
27
28 atoms and elements H, C, N, O, F, S, Cl, Br for a total of ~1.2 million structures
29
canonicalized with the RDKit package.46 The generation of new molecules with specific
30
31 optimized properties is tackled through a policy-based Reinforcement Learning (RL)
32
33 algorithm.47 Briefly, the probability distributions previously learnt by the trained RNN are
34
35 used as the initial prior policy. The RL procedure is able to modify the prior policy in order
36 to generate SMILES that maximize a given fitness function S. More details can be found in
37
38 the original REINVENT paper.23
39
40
41 The method proposed here exploits the REINVENT algorithm and implements a two-term
42
43
fitness function S. The first term R plays the role of a drug-like filter to reward molecules
44 within specific property ranges set by the user while increasing penalties are assigned
45
46 when moving out of the ranges. The second term P employs a pair based multi-objective
47
48 method. Unlike the classical multi-objective approach considering all the objectives at
49 once, the pair based multi-objective optimization is applied to all the possible pairs of
50
51 features in order to return a more discriminant overall ranking. This can help to better
52
53 assess the quality of the de novo generated molecules instead of having a large number of
54
55
equivalent ranked solutions whose amount raises greatly when the number of objectives
56 increases.
57
58
59
60 In our approach, let x=(x1, x2,…, xN) be the vector of the N molecular features to optimize.
Let M be the number of molecules of the final targeted chemical library generated by de

1
2
3 novo design. With the aim to train the algorithm to result targeted chemical libraries with N
4
5 optimized molecular features, the S(x) fitness function, for each molecule in the targeted
6
7 chemical library, is defined as follows:
8
9
10 𝑆(𝑥) = 𝑅(𝑥) + 𝑃(𝑥)
11
12
13 𝑁
14
15 𝑅(𝑥) = ∑𝑟(𝑥 ) 𝑖
16 𝑖=1
17
{
2
18
19 𝑒
― ( 𝑥𝑖 ― 𝑚𝑖𝑛𝑖
∆𝑖 ) 𝑖𝑓𝑥 < 𝑚𝑖𝑛
𝑖 𝑖
20
21
𝑟(𝑥𝑖) = 1𝑖𝑓𝑚𝑖𝑛𝑖 ≤ 𝑥𝑖 ≤ 𝑚𝑎𝑥𝑖
2
22
23 𝑒
― ( 𝑥𝑖 ― 𝑚𝑎𝑥𝑖
∆𝑖 ) 𝑖𝑓𝑥 > 𝑚𝑎𝑥
𝑖 𝑖
24
25
26
27
28
where mini and maxi are the minimum and maximum acceptable values for a given feature
29 i and ∆𝑖 = (maxi ― mini)/4 if 𝑚𝑎𝑥𝑖 ≠ 𝑚𝑖𝑛𝑖, ∆𝑖 = 1 otherwise.
30
31
32
{
33 0𝑖𝑓𝑅(𝑥) < 𝑁
34 1 𝑚 ― 𝑑𝑖𝑗
35
𝑃(𝑥) =
𝐶(𝑁,2) 𝑚 ∑
𝑖𝑓𝑅(𝑥) = 𝑁
36 𝑖,𝑗 ∈ 𝐶
37
38
39
40 where dij is the Pareto dominance of the molecule with respect to the possible pairs of
41 sampled features i and j of the generated targeted chemical library (e. g., the number of
42
43 molecules of the targeted chemical library dominating a given compound by considering
44
45 pairs of features i and j); C is the set of all the possible pairs of features under multi-
46
objective optimization that is C(N,2) = N(N-1)/2; m is the number of molecules for which 𝑟
47
48 (𝑥𝑖)=1 for all the desired features. Based on these assumptions, the S(x) fitness function
49
50 returns values in the range [0, N+1]. In the ideal case, that is S=N+1, a de novo generated
51
52 molecule reflects all the considered molecular features in the desired range being all the
53
54
possible pairs of its optimized features located on the Pareto frontier as optimal non-
55 dominated solutions. Note that the direction of the Pareto frontier is defined by the choice
56
57 to maximize or minimize the objectives i and j within a given desired range.
58
59 For the sake of completeness, the overall dominance with respect to all the objectives at
60
once has been also calculated and the corresponding ranks are reported as Supporting

1
2
3 Information (see files NA.lib1.csv, NA.lib2.csv, AChE.lib1.csv, AChE.lib2.csv and SARS-
4
5 CoV-2.lib.csv). As shown, overall non-dominated solutions accumulate in the early 5% of
6
7 the top-scored de novo generated molecules.
8
The algorithm is included in a graphical user interface (GUI), written with pyqt5, freely
9
10 downloadable at https://github.com/alberdom88/moo-denovo. The GUI is able to generate
11
12 targeted chemical libraries with up to six optimized molecular features that can be selected
13
14 from a collection of 203 options whose list is available as Supporting Information (File S1).
15 Among these, 201 are molecular descriptors calculated via RDKit46 and Moses41
16
17 packages. In addition, two further features were also included. The first accounts for the
18
19 presence of a user-defined fragment inside the generated molecules. The second is the
20
21
Tanimoto similarity48 calculated using the Morgan fingerprints with radius equal to 2.49
22 As recently reported,41,42 the goodness of the targeted chemical libraries is assessed by
23
24 calculating the following quality metrics: a) validity, which represents the fraction of
25
26 chemically valid SMILES; b) unicity, which stands for the fraction of unique generated
27
SMILES; c) Internal Diversity (IntDiv)50, which accounts for the overall molecular diversity
28
29 of the targeted chemical library; d) filters, which reflect the fraction of de novo generated
30
31 molecules devoid of medicinal chemistry filters (MCFs)51,52 and of pan-assay interference
32
33 compounds (PAINS) alerts53 and e) Synthetic Accessibility (SA) score, which provides a
34 heuristic estimate of how hard (SA=10) or how easy (SA=1) is the chemical synthesis of a
35
36 given molecule:41 SA is averaged on the entire targeted chemical library.
37
38
39 The herein proposed de novo drug design algorithm was thus challenged by generating
40
41
targeted chemical libraries biased for binding NA, AChE and SARS-CoV-2 main protease.
42 The pair based multi-objective optimization algorithm progressed through 3000 cycles, by
43
44 generating 500 compounds per iteration. Finally, the targeted chemical libraries were built
45
46 sampling 10000 potential inhibitors from the policy that maximizes the average of S(x)
47 fitness values (that is <S(x)>) and were evaluated according to the quality metrics above
48
49 described. Furthermore, using the Schrödinger 2019-4 suite,54 molecular docking
50
51 simulations were performed to further evaluate the goodness of new generated inhibitors
52
53
towards the corresponding desired biological targets. In this respect, the X-ray solved
54 crystal structures of NA, AChE and SARS-CoV-2 protease were retrieved from the Protein
55
56 Data Bank (PDB) with the entry identifiers equal to 3B7E,55 4EY756 and 6LU7,57
57
58 respectively. The Protein Preparation Wizard was thus used to revise the X-ray structures,
59
eliminating the water molecules, correcting the protonation states and carrying out energy
60
minimization. All the compounds comprised in the de novo targeted chemical libraries

1
2
3 were thus prepared for docking simulations by employing the LigPrep tool54 and setting no
4
5 more than eight stereoisomers per molecule to minimize structural complexity as well as
6
7 avoid too expensive computational costs. Tautomers and ionization states were kept
8
unchanged. All the dockings were performed employing Glide54 standard precision with
9
10 default settings by automatically centering the grid boxes on the co-crystallized cognate
11
12 ligands of the three reference proteins. The reliability of docking simulation protocols was
13
14 preliminary challenged by computing the root mean square deviation (RMSD) values (see
15 Figure S1 of Supporting Information). The docking results were analyzed comparing
16
17 posing and scoring of the co-crystallized cognate ligands with those experienced from the
18
19 new generated molecules. In this respect, three terms were mostly considered: the
20
21
docking scores, the chance of interacting with key binding site residues and the S(x)
22 fitness values.
23
24
25 Results
26
27
28 Our automated algorithm for de novo design was challenged by generating targeted
29
chemical libraries39 likely to bind NA, AChE, and novel SARS-CoV-2 main protease. More
30
31 specifically two targeted chemical libraries were generated for NA (i.e., NA.lib1 and
32
33 NA.lib2) and for AChE (i.e., AChE.lib1 and AChE.lib2) and one targeted chemical library
34
35 (i.e., SARS-CoV-2.lib) was designed for SARS-CoV-2 main protease. Each targeted
36 chemical library was obtained by pair based multi-objective optimization of several relevant
37
38 physicochemical properties and, eventually, by considering other medicinal chemistry
39
40 inspired constraints such as molecular similarity. As far as the three case studies are
41
42
concerned, the learning curves indicating the progress of the average of the S(x) fitness
43 values is shown in Figure 2 while a synoptic assessment of de novo designed targeted
44
45 chemical libraries is provided in Table 1 by reporting the calculated quality metrics.
46
47
48 Neuraminidase case study
49
50 NAs are glycoside hydrolase enzymes with a fundamental role in the spread of the virus,
51
52 especially in the late stages of infection.55 They are classified as exosialidases and are
53
54 capable of cleaving glycosidic bonds between sialic acid and sugar.55 The virus employs
55
56
this mechanism to detach itself from the host cell after the infection and, thus, to spread
57 out. NA inhibitors are a well-known class of drugs used against influenza A virus,58 and the
58
59 discovery of new potent biologically active agents is considerably interesting from a
60
pharmaceutical perspective. To this purpose, two targeted chemical libraries of 10000

1
2
3 potential inhibitors each were designed. The first targeted chemical library (NA.lib1 is
4
5 included in Supporting Information as File S2) was designed based on the wealth of
6
7 physicochemical information taken from a benchmark pool comprising all the entries (that
8
9
are 218) available from ChEMBL v.25 provided with IC50<1M towards NA (target
10 referenced as ChEMBL2051). The de novo design of NA.lib1 was addressed by
11
12 considering the variation within the benchmark pool of four easy to interpret molecular
13
14 descriptors that are the molecular weight (MW), the logP, the number of hydrogen bond
15
donor (HBD) atoms and the number of hydrogen bond acceptor (HBA) atoms. In our
16
17 approach the de novo designed compounds are optimized in a range defined within one
18
19 standard deviation around the mean value of the selected features computed for the
20
21 benchmark pool as shown in Table S1 of the Supporting Information. On the other hand,
22 the de novo design was addressed to generate compounds including one and only one
23
24 aliphatic ring like Zanamvir.
25
26
27 As shown in Figure 2, the algorithm reaches the convergence after approximately 500
28
29
iterations. The point corresponding to the maximum value of <S(x)> average fitness value
30 (blue line of Figure 2) is thus used as a generative model to create NA.lib1 consisting of
31
32 10000 potential inhibitors. As reported in Table 1, the 99.5% of the generated molecules
33
34 are valid and 98.6% are unique. Importantly, only the 62.0% pass typical structural alerts
35
filters and this should alert users when prioritizing de novo designed compounds for further
36
37 testing. Finally, the generated targeted chemical library NA.lib1 owns good internal
38
39 diversity equal to 73.3% and a fair synthetic accessibility score (2.5220.361). Figure 3
40
41 depicts the distributions of the molecular descriptors selected for pair based multi-objective
42
optimization of NA.lib1 on the left-hand side and its average pair based Pareto dominance
43
44 on the right-hand side. In particular, the 93.4% of the molecules in the library owns
45
46 descriptors in the desired range. Additional details are reported in Figure S2 of Supporting
47
48 Information.
49
50 The second targeted chemical library (NA.lib2 is included in Supporting Information as File
51
52 S3) was instead designed to generate molecules whose structures and physicochemical
53
54 properties are constrained to those of zanamivir. To this end, the de novo design
55
56
progressed by maximizing the Tanimoto molecular similarity to the zanamivir with a cutoff
57 >0.3 and allowing deviations shown in Table S2 of Supporting Information as far as MW
58
59 and logP are concerned. As above discussed, NA.lib2 included 10000 molecules created
60
from the generative model taken from the maximum value of the average fitness value

1
2
3 <S(x)> (red line of Figure 2). As reported in Table 1, 99.8% of the generated molecules are
4
5 valid and 95.7% are unique. As expected, a drop of the internal diversity, that is now equal
6
7 to 66.9%, was observed with respect to the NA.lib1 but the molecules passing the filters
8
9
increases to 85.2%. The library shows an average synthetic accessibility of 4.7000.233, a
10 value complying that computed for zanamivir (that is 4.287). A comprehensive view of
11
12 feature distribution is shown in Figure 3. Additional details are reported in Figure S2 and
13
14 Figure S3 of Supporting Information.
15
16
Representative examples taken from NA.lib1 and NA.lib2 are shown in Table 2. The de
17
18 novo drug design strategy employed to generate NA.lib1 allowed to build new chemotypes
19
20 including piperidine, piperazine and phenol rings. The de novo drug design strategy
21
22 employed to generate NA.lib2 resulted in the scaffold hopping of the dihydropyran ring of
23 zanamivir, replaced for instance by tetrahydropyridine and cyclohexene cores. All these
24
25 engineered structures are provided with polar substituents, such as aminic, amidic,
26
27 sulfamoyl, guanidinium, hydroxyl or carboxyl functional groups, potentially capable to
28
29
reproduce the molecular interactions accomplished by zanamivir. For the sake of
30 interpretation, molecular docking analyses were thus carried out to gain insights about
31
32 molecular interactions established by de novo designed compounds at the binding site of
33
34 NA enzyme compared to those observed for zanamivir whose observed docking score of -
35
7.002 kcal/mol is mainly due to the number of hydrogen bond (HB) interactions engaged
36
37 with R371, W178 and R152 at the NA binding site. In this respect, we reported two
38
39 applicative examples taken from NA.lib1 and NA.lib2, respectively. NA.lib1_02 de novo
40
41 designed compound returned a docking score equal to -6.920 kcal/mol being its sulfamoyl
42 and aminic groups involved in HBs with the side chains of R118, E277, R292, and R371
43
44 as well as with the backbone of W178 (see Figure 4a). NA.lib2_01 returned a docking
45
46 score equal -7.892 kcal/mol with the guanidinium group forming HBs with the side chains
47
48
of E276 and E277, the amidic group interacting with side chains of R371 and R118, and
49 the acetoamide substituents making HBs with R152 and D151 (see Figure 4b). Worthy of
50
51 mention, both NA.lib1_02 and NA.lib2_01 showed a posing and a scoring comparable to
52
53 zanamivir by experiencing very similar molecular interactions at the NA binding site.
54
55 Acetylcholinesterase case study
56
57
58 AChE is an enzyme located in the post-synaptic membrane of cholinergic neurons and
59
60 catalyzes the hydrolysis reaction of the acetylcholine in choline and acetic acid.
Excessively decreased levels of acetylcholine are hallmarks of the Alzheimer onset.59 This

1
2
3 is the main reason why AChE inhibition is a widely studied mechanism for the symptomatic
4
5 palliation of neurodegenerative diseases.60 Donepezil, a selective AChE dual binding site
6
7 inhibitor,61 was used as reference for the automated generation of a targeted chemical
8
library.
9
10
11 Again, two different strategies were adopted and, thus, two targeted chemical libraries of
12
13 10000 potential inhibitors each were designed. The first targeted chemical library
14
15 (AChE.lib1 is included in Supporting Information as File S4) was designed based on five
16 easy to interpret molecular descriptors that are MW, logP, HBD, HBA and the number of
17
18 rings (nR). In particular the nR descriptor was selected considering the presence of four
19
20 rings (two of which fused) in the structure of donepezil. The minimum and maximum
21
22
values of these descriptors were selected retrieving all the entries (that are 1800) available
23 from ChEMBL v.25 provided with IC50<1M towards AChE (referenced as CHEMBL220).
24
25 The de novo designed compounds are optimized in a range defined within one standard
26
27 deviation around the mean value of the selected features computed for the benchmark
28
29
pool as shown in Table S3 of the Supporting Information. As shown in Figure 2, the
30 algorithm reaches the convergence after approximately 500 iterations. The point
31
32 corresponding to the maximum value of <S(x)> average fitness value (yellow line of Figure
33
34 2) is thus used as a generative model to create AChE.lib1 consisting of 10000 potential
35
inhibitors. As reported in Table 1, 99.3% of the generated molecules are valid and 94.5%
36
37 are unique. Importantly, 92.0% pass typical structural alerts filters. Interestingly, the
38
39 generated targeted chemical library AChE.lib1 owns good internal diversity equal to 71.5%
40
41 and a fair synthetic accessibility score (1.9420.192). Figure 5 depicts the distributions of
42
the molecular descriptors selected for pair based multi-objective optimization of AChE.lib1
43
44 on the left-hand side and its average Pareto dominance on the right-hand side. In
45
46 particular, 96.1% of the molecules in the library own descriptors in the desired range.
47
48 Additional details are reported in Figure S2 of Supporting Information.
49
50 The second targeted chemical library (AChE.lib2 is included in Supporting Information as
51
52 File S5) was instead designed to generate molecules whose structures and
53
54 physicochemical properties are constrained to those of donepezil. To this end, the de novo
55
56
design progressed by maximizing the Tanimoto molecular similarity to the donepezil with a
57 cutoff >0.3 and allowing deviations shown in Table S4 of Supporting Information as far as
58
59 MW and logP are concerned. As above discussed, AChE.lib2 included 10000 molecules
60
created from the generative model taken from the maximum <S(x)> average fitness value

1
2
3 (green line of Figure 2). As reported in Table 1, 99.8% of the generated molecules are
4
5 valid and 95.9% are unique. A drop of the internal diversity, that is now equal to 62.5%, as
6
7 well as of the molecules passing the filters, that is now equal to 84.1%, was observed with
8
9
respect to AChE.lib1. The library shows an average synthetic accessibility of 2.1650.256
10 complying that computed for donepezil (that is 2.682). A comprehensive view is shown in
11
12 Figure 5. Additional details are reported in Figure S2 of Supporting Information.
13
14
15 Representative examples taken from AChE.lib1 and AChE.lib2 are shown in Table 2. As
16
far as AChE.lib1 is concerned, the algorithm generated potential inhibitors with no less
17
18 than three slightly decorated aromatic rings joined by proper length ether or keto bridge to
19
20 ensure the sampling of the catalytic (CAS) and peripheral anionic (PAS) site of AChE.62
21
22 On the other hand, AChE.lib2 comprised compounds including the phenylpiperidine
23 scaffold as Donepezil.
24
25
26 Hence, molecular docking simulations have been employed to inspect the molecular
27
28 interactions of the de novo generated potential AChE inhibitors. As depicted in the left-
29
30 hand side of Figure 6, AChE.lib1_02 molecule, taken from AChE.lib1, can experience 
31 interactions with W286 at PAS and with F338 and can make HB with the backbone of
32
33 F295. On the right-hand side of Figure 6, AChE.lib2_04 molecule, taken from AChE.lib2,
34
35 can engage  interactions with W286 at PAS through the 3-methoxyphenyl arm and with
36
37 W86 at CAS through the benzylpiperidine moiety and HB with F295. Moreover, the
38
protonated nitrogen atom of the piperidine ring is able to engage a cation- interaction with
39
40 W286. Interestingly, these interactions are also visited by donepezil showing a docking
41
42 score value equal to -12.640 kcal/mol. AChE.lib1_02 and AChE.lib2_04 returned docking
43
44 score values equal to -11.045 kcal/mol and -12.703 kcal/mol, respectively.
45
46 SARS-CoV-2 main protease case study
47
48
49 The dramatic spread of Covid-19 due to infective agent SARS-CoV-2 (Severe Acute
50
51 Respiratory Syndrome Coronavirus 2) has undoubtedly led researchers from all over the
52
53
world to face this new emergency with every possible resource.43,44 On February 2020, the
54 first X-ray solved structure of the protease of the new coronavirus has been released in the
55
56 PDB57 and represents an extremely important milestone for developing new and effective
57
58 drug therapies. After transcription, the viral mRNA penetrates the cytoplasm and uses the
59
60
cellular mechanism for the proteins production. The newly formed polypeptide chains are
cleaved into smaller fragments and used by the virus for its maturation.63 These cleavages

1
2
3 are carried out by proteases. Specifically, this is classified as cysteine-histidine protease
4
5 (H41-C145).64 As far as this case study is concerned, there is no entry available in the
6
7 ChEMBL repository since the target is so far still unknown. On this premise, a similarity
8
based de novo design strategy was thus employed. As a reference for the similarity de
9
10 novo design, the co-crystallized ligand (reported as N3 in the PDB entry coded as 6LU7)
11
12 covalently bonded to the SARS-CoV-2 main protease was selected. As show in Table S5
13
14 of the Supporting Information, a targeted chemical library (SARS-CoV-2.lib is included in
15 Supporting Information as File S6) was generated by pair based multi-objective
16
17 optimization of the following features: the Tanimoto molecular similarity, MW and logP
18
19 ranging in the intervals shown in Table S5 of Supporting Information. As far as this case
20
21
study is concerned, the algorithm converged slowly after 1000 iterations (black line of
22 Figure 2). This is likely due to the higher structural complexity of N3. Again, 10000
23
24 molecules were generated using the generative model corresponding to the maximum of
25
26 <S(x)> average fitness value. As shown in Table 1, SARS-CoV-2.lib shows good values of
27
quality metrics being validity equal to 99.9%, unicity equal to 91.0% and ability to pass
28
29 structural alert filters equal to 77.3%. The internal diversity (that is 53.9%) is however
30
31 lower compared to the cases of studies previously discussed. Again, this can be explained
32
33 with the large structural complexity of N3 that forces the algorithm to generate molecules
34 to some extent provided with a reduced structural variability. The library shows an average
35
36 synthetic accessibility of 3.6450.281 that is however slightly lower compared to that
37
38 computed for N3 (that is 4.701). Figure 7 depicts the distributions of the molecular
39
40 descriptors selected for pair based multi-objective optimization of SARS-CoV-2.lib on the
41 left-hand side and its average Pareto dominance on the right-hand side.
42
43
44 As far as SARS-CoV-2.lib is concerned, the generation of new molecules has been driven
45
46 by employing a similarity-based strategy. For ease of discussion, four potential inhibitors
47
48
built by automated de novo design were reported as examples in Table 2. The algorithm
49 was able to generate peptide-like molecules, whose backbone included at least three
50
51 peptide bonds and whose side chains explored different combinations of residues such as
52
53 glycine, glutamate, aspartate, glutamine or leucine.
54
55 Molecular docking studies were thus finally performed on SARS-CoV-2.lib by using as a
56
57 biological target the SARS-CoV-2 main protease (PDB entry 6LU7, the only 3D crystal
58
59 structure available at the time of writing). In this particular case study, a protocol of
60
covalent docking could have been an option except that it is very time-consuming and thus

1
2
3 not suitable for screening large numbers of ligands. As shown in Figure 8, the de novo
4
5 generated peptide backbone is essential for engaging key HB interactions with E166 and
6
7 Q189 that actually were also observed in the N3 co-crystallized inhibitor provided with a
8
docking score value equal to -11.213 kcal/mol. Interestingly, SARS-CoV-2.lib_03 was also
9
10 able to form additional HB with C145, a key residue for the catalytic function of the SARS-
11
12 CoV-2 main protease, thus returning a docking score value of -9.046 kcal/mol.
13
14
15 Final remarks
16
17 Based on the obtained results, we can conclude that artificial intelligence and multi-
18
19 objective optimization provided a transparent framework for customizable design
20
21 strategies.65,66 Our approach demonstrated to be particularly suited for both lead
22
23
generation and lead optimization phases. In this respect, lead generation could mostly be
24 pursued based on a merely data driven approach optimizing physicochemical properties
25
26 such as MW, logP, the number of HBA and HBD. In this regard, we observed that de novo
27
28 generated compounds (i.e. NA.lib1 and AChE.lib1) can sample new chemotypes exploring
29
a broader and hopefully off-patent chemical space compared to a given reference domain.
30
31 This approach is indeed more challenging and helpful to fuel new ideas for a target
32
33 oriented design. On the other hand, lead optimization is instead mostly addressed by
34
35 adding further constraints, such as molecular similarity thresholds or the inclusion of
36 particular privileged scaffolds, which normally reflect specific user-dependent options (i.e.
37
38 NA.lib2, AChE.lib2 and SARS-CoV-2.lib). This approach is indeed more conservative but
39
40 likely of more practical use and, to some extent, less prone to late stage failures. These
41
42
considerations were also supported by a ligand-based drug target prediction exercise
43 carried out by employing the Multi-fingerprint Similarity Search algorithm (MuSSel)67,68
44
45 which is available as a free web platform at http://mussel.uniba.it:5000/prediction.html. For
46
47 instance, considering the top-5 targets predicted by MuSSel for each query compound, we
48 observed that the Neuroaminidase (referenced as ChEMBL2051) and
49
50 Acetylcholinesterase (referenced as ChEMBL220) were predicted as the protein targets for
51
52 about 95% and about 68% compounds of NA.lib2 and AChE.lib2, respectively. On the
53
54
other hand, we observed that Monoamine oxidase B (referenced as ChEMBL2039) and
55 Monoamine oxidase A (referenced as ChEMBL1951) were predicted as the protein targets
56
57 for about 31% and about 16% compounds of AChE.lib1, according to their potential action
58
59 as multi-target therapeutics for neurodegenerative disorders.60 As far as NA.lib1 is
60
concerned, it was instead difficult to derive causative relationships with the predicted

1
2
3 protein targets. A synopsis of all the gathered drug target predictions is shown in Table S6
4
5 of Supporting Information. The overall quality of the de novo designed target libraries is
6
7 however ensured by the quality metrics indicating that the new generated compounds
8
were always in optimal ranges as far as the correctness of SMILES notations and the
9
10 occurrence of duplicates, the level of internal dissimilarity, the compliance with drug-
11
12 likeness filters and the chemical feasibility is concerned. Furthermore, molecular docking
13
14 was employed to better assess the biological potential of the new generated compounds
15 by explicitly comparing their posing and scoring with those experimentally observed for
16
17 cognate ligands co-crystallized at the binding sites of target proteins. This analysis
18
19 enabled to appreciate that the de novo designed targeted chemical libraries contain
20
21
molecules experiencing similar binding modes as they engaged specific interactions with
22 key target protein residues. Overall, the goodness of molecular docking as a retrospective
23
24 validation option is also supported by the comparison of the distribution of docking scores
25
26 of the 218 pool compounds (referenced as ChEMBL2051 and provided with IC50<1μM
27
towards NA) and targeted libraries NA.lib1 and NA.lib2 as well as of the 1800 pool
28
29 compounds (referenced as ChEMBL220 and provided with IC50<1μM towards AChE) and
30
31 targeted libraries AChE.lib1 and AChE.lib2 (see Figure S4 of Supporting Information). In a
32
33 continuing analysis aimed at providing users with a prioritization list of best compounds for
34 synthesis and testing, we sorted the de novo designed libraries according to the calculated
35
36 values of Ligand Efficiency (LE)69 which is a universally accepted indicator of compound
37
38 quality for prospective drug design. For the sake of comparison, the values of LE of the
39
40
cognate ligands were used as a reference.
41
42 As shown in Figure 9, best LE values accumulated in the early 0.12% and 10.38% for
43
44 NA.lib1 and NA.lib2, in the early 0.02% and 0.03% for AChE.lib1 and AChE.lib2, in the
45
46 early 1.29% for SARS-CoV-2.lib. While it is not wise to make any beforehand speculation
47 about these observed trends whose rationale may be case-by-case dependent, we can
48
49 conclude that these are the more meaningful fractions of the de novo generated targeted
50
51 chemical libraries. Importantly, these fractions contained those de novo designed
52
53
compounds better awarding molecular interactions at the active sites of their biological
54 counterparts while keeping as low as possible the structural complexity, which was
55
56 comparable or even lower than that of X-ray solved cognate ligands. What descend from
57
58 those observations could be of utmost importance for addressing well-informed drug
59
design strategies.
60

1
2
3 Finally, this new proposed method makes an important step forward considering the
4
5 scenario of the recent literature which already includes a number of generative de novo
6
7 design algorithms.17–26,29,30,36 In this respect, we mostly focused on crafting a novel pair-
8
based multi-objective strategy that, in conjunction with a reinforcement learning
9
10 framework, allows to guide the creative generation of drugs owning multiple
11
12 simultaneously optimized features based on the Pareto optimality philosophy. Indeed, this
13
14 new approach allowed to enhance the quality content of the results compared to recently
15 published methods, which enable the automated generation of novel compounds but
16
17 generally through single objective or weighted sum methods, 23,26,29,36 which require
18
19 human intervention for coefficient calibration with steps that are annoying, frustrating and
20
21
time consuming. Satisfactorily, our approach can be used as it is for the fast generation of
22 de novo targeted chemical libraries whose features fall in a given desired ranges giving the
23
24 users a pool of equivalent non-dominated solutions irrespective of calibration. Finally, note
25
26 that the herein proposed fitness function can be easily implemented as a reward option in
27
any already published reinforcement learning based molecular de novo design algorithm.
28
29 We chose to adapt the REINVENT algorithm23 because it is relatively easy to modify
30
31 compared to others already published; it is provided with a simple and well annotated
32
33 source code; it requires limited computational resources and shows still promising
34 performances.
35
36
37 Conclusions
38
39
40 The application of artificial intelligence and multi-objective optimization in computer
41
42
assisted drug discovery have paved the way to unprecedented chances for highly relevant
43 tasks such as structure activity relationships, target prediction, lead generation and
44
45 optimization, experimental design. All these activities are expected to shorten cycle-times
46
47 for the identification of new bioactive compounds for both industry and academia. In this
48 study, several aspects of artificial intelligence and multi-objective optimization for the de
49
50 novo drug design of targeted combinatorial libraries were investigated with the intention to
51
52 support real-life project workflows. As a matter of fact, three practical case studies of
53
54
therapeutic relevance have been widely discussed. In particular, this study showed as
55 complementing artificial intelligence with pair based multi-objective optimization is effective
56
57 in driving the de novo design of target chemical libraries whose compounds are the result
58
59 of a creative process based on molecular features taken by specific portions of the
60
chemical space, which were interfaced with particular biological targets. Last but not least,

1
2
3 our ultimate goal is not to replace bench chemistry activities but rather to inspire the
4
5 experimental work with low cost ideas.
6
7
8 Supporting Information
9
10
Features selected for pair based multi-objective optimization generation of the targeted
11
12 chemical library NA.lib1, NA.lib2, AChE.lib1, AChE.lib2 and SARS-CoV-2.lib (Table S1-
13
14 S5). Drug target predictions carried out by using the multi-fingerprint similarity search
15
16 MuSSel platform (Table S6). Overlap of X-ray solved and top-scored docking poses for
17 zanamivir, donepezil and N3 (Figure S1). Distribution of the optimized features of the
18
19 10000 compounds of the targeted chemical libraries NA.lib1 and NA.lib2 with respect to
20
21 the 218 pool compounds provided with IC50<1μM towards NA (referenced as
22
23
ChEMBL2051) as well as of the targeted libraries AChE.lib1 and AChE.lib2 with respect to
24 the 1800 pool compounds provided with IC50<1μM towards AChE (referenced as
25
26 ChEMBL220) (Figure S2). Distribution of MW and logP for NA.lib1 and NA.lib2 with
27
28 respect to the targeted antiviral library available from the catalog of Enamine (Figure S3).
29
List of all the 203 molecular features that can be optimized via the algorithm (features.txt).
30
31 Targeted chemical library NA.lib1 (NA.lib1.csv). Targeted chemical library NA.lib2
32
33 (NA.lib2.csv). Targeted chemical library AChE.lib1 (AChE.lib1.csv). Targeted chemical
34
35 library AChE.lib2 (AChE.lib2.csv). Targeted chemical library SARS-CoV-2.lib (SARS-CoV-
36 2.lib.csv). The algorithm described in this paper is freely downloadable at
37
38 https://github.com/alberdom88/moo-denovo.
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34 Figure 1. The solid red circles are non-dominated solutions and fall on the Pareto frontier,
35 colored in red. Dominated solutions are shown as unfilled circles and are ranked according
36 to the number of times they are dominated. Non-dominated solutions are given rank zero
37
38
and the dominated solutions are given ranks as shown.
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34 Figure 2. Average value of the S(x) fitness function computed at each iteration of the pair
35 based multi-objective optimization algorithm for each case study.
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44 Figure 3. Distribution of the optimized features and docking scores of the 10000
45 compounds and heat map showing the average pair based Pareto dominance concerned
46 with the targeted chemical libraries NA.lib1 and NA.lib2. On the bottom right corner, the
47 structure of zanamivir is reported.
48
49
50
51
52
53
54
55
56
57
58
59
60

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
Figure 4. Panels (a) and (b) report molecular interactions between NA (PDB entry 3B7E)
23 and de novo generated compounds taken from the first (NA.lib1_02) and second
24 (NA.lib2_01) targeted chemical libraries, respectively. Ligands and the target are rendered
25
in green sticks and gray cartoon, respectively. Zanamivir is also depicted in black
26
27 wireframe. Red arrows indicate hydrogen bonds. For the sake of clarity, only polar
28 hydrogen atoms are shown.
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
45 with the targeted chemical libraries AChE.lib1 and AChE.lib2. On the bottom right corner,
46 the structure of donepezil is reported.
47
48
49
50
51
52
53
54
55
56
57
58
59
60

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23 Figure 6. Panels (a) and (b) report molecular interactions between AChE (PDB entry
24
25 4EY7) and de novo generated compounds taken from the first (AChE.lib1_02) and second
26 (AChE.lib2_04) targeted chemical libraries, respectively. Ligands and the target are
27 rendered in green sticks and gray cartoon, respectively. Donepezil is also depicted in black
28
29 wireframe. Red, green and blue arrows indicate hydrogen bonds,  and cation-
30  interactions, respectively. For the sake of clarity, only polar hydrogen atoms are shown.
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
27 with the targeted chemical library SARS-CoV-2.lib. On the bottom right corner, the
28 structure of N3 inhibitor is reported.
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21 Figure 8. Molecular interactions between SARS-CoV-2 main protease (PDB entry 6LU7)
22 and de novo generated compound SARS-CoV-2.lib_03. Ligand and target are rendered in
23
24
green sticks and gray cartoon, respectively. N3 is also depicted in black wireframe. Red
25 arrows indicate hydrogen bonds. For the sake of clarity, only polar hydrogen atoms are
26 shown.
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16 Figure 9. Red and blue solid lines as well as shaded areas indicate the average and the
17 standard deviation of LE values, respectively, computed varying the considered
18 percentage of each targeted library sorted at the increase of LE. The dark solid line
19
20 represents the LE value for the X-ray solved cognate ligands used as reference for each
21 case study.
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60

1
2
3 Table 1. Quality metrics of the targeted chemical libraries generated by de novo drug
4
design. Note that SA values are reported as mean along with standard deviation.
5
6
7 Targeted
Validity Unicity IntDiv Filters SA
8 chemical library
9 NA.lib1 0.995 0.986 0.733 0.620 2.5220.361
10 NA.lib2 0.998 0.957 0.669 0.852 4.7000.233
11 AChE.lib1 0.993 0.945 0.715 0.920 1.9420.192
12 AChE.lib2 0.998 0.959 0.625 0.841 2.1650.256
13 SARS-CoV-2.lib 0.999 0.910 0.539 0.773 3.6450.281
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60

1
2 Table 2. Representative examples of potential inhibitors generated through automated de novo drug design. The superscript letters a
3 and b indicate docking scores (kcal/mol) and S(x) fitness values.
4
5
6
7
8
9
10
11
NA.lib1_01 -7.653a 5.833b NA.lib1_02 -6.920a 5.784b NA.lib1_03 -6.841a 5.603b NA.lib1_04 -6.820a 5.846b
12
13
14
15
16
17
18 NA.lib2_01 -7.892a 3.946b NA.lib2_02 -7.750a 3.915b NA.lib2_03 -7.687a 3.825b NA.lib2_04 -7.504a 3.954b
19
20
21
22
23
24
25 AChE.lib1_01 -11.152a 5.846b AChE.lib1_02 -11.045a 5.998b AChE.lib1_03 -10.044a 5.914b AChE.lib1_04 -9.422a 5.741b
26
27
28
29
30
31 AChE.lib2_01 -13.466a 3.894b AChE.lib2_02 -13.118a 3.898b AChE.lib2_03 -12.897a 3.851b AChE.lib2_04 -12.703a 3.967b
32
33
34
35
36
37
38 SAR-COV-2.lib_01 -10.409a 3.952b SAR-COV-2.lib_02 -9.697a 3.805b SAR-COV-2.lib_03 -9.471a 3.92b SAR-COV-2.lib_04 -9.046a 3.928b
39
40
41
42
43 ACS Paragon Plus Environment
44
45
46
1
2
3 REFERENCES
4
5
(1) Macarron, R.; Banks, M. N.; Bojanic, D.; Burns, D. J.; Cirovic, D. A.; Garyantes, T.; Green, D.
6 V. S.; Hertzberg, R. P.; Janzen, W. P.; Paslay, J. W.; Schopfer, U.; Sittampalam, G. S. Impact of High-
7 Throughput Screening in Biomedical Research. Nat Rev Drug Discov 2011, 10, 188–195.
8 (2) Reymond, J.-L.; Deursen, R. van; Blum, L. C.; Ruddigkeit, L. Chemical Space as a Source for
9 New Drugs. Med. Chem. Commun. 2010, 1, 30–38.
10
11 (3) Hann, M. M.; Oprea, T. I. Pursuing the Leadlikeness Concept in Pharmaceutical Research.
12 Current Opinion in Chemical Biology 2004, 8, 255–263.
13 (4) Waring, M. J.; Arrowsmith, J.; Leach, A. R.; Leeson, P. D.; Mandrell, S.; Owen, R. M.;
14 Pairaudeau, G.; Pennie, W. D.; Pickett, S. D.; Wang, J.; Wallace, O.; Weir, A. An Analysis of the
15
16
Attrition of Drug Candidates from Four Major Pharmaceutical Companies. Nat. Rev. Drug Discovery
17 2015, 14, 475–486.
18 (5) Medina-Franco, J. L.; Martinez-Mayorga, K.; Meurice, N. Balancing Novelty with Confined
19 Chemical Space in Modern Drug Discovery. Expert Opin. Drug Discovery 2014, 9, 151–165.
20 (6) Mangiatordi, G. F.; Trisciuzzi, D.; Alberga, D.; Denora, N.; Iacobazzi, R. M.; Gadaleta, D.;
21
22 Catto, M.; Nicolotti, O. Novel Chemotypes Targeting Tubulin at the Colchicine Binding Site and
23 Unbiasing P-Glycoprotein. Eur. J. Med. Chem. 2017, 139, 792–803.
24 (7) Singh, N.; Chaput, L.; Villoutreix, B. O. Virtual Screening Web Servers: Designing Chemical
25 Probes and Drug Candidates in the Cyberspace. Brief Bioinform. 2020, 1-29.
26
(8) Nicolotti, O.; Gillet, V. J.; Fleming, P. J.; Green, D. V. S. Multiobjective Optimization in
27
28 Quantitative Structure−Activity Relationships: Deriving Accurate and Interpretable QSARs. J. Med.
29 Chem. 2002, 45, 5069–5080.
30 (9) Nicolotti, O.; Giangreco, I.; Miscioscia, T. F.; Carotti, A. Improving Quantitative
31 Structure−Activity Relationships through Multiobjective Optimization. J. Chem. Inf. Model. 2009,
32
33 49 , 2290–2302.
34 (10) Nicolotti, O.; Giangreco, I.; Introcaso, A.; Leonetti, F.; Stefanachi, A.; Carotti, A. Strategies of
35 Multi-Objective Optimization in Drug Discovery and Development. Expert Opin. Drug Discovery
36 2011, 6, 871–884.
37
(11) Schneider, G.; Fechner, U. Computer-Based de Novo Design of Drug-like Molecules. Nat Rev
38
39 Drug Discov 2005, 4, 649–663.
40 (12) Kuntz, I. D. Structure-Based Strategies for Drug Design and Discovery. Science 1992, 257,
41 1078–1082.
42 (13) Chen, H.; Engkvist, O.; Wang, Y.; Olivecrona, M.; Blaschke, T. The Rise of Deep Learning in
43
44
Drug Discovery. Drug Discovery Today 2018, 23, 1241–1250.
45 (14) Lo, Y.-C.; Rensi, S. E.; Torng, W.; Altman, R. B. Machine Learning in Chemoinformatics and
46 Drug Discovery. Drug Discovery Today 2018, 23, 1538–1546.
47 (15) Schneider, P.; Walters, W. P.; Plowright, A. T.; Sieroka, N.; Listgarten, J.; Goodnow, R. A.;
48
Fisher, J.; Jansen, J. M.; Duca, J. S.; Rush, T. S.; Zentgraf, M.; Hill, J. E.; Krutoholow, E.; Kohler, M.;
49
50 Blaney, J.; Funatsu, K.; Luebkemann, C.; Schneider, G. Rethinking Drug Design in the Artificial
51 Intelligence Era. Nat Rev Drug Discov 2019, 1–12.
52 (16) Schneider, G.; Clark, D. E. Automated De Novo Drug Design: Are We Nearly There Yet?
53 Angew. Chem., Int. Ed. 2019, 58, 10792–10803.
54
55
(17) Lim, J.; Ryu, S.; Kim, J. W.; Kim, W. Y. Molecular Generative Model Based on Conditional
56 Variational Autoencoder for de Novo Molecular Design. J. Cheminf. 2018, 10, 31.
57 (18) Gómez-Bombarelli, R.; Wei, J. N.; Duvenaud, D.; Hernández-Lobato, J. M.; Sánchez-
58 Lengeling, B.; Sheberla, D.; Aguilera-Iparraguirre, J.; Hirzel, T. D.; Adams, R. P.; Aspuru-Guzik, A.
59
Automatic Chemical Design Using a Data-Driven Continuous Representation of Molecules. ACS
60
Cent. Sci. 2018, 4, 268–276.

1
2
3 (19) Sattarov, B.; Baskin, I. I.; Horvath, D.; Marcou, G.; Bjerrum, E. J.; Varnek, A. De Novo
4
5
Molecular Design by Combining Deep Autoencoder Recurrent Neural Networks with Generative
6 Topographic Mapping. J. Chem. Inf. Model. 2019, 59, 1182–1196.
7 (20) Kadurin, A.; Nikolenko, S.; Khrabrov, K.; Aliper, A.; Zhavoronkov, A. DruGAN: An Advanced
8 Generative Adversarial Autoencoder Model for de Novo Generation of New Molecules with
9 Desired Molecular Properties in Silico. Mol. Pharmaceutics 2017, 14, 3098–3104.
10
11 (21) Putin, E.; Asadulaev, A.; Ivanenkov, Y.; Aladinskiy, V.; Sanchez-Lengeling, B.; Aspuru-Guzik,
12 A.; Zhavoronkov, A. Reinforced Adversarial Neural Computer for de Novo Molecular Design. J.
13 Chem. Inf. Model. 2018, 58, 1194–1204.
14 (22) Prykhodko, O.; Johansson, S. V.; Kotsias, P.-C.; Arús-Pous, J.; Bjerrum, E. J.; Engkvist, O.;
15
16
Chen, H. A de Novo Molecular Generation Method Using Latent Vector Based Generative
17 Adversarial Network. J. Cheminf. 2019, 11, 74.
18 (23) Olivecrona, M.; Blaschke, T.; Engkvist, O.; Chen, H. Molecular De-Novo Design through
19 Deep Reinforcement Learning. J. Cheminf. 2017, 9, 48.
20 (24) Gupta, A.; Müller, A. T.; Huisman, B. J. H.; Fuchs, J. A.; Schneider, P.; Schneider, G.
21
22 Generative Recurrent Networks for De Novo Drug Design. Mol. Inf. 2018, 37, 1700111.
23 (25) Merk, D.; Friedrich, L.; Grisoni, F.; Schneider, G. De Novo Design of Bioactive Small
24 Molecules by Artificial Intelligence. Mol Inform 2018, 37.
25 (26) Popova, M.; Isayev, O.; Tropsha, A. Deep Reinforcement Learning for de Novo Drug Design.
26
Sci. Adv. 2018, 4, eaap7885.
27
28 (27) Mendez, D.; Gaulton, A.; Bento, A. P.; Chambers, J.; De Veij, M.; Félix, E.; Magariños, M. P.;
29 Mosquera, J. F.; Mutowo, P.; Nowotka, M.; Gordillo-Marañón, M.; Hunter, F.; Junco, L.;
30 Mugumbate, G.; Rodriguez-Lopez, M.; Atkinson, F.; Bosc, N.; Radoux, C. J.; Segura-Cabrera, A.;
31 Hersey, A.; Leach, A. R. ChEMBL: Towards Direct Deposition of Bioassay Data. Nucleic Acids Res
32
33 2019, 47, D930–D940.
34 (28) Irwin, J. J.; Shoichet, B. K. ZINC − A Free Database of Commercially Available Compounds for
35 Virtual Screening. J. Chem. Inf. Model. 2005, 45, 177–182.
36 (29) Li, Y.; Zhang, L.; Liu, Z. Multi-Objective de Novo Drug Design with Conditional Graph
37
Generative Model. J Cheminform 2018, 10, 33.
38
39 (30) Pogány, P.; Arad, N.; Genway, S.; Pickett, S. D. De Novo Molecule Design by Translating
40 from Reduced Graphs to SMILES. J. Chem. Inf. Model. 2019, 59, 1136–1146.
41 (31) Mnih, V.; Kavukcuoglu, K.; Silver, D.; Rusu, A. A.; Veness, J.; Bellemare, M. G.; Graves, A.;
42 Riedmiller, M.; Fidjeland, A. K.; Ostrovski, G.; Petersen, S.; Beattie, C.; Sadik, A.; Antonoglou, I.;
43
44
King, H.; Kumaran, D.; Wierstra, D.; Legg, S.; Hassabis, D. Human-Level Control through Deep
45 Reinforcement Learning. Nature 2015, 518, 529–533.
46 (32) Merk, D.; Grisoni, F.; Friedrich, L.; Gelzinyte, E.; Schneider, G. Computer-Assisted Discovery
47 of Retinoid X Receptor Modulating Natural Products and Isofunctional Mimetics. J. Med. Chem.
48
2018, 61, 5442–5447.
49
50 (33) Polykovskiy, D.; Zhebrak, A.; Vetrov, D.; Ivanenkov, Y.; Aladinskiy, V.; Mamoshina, P.;
51 Bozdaganyan, M.; Aliper, A.; Zhavoronkov, A.; Kadurin, A. Entangled Conditional Adversarial
52 Autoencoder for de Novo Drug Discovery. Mol. Pharmaceutics 2018, 15, 4398–4405.
53 (34) Zhavoronkov, A.; Ivanenkov, Y. A.; Aliper, A.; Veselov, M. S.; Aladinskiy, V. A.; Aladinskaya,
54
55
A. V.; Terentiev, V. A.; Polykovskiy, D. A.; Kuznetsov, M. D.; Asadulaev, A.; Volkov, Y.; Zholus, A.;
56 Shayakhmetov, R. R.; Zhebrak, A.; Minaeva, L. I.; Zagribelnyy, B. A.; Lee, L. H.; Soll, R.; Madge, D.;
57 Xing, L.; Guo, T.; Aspuru-Guzik, A. Deep Learning Enables Rapid Identification of Potent DDR1
58 Kinase Inhibitors. Nat Biotechnol 2019, 37, 1038–1040.
59
(35) Zhou, Z.; Kearnes, S.; Li, L.; Zare, R. N.; Riley, P. Optimization of Molecules via Deep
60
Reinforcement Learning. Sci. Rep. 2019, 9, 10752.

1
2
3 (36) Ståhl, N.; Falkman, G.; Karlsson, A.; Mathiason, G.; Boström, J. Deep Reinforcement
4
5
Learning for Multiparameter Optimization in de Novo Drug Design. J. Chem. Inf. Model. 2019, 59,
6 3166–3176.
7 (37) Deng, J.; Yang, Z.; Li, Y.; Samaras, D.; Wang, F. Towards Better Opioid Antagonists Using
8 Deep Reinforcement Learning. arXiv:2004.04768 [cs, q-bio] 2020.
9 (38) Nicolaou, C. A.; Brown, N. Multi-Objective Optimization Methods in Drug Design. Drug
10
11 Discovery Today: Technol. 2013, 10, e427–e435.
12 (39) John Harris, C.; D. Hill, R.; W. Sheppard, D.; J. Slater, M.; F. W. Stouten, P. The Design and
13 Application of Target-Focused Compound Libraries
14 https://www.ingentaconnect.com/content/ben/cchts/2011/00000014/00000006/art00007
15
16
(accessed Apr 8, 2020).
17 (40) Https://Github.Com/Alberdom88/Moo-Denovo.
18 (41) Polykovskiy, D.; Zhebrak, A.; Sanchez-Lengeling, B.; Golovanov, S.; Tatanov, O.; Belyaev, S.;
19 Kurbanov, R.; Artamonov, A.; Aladinskiy, V.; Veselov, M.; Kadurin, A.; Johansson, S.; Chen, H.;
20 Nikolenko, S.; Aspuru-Guzik, A.; Zhavoronkov, A. Molecular Sets (MOSES): A Benchmarking
21
22 Platform for Molecular Generation Models. arXiv:1811.12823 [cs, stat] 2019.
23 (42) Brown, N.; Fiscato, M.; Segler, M. H. S.; Vaucher, A. C. GuacaMol: Benchmarking Models for
24 de Novo Molecular Design. J. Chem. Inf. Model. 2019, 59, 1096–1108.
25 (43) Sohrabi, C.; Alsafi, Z.; O’Neill, N.; Khan, M.; Kerwan, A.; Al-Jabir, A.; Iosifidis, C.; Agha, R.
26
World Health Organization Declares Global Emergency: A Review of the 2019 Novel Coronavirus
27
28 (COVID-19). Int. J. Surg. 2020, 76, 71–76.
29 (44) Lai, C.-C.; Shih, T.-P.; Ko, W.-C.; Tang, H.-J.; Hsueh, P.-R. Severe Acute Respiratory Syndrome
30 Coronavirus 2 (SARS-CoV-2) and Coronavirus Disease-2019 (COVID-19): The Epidemic and the
31 Challenges. Int. J. Antimicrob. Agents 2020, 55, 105924.
32
33 (45) Chung, J.; Gulcehre, C.; Cho, K.; Bengio, Y. Empirical Evaluation of Gated Recurrent Neural
34 Networks on Sequence Modeling. arXiv:1412.3555 [cs] 2014.
35 (46) Landrum, G. (2006). RDKit: Open-Source Cheminformatics.
36 (47) Sutton R, Barton A (1998) Reinforcement Learning: An Introduction, 1st Edn. MIT Press,
37
Cambridge.
38
39 (48) Willett, P.; Barnard, J. M.; Downs, G. M. Chemical Similarity Searching. J. Chem. Inf.
40 Comput. Sci. 1998, 38, 983–996.
41 (49) Rogers, D.; Hahn, M. Extended-Connectivity Fingerprints. J. Chem. Inf. Model. 2010, 50,
42 742–754.
43
44
(50) Benhenda, M. ChemGAN Challenge for Drug Discovery: Can AI Reproduce Natural Chemical
45 Diversity? arXiv:1708.08227 [cs, stat] 2017.
46 (51) Kalgutkar, A. S.; Gardner, I.; Obach, R. S.; Shaffer, C. L.; Callegari, E.; Henne, K. R.; Mutlib, A.
47 E.; Dalvie, D. K.; Lee, J. S.; Nakai, Y.; O’Donnell, J. P.; Boer, J.; Harriman, S. P. A Comprehensive
48
Listing of Bioactivation Pathways of Organic Functional Groups
49
50 https://www.ingentaconnect.com/content/ben/cdm/2005/00000006/00000003/art00001
51 (accessed Apr 8, 2020).
52 (52) Kalgutkar, A. S.; Soglia, J. R. Minimising the Potential for Metabolic Activation in Drug
53 Discovery. Expert Opin. Drug Metab. & Toxicol. 2005, 1, 91–142.
54
55
(53) Baell, J. B.; Holloway, G. A. New Substructure Filters for Removal of Pan Assay Interference
56 Compounds (PAINS) from Screening Libraries and for Their Exclusion in Bioassays. J. Med. Chem.
57 2010, 53, 2719–2740.
58 (54) Schrödinger Release 2019-4: BioLuminate, Schrödinger, LLC, New York, NY, 2019.
59
(55) Xu, X.; Zhu, X.; Dwek, R. A.; Stevens, J.; Wilson, I. A. Structural Characterization of the 1918
60
Influenza Virus H1N1 Neuraminidase. J. Virol. 2008, 82, 10493–10501.

1
2
3 (56) Cheung, J.; Rudolph, M. J.; Burshteyn, F.; Cassidy, M. S.; Gary, E. N.; Love, J.; Franklin, M. C.;
4
5
Height, J. J. Structures of Human Acetylcholinesterase in Complex with Pharmacologically
6 Important Ligands. J. Med. Chem. 2012, 55, 10282–10286.
7 (57) Jin, Z.; Du, X.; Xu, Y.; Deng, Y.; Liu, M.; Zhao, Y.; Zhang, B.; Li, X.; Zhang, L.; Peng, C.; Duan,
8 Y.; Yu, J.; Wang, L.; Yang, K.; Liu, F.; Jiang, R.; Yang, X.; You, T.; Liu, X.; Yang, X.; Bai, F.; Liu, H.; Liu,
9 X.; Guddat, L. W.; Xu, W.; Xiao, G.; Qin, C.; Shi, Z.; Jiang, H.; Rao, Z.; Yang, H. Structure of Mpro
10
11 from COVID-19 Virus and Discovery of Its Inhibitors. bioRxiv 2020, 2020.02.26.964882.
12 (58) Moscona, A. Neuraminidase Inhibitors for Influenza. N. Eng. J. Med. 2005, 353, 1363–1373.
13 (59) Stanciu, G. D.; Luca, A.; Rusu, R. N.; Bild, V.; Beschea Chiriac, S. I.; Solcan, C.; Bild, W.;
14 Ababei, D. C. Alzheimer’s Disease Pharmacotherapy in Relation to Cholinergic System Involvement.
15
16
Biomolecules 2020, 10, 40.
17 (60) Pisani, L.; Farina, R.; Soto-Otero, R.; Denora, N.; Mangiatordi, G. F.; Nicolotti, O.; Mendez-
18 Alvarez, E.; Altomare, C. D.; Catto, M.; Carotti, A. Searching for Multi-Targeting Neurotherapeutics
19 against Alzheimer’s: Discovery of Potent AChE-MAO B Inhibitors through the Decoration of the 2H-
20 Chromen-2-One Structural Motif. Molecules 2016, 21, 362.
21
22 (61) Birks, J.; Harvey, R. J. Donepezil for Dementia Due to Alzheimer’s Disease. Cochrane
23 Database Syst. Rev. 2006, No. 1.
24 (62) Conejo-García, A.; Pisani, L.; del Carmen Núñez, M.; Catto, M.; Nicolotti, O.; Leonetti, F.;
25 Campos, J. M.; Gallo, M. A.; Espinosa, A.; Carotti, A. Homodimeric Bis-Quaternary Heterocyclic
26
Ammonium Salts as Potent Acetyl- and Butyrylcholinesterase Inhibitors: A Systematic Investigation
27
28 of the Influence of Linker and Cationic Heads over Affinity and Selectivity. J. Med. Chem. 2011, 54,
29 2627–2645.
30 (63) Patick, A. K.; Potts, K. E. Protease Inhibitors as Antiviral Agents. Clin Microbiol Rev 1998, 11,
31 614–627.
32
33 (64) Chang, Y.-C.; Tung, Y.-A.; Lee, K.-H.; Chen, T.-F.; Hsiao, Y.-C.; Chang, H.-C.; Hsieh, T.-T.; Su,
34 C.-H.; Wang, S.-S.; Yu, J.-Y.; Shih, S.; Lin, Y.-H.; Lin, Y.-H.; Tu, Y.-C. E.; Hsu, C.-H.; Juan, H.-F.; Tung,
35 C.-W.; Chen, C.-Y. Potential Therapeutic Agents for COVID-19 Based on the Analysis of Protease
36 and RNA Polymerase Docking. 2020.
37
(65) Grebner, C.; Matter, H.; Plowright, A. T.; Hessler, G. Automated De Novo Design in
38
39 Medicinal Chemistry: Which Types of Chemistry Does a Generative Neural Network Learn? J. Med.
40 Chem. 2020.
41 (66) Struble, T. J.; Alvarez, J. C.; Brown, S. P.; Chytil, M.; Cisar, J.; DesJarlais, R. L.; Engkvist, O.;
42 Frank, S. A.; Greve, D. R.; Griffin, D. J.; Hou, X.; Johannes, J. W.; Kreatsoulas, C.; Lahue, B.; Mathea,
43
44
M.; Mogk, G.; Nicolaou, C. A.; Palmer, A. D.; Price, D. J.; Robinson, R. I.; Salentin, S.; Xing, L.;
45 Jaakkola, T.; Green, William. H.; Barzilay, R.; Coley, C. W.; Jensen, K. F. Current and Future Roles of
46 Artificial Intelligence in Medicinal Chemistry Synthesis. J. Med. Chem. 2020.
47 (67) Alberga, D.; Trisciuzzi, D.; Montaruli, M.; Leonetti, F.; Mangiatordi, G. F.; Nicolotti, O. A New
48
Approach for Drug Target and Bioactivity Prediction: The Multifingerprint Similarity Search
49
50 Algorithm (MuSSeL). J. Chem. Inf. Model. 2019, 59, 586–596.
51 (68) Montaruli, M.; Alberga, D.; Ciriaco, F.; Trisciuzzi, D.; Tondo, A. R.; Mangiatordi, G. F.;
52 Nicolotti, O. Accelerating Drug Discovery by Early Protein Drug Target Prediction Based on a Multi-
53 Fingerprint Similarity Search †. Molecules 2019, 24, 2233.
54
55
(69) Cavalluzzi, M. M.; Mangiatordi, G. F.; Nicolotti, O.; Lentini, G. Ligand Efficiency Metrics in
56 Drug Discovery: The Pros and Cons from a Practical Perspective. Expert Opin. Drug Discovery 2017,
57 12, 1087–1104.
58
59
60

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60 ACS Paragon Plus Environment

ứng dụng de novo trong điều trị SAR-CoV-2

Uploaded by

Document Information

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

ứng dụng de novo trong điều trị SAR-CoV-2

Uploaded by

Copyright:

Available Formats

Subscriber access provided by Bibliothèque de l'Université Paris-Sud

Machine Learning and Deep Learning

is published by the American Chemical Society. 1155 Sixteenth Street N.W.,

ACS Paragon Plus Environment

ACS Paragon Plus Environment

ACS Paragon Plus Environment

ACS Paragon Plus Environment

ACS Paragon Plus Environment

ACS Paragon Plus Environment

ACS Paragon Plus Environment

ACS Paragon Plus Environment

ACS Paragon Plus Environment

ACS Paragon Plus Environment

ACS Paragon Plus Environment

ACS Paragon Plus Environment

ACS Paragon Plus Environment

ACS Paragon Plus Environment

ACS Paragon Plus Environment

ACS Paragon Plus Environment

ACS Paragon Plus Environment

ACS Paragon Plus Environment

ACS Paragon Plus Environment

ACS Paragon Plus Environment

ACS Paragon Plus Environment

ACS Paragon Plus Environment

ACS Paragon Plus Environment

ACS Paragon Plus Environment

ACS Paragon Plus Environment

ACS Paragon Plus Environment

ACS Paragon Plus Environment

ACS Paragon Plus Environment

ACS Paragon Plus Environment

ACS Paragon Plus Environment

ACS Paragon Plus Environment

ACS Paragon Plus Environment

You might also like