Green Chemistry

Published on 06 December 2023.
Multi-objective optimization strategy for green solvent design via a deep generative model
learned from pre-set molecule pairs†

Cite this: DOI: 10.1039/d3gc04354a

Jun Zhang,a Qin Wang,*b Huaqiang Wen,a Vincent Gerbaud,c Saimeng Jina and Weifeng Shen*a

Green solvent design is usually a multi-objective optimization problem that requires identifying a set
of solvent molecules that balance multiple, often conflicting, properties. At the same time, process
constraints need to be addressed, since solvent properties affect process feasibility, as in the
extractive distillation separation process. Hence, a green solvent multi-objective optimization
framework is proposed with EH&S properties, process constraints, and energy consumption analysis,
where the molecular design optimization model relies upon the ability of the proposed infinite dilution
activity coefficient (IDAC) direct prediction model to accurately predict process properties in addition
to molecular properties. The process properties are short-cut properties of the extractive distillation
process, namely selectivity and solution capacity. To this end, the proposed IDAC direct prediction
model is employed to prepare molecule pairs with selectivity and solution capacity improvement
constraints to train the molecular multi-objective optimization model, which can learn the optimization
path from the pre-set molecule pairs and then optimize a given solvent via the prediction of a
disconnection site and molecular fragment addition or removal at that site. An extractive distillation
process to separate a cyclohexane/benzene mixture is taken as an example to demonstrate the proposed
framework. As a result, three candidate green solvents are optimized and designed to recover benzene
from mixtures of benzene and cyclohexane. The proposed green solvent multi-objective optimization
framework is flexible enough to be employed in other chemical separation processes, where solvent
property assessment is needed to evaluate the feasibility and performance of the processes.

Received 10th November 2023, Accepted 22nd November 2023
rsc.li/greenchem

a School of Chemistry and Chemical Engineering, Chongqing University, Chongqing 400044, China.
E-mail: shenweifeng@cqu.edu.cn
b School of Chemistry and Chemical Engineering, Chongqing University of Science and Technology,
Chongqing 401331, China. E-mail: wangq356@mail2.sysu.edu.cn
c Laboratoire de Génie Chimique, Université de Toulouse, CNRS, INP, UPS, Toulouse, France
† Electronic supplementary information (ESI) available. See DOI: https://doi.org/10.1039/d3gc04354a
This journal is © The Royal Society of Chemistry 2023 Green Chem.

1 Introduction

In many separation processes, such as azeotropic or extractive distillation or liquid–liquid
extraction, a solvent is needed to perform the desired separation. Solvent design is inherently a
constrained multi-objective optimization problem.1,2 The first set of constraints concerns matching
the desired solvent property values. These properties are multiple and usually cover not only
molecular properties related to the primary function of the solvent, such as solubilizing an active
principle or having a preferred affinity with one of the molecules in a mixture, but also other
properties that may ease the process operation. Model-based property predictions are numerous but
are confronted with various challenges, such as coping with stereoisomers for group-contribution
methods, or correctly sampling the vast solvent search space spanning the chemistry field for
computationally costly quantum mechanical methods.

Besides, in separation processes, process feasibility sets additional constraints on the solvent.
Hence, a search that simultaneously combines molecular and process constraints is a challenge, which
is the purpose of our study, and which would be facilitated by using model-based approaches to
optimize the structure of solvent molecules. However, a sequential approach, in which optimal solvent
design is followed by optimal process design, bears a risk of error propagation that could invalidate
the whole procedure.

In this work, we propose a molecular multi-objective optimization model to purposefully modify the
structure of solvent molecules that have some drawbacks (such as a negative EH&S impact) to obtain a
green solvent with the desired separation performance, rather than simply utilizing a molecular
generative model to enlarge the chemical space for subsequent solvent screenings with multi-index
constraints. The multi-objective optimization model can learn the optimization path from the pre-set
molecule pairs. Every pair of molecules (Mx and My) in the pre-set molecule pairs had similar
molecular structures and differed at only a single disconnection site, but the scores of both
selectivity and solution capacity of My were at least 20% larger than those of Mx. The prepared
pre-set molecule pairs were used to train the proposed molecular multi-objective optimization model,
which can learn the difference between the molecules in each pair and the optimization path from Mx
to My. To prepare the molecule pairs, an improved deep learning-based IDAC direct prediction model
trained over a COSMO-SAC database was developed for predicting the selectivity and solution capacity
of the molecule pairs. The proposed IDAC direct prediction method can provide superior predictive
performance compared with the IDAC indirect prediction method, which first predicts the VCOSMO and
the 51-point σ-profile and then calculates the IDAC using the COSMO-SAC model. The indirect IDAC
prediction method loses more information during the prediction and COSMO-SAC calculation processes.
The improved deep learning-based direct IDAC prediction model was integrated with the molecular
multi-objective optimization model to form the proposed green solvent multi-objective and multi-scale
optimization framework with EH&S properties, process constraints, and energy and economic analysis.
The proposed green solvent multi-objective and multi-scale optimization framework can: (1)
simultaneously optimize multiple trade-off properties such as the selectivity and solution capacity
of the solvent; (2) learn from the pre-set molecule pairs that have similar molecular structures but
differ in their properties of interest; (3) visualize the optimization path of the solvent's
molecular structure; and (4) accurately and directly predict the IDAC of the molecules.

The paper is organized as follows: the next section (Section 2) gives a non-exhaustive overview of
solvent design issues, related computer-aided approaches, and connections to some process design
issues for extractive distillation processes. Section 3 describes the integrated molecular
multi-objective and multi-scale optimization framework. Section 4 describes and evaluates the
performance of the improved model for the direct prediction of the infinite dilution activity
coefficient using deep learning techniques. Section 5 introduces the molecular multi-objective
optimization model. Section 6 is an illustrative case study about solvent optimization and design for
an extractive distillation process.

2 Background

When designing a solvent for a separation process, one should match target values with properties
that directly impact the process separation feasibility, such as the selectivity and solution
capacity in liquid–liquid separation, melting point in solid separation processes, etc. At the same
time, properties that affect the process performance and operation, in terms of economics and energy
requirements, should also be considered, such as the boiling point for the distillation process and
the molar volume for batch processes, along with properties related to transport phenomena like
viscosity, surface tension, heat capacity, etc.3 Nowadays, the sustainability of new solvents (e.g.,
toxicity, safety, environmental impact, etc.)4–6 is also becoming a key design objective,7 which is
especially important for the green solvent design task, and for complying with regulations such as
the US Toxicity Characteristic Leaching Procedure (TCLP) or the EU Registration, Evaluation,
Authorization, and Restriction of Chemicals (REACH), which affect process authorization.

Considering the vast number of potential solvents, the trial-and-error method for solvent
identification may be highly time-consuming and even unrealistic when one considers only a single
property. In addition, promising solvents could be missed if the trial-and-error method relies on a
fixed solvent database. Hence, model-based solvent selection or design methods, like computer-aided
molecular design (CAMD), are extremely desirable to address the issue of exploring the vast solvent
search space.8

In support of any computer-aided solvent design approach, one needs to access solvent property
values. Evaluating solvent thermodynamic properties requires either measuring them or calculating
them with property estimation models; the latter are more appropriate in a preliminary process design
phase. Hence, property calculation or estimation models play an important role in model-based solvent
design methods since they can correlate molecular structures with solvent thermodynamic properties.
For any property of interest in a real process, there exist a variety of property models, and
choosing the most suitable models is a key step.9 Each model bears different accuracies, predictive
capabilities, and computation costs.10 The property estimation methods mainly include
descriptor-based methods,11 group-contribution (GC) methods,12 quantum mechanical (QM) methods,13 and
deep learning (DL) methods.14,15 For example, for extractive distillation, the process we select for
illustration, one real property of relevance might be stated as having a preferential affinity with
one of the compounds in the mixture to be separated. It can be evaluated by various models: by
comparing the similarity of the Hansen solubility parameter values between the solvent and the
molecule of interest, a simple correlative model that does not capture temperature dependency; by
solving the thermodynamic phase equilibrium for computing solubility with temperature dependency; or
by comparing interaction surface potentials, like the COSMO sigma potential curves, which requires
quantum mechanics calculations. The GC method is one of the most widely utilized and efficient
techniques to evaluate macroscopic physicochemical properties. However, the performance of
first-order GC models, with contributions regressed over experimental data directly related to the
occurrence of simple chemical groups like –CH2, –OH, etc., is sometimes weakened because they cannot
take account of the proximity effects or distinguish between isomers.3,16 To address these issues,
second- and third-order GC models have been developed for discriminating the structural isomers,17
but they
still cannot distinguish many stereoisomers such as cis/trans isomers.18 These issues can be tackled
with quantum mechanical-based (QM-based) solvation models, such as COSMO-RS19,20 and COSMO-SAC.21,22
With only a few parameters such as the surface charge density profile (σ-profile) and the cavity
volume (VCOSMO), the COSMO-based models can achieve a decent accuracy for the calculation of
thermodynamic properties. However, the initial QM calculations bear a heavy computational cost and
are highly time-consuming, and even unrealistic when exploring the vast search space of solvent
molecules.23 To this end, the GC-COSMO techniques have been proposed as a shortcut to more efficiently
access the VCOSMO and σ-profile.24,25 However, due to the inherent GC limitations, these GC-COSMO
techniques not only have difficulties in appropriately handling isomers and proximity effects but are
also limited in the variety of functional groups available in open-source databases. With the
availability of COSMO-type databases (e.g., the VT-2005 database26), deep learning-based (DL-based)
techniques27–30 can be applied as another shortcut to obtain the σ-profile and VCOSMO.31,32 However,
the VT-2005 database only contains 1431 compounds, which may not be enough to train a DL-based
prediction model with satisfactory generalization ability. Additionally, such DL-based prediction
models are developed to predict the VCOSMO and the σ-profile, and then the predicted parameters are
used to calculate the IDAC. This indirect IDAC calculation process could lead to a decline in
accuracy.

Once property estimation models are available, computer-aided molecular design33 (CAMD) is an
effective approach for screening existing solvents and designing new ones. In CAMD, pre-prepared
molecular functional groups are assembled to generate potential solvents through mixed-integer linear
programming (MILP), mixed-integer nonlinear programming (MINLP), or stochastic algorithms with
objective functions and constraints (such as molecular structural, property, and process operating
constraints).34–37 However, with the increase in the number of preselected functional groups, the
CAMD method may face the problem of combinatorial explosion.3,38

Recent advancements in the domain of artificial intelligence have accelerated the development and
application of techniques for inverse molecular design.39–42 For instance, molecular generation
models have been applied in many fields.38,43 Molecular graph generation techniques,44 as an
outstanding representative, have become one of the most widely adopted approaches for molecular
design. Recently, a fragment-based hierarchical encoder–decoder model for molecular generation was
proposed by Jin et al.45 Fragments extracted from the training molecules were analogous to the
molecular functional groups used in the group-contribution methods. The molecular fragments can bring
chemistry domain knowledge and interpretability into the model.46 The molecules can be optimized by
predicting a disconnection site and performing molecular fragment addition or removal at that site.
However, this model cannot simultaneously optimize multiple trade-off properties of the solvent
molecules. Therefore, this kind of single-objective optimization model is very difficult to couple
with the multi-dimensional and highly nonlinear chemical separation process. Although there are deep
molecular optimization models labeled "multi-objective",47,48 these models usually aggregate multiple
objectives into a single scalar objective.

However, solvent property knowledge is only a first step in the design of a well-performing separation
process, for which the process model can be highly nonlinear because the process feasibility is often
directly related to the characteristics of the solvent. For example, there are some trade-off
properties such as the selectivity and solution capacity that are not perfectly correlated and,
therefore, molecular multi-objective optimization cannot be addressed by these models. Hence, some
authors have explored the simultaneous design of the solvent and the process attributes in a so-called
reverse engineering computer-aided molecule and process design (CAMPD) approach. For example, some
authors have proposed a framework for the integrated design of a solvent and extractive distillation
process by solving a multi-objective optimization problem addressing constraints related to
thermodynamic process feasibility, along with process operation, a process model, and molecular
constraints.49,50 In these studies, the property prediction at the molecular scale is addressed using
COSMO approaches, while the process model can be a pinch-based model based on a minimum solvent flow
rate and minimum energy demand49 or a more rigorous rate-based model.50 The use of such process models
is relevant for an accurate process design, but there exist simpler criteria for assessing extractive
distillation feasibility, such as solvent capacity and selectivity,51 which are further related to
infinite dilution activity coefficients (IDAC), and univolatility curves.52 In this study, we propose
a molecular multi-objective and multi-scale optimization framework for the combined molecular and
process design with the predicted process constraints (solvent selectivity and capacity based on
IDAC), where the process-related properties are directly used to train the molecular structure
optimization model, with the help of deep-learning techniques.

3 The deep learning-based molecular multi-objective and multi-scale optimization framework for a
green solvent design

The deep learning-based molecular multi-objective and multi-scale optimization framework for green
extractive distillation solvent design is presented by integrating an improved deep learning-based
model for IDAC direct prediction (in Section 4) and a data-driven deep molecular multi-objective
optimization model (in Section 5), as shown in Fig. 1. Meanwhile, the EH&S property constraints,
process constraints, and energy consumption analysis are considered to ensure the sustainability and
the technical and economic viability of green solvents.
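
As a rough illustration of how the two short-cut process properties can follow from IDACs, the sketch below uses the common textbook definitions for a solvent that should retain benzene (B) and reject cyclohexane (A): selectivity S = γ∞(A)/γ∞(B) and capacity C = 1/γ∞(B), with both IDACs taken in the solvent. These definitions, the names `SolventScore` and `score_solvent`, and the numerical inputs are assumptions made for illustration; the exact scoring used in this framework may differ (for instance, if a model outputs logarithms of the IDAC, the values would have to be exponentiated first).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SolventScore:
    selectivity: float  # S = gamma_inf(A in solvent) / gamma_inf(B in solvent)
    capacity: float     # C = 1 / gamma_inf(B in solvent)


def score_solvent(gamma_inf_a: float, gamma_inf_b: float) -> SolventScore:
    """Short-cut extractive distillation criteria from two IDACs.

    gamma_inf_a: IDAC of the component to be rejected (here cyclohexane, A).
    gamma_inf_b: IDAC of the component to be recovered (here benzene, B).
    Both are assumed to be plain activity coefficients, not logarithms.
    """
    if gamma_inf_a <= 0 or gamma_inf_b <= 0:
        raise ValueError("activity coefficients must be positive")
    return SolventScore(
        selectivity=gamma_inf_a / gamma_inf_b,
        capacity=1.0 / gamma_inf_b,
    )


# Hypothetical IDAC values, for illustration only (not taken from this work).
candidate = score_solvent(gamma_inf_a=6.0, gamma_inf_b=2.0)
print(candidate)
```

Under these definitions a lower IDAC of benzene raises both scores, while selectivity additionally rewards a high IDAC of cyclohexane, which is one way to see why the two properties are related but not perfectly correlated.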

Fig. 1 The green solvent multi-objective and multi-scale optimization framework towards the extractive distillation processes.
The proposed framework for green solvent design will be applied to an extractive distillation process
to separate cyclohexane and benzene mixtures (in Section 6).

4 An improved deep learning model for direct prediction of the infinite dilution activity coefficient

4.1 Data preparation

The COSMO-SAC model22 is utilized in this study for the IDAC calculation. The UD database53 contains
the quantum mechanically derived VCOSMO and σ-profile for 2261 compounds. Heuristically, an increase
in the molecular weight of a solvent results in a higher normal boiling point, which usually means
higher energy consumption for an extractive distillation process and reduces its economic viability.
Therefore, only molecules with fewer than 12 root atoms (hydrogen atoms ignored) are considered in
this study. Additionally, because the dataset is a collected one, it is essential to apply data
cleaning to remove outliers. The Pauta criterion,54 also known as the three-sigma rule, is employed
for the data cleaning process. After the data preparation process, 2130 compounds remain in the UD
database. The quantum mechanically derived VCOSMO and σ-profile of the 2130 compounds (2125 compounds
for model training and 5 compounds for external testing) are provided in Table S1 in the ESI.† The
calculated IDAC of the 2130 compounds in cyclohexane (A) and benzene (B) is detailed in Table S2 in
the ESI.†

4.2 Development of the deep learning model for IDAC direct prediction

There are two paths to calculate the IDAC of a molecule in a certain solvent: (1) the VCOSMO and
51-point σ-profile predictive models are trained, and then the IDACs of different compounds in a
certain solvent are calculated utilizing COSMO-SAC with the estimated parameters, as shown in
Fig. 2a. This indirect IDAC calculation process is termed the indirect method (IM) in this work; (2)
the IDACs of different compounds in a certain solvent are directly calculated by employing COSMO-SAC,
and then the calculated IDAC information is utilized to train an IDAC predictive model, as
illustrated in Fig. 2b. This direct IDAC calculation path is termed the direct method (DM). The
IM-based IDAC calculation path has been introduced in our previous work.32,55 In this study, the
DM-based IDAC predictive model is developed to evaluate which IDAC calculation path performs better.

Fig. 2 The schematic diagram for IDAC calculation. (a) An indirect method (IM). (b) A direct method (DM).

First, the IDACs of the 2130 compounds in benzene and cyclohexane are calculated using the COSMO-SAC
model with their VCOSMO and σ-profile information from the UD database. Subsequently, the hybrid
representations28,32 of the 2125 compounds (five additional compounds are used as the external
validation data) are utilized as input to train the feedforward neural network (FNN) for the IDAC
prediction in benzene and cyclohexane (IDAC-benzene and IDAC-cyclohexane), as shown in Fig. 3. The
message-passing neural network (MPNN) is a graph neural network, which consists of two phases,
namely, the message-passing phase and the readout phase.28 In the message-passing phase, the MPNN
updates information on the directed bonds, as shown in Fig. 3. In the readout phase, a readout
function is utilized to provide a vector representation of the molecular structure. The MPNN-learned
descriptors mainly focus on the local information about molecular structure due to the message
updating mechanism. Therefore, the molecule-level 200-dimensional RDKit-calculated descriptors (as
shown in Fig. 3), which can capture the global information of the molecular structure, are integrated
with the MPNN-learned features to form the molecular hybrid representation, which can retain the
molecular local and global information as much as possible. The data split setting for training the
two proposed models is 0.8 : 0.1 : 0.1. The early stopping technique is employed to avoid
overfitting. Finally, the 10-fold cross-validation (10-fold CV) method is applied to improve the
stability of the two proposed models. In this study, the hidden size of the MPNN, the depth of the
MPNN, the layer number of the FNN, and the dropout of the FNN are optimized using the Bayesian
optimization method embedded in the Python package hyperopt.56

Fig. 3 The scheme diagram of the proposed IDAC direct method (DM) predictive models.

4.3 The performance evaluation of the proposed deep learning prediction models

The optimal hyperparameter combinations for the proposed IDAC-benzene and IDAC-cyclohexane predictive
models are summarized in Table 1.

In this study, three evaluation metrics, i.e., the mean absolute error (MAE), the mean squared error
(MSE), and the coefficient of determination (R2), were adopted as the evaluation criteria. The
prediction performance of the IM and DM with the UD database is summarized in Table 2. In addition to
the FNN model, the prediction performance using random-forest and support-vector machine approaches
is also summarized in Table 2 to explore which machine learning approach is more suitable for IDAC
prediction. The optimal hyperparameter combinations of the random forest and support vector
machine-based approaches are detailed in Table S3.† Based on the statistical analysis, the FNN-based
models (IM and DM models) had superior predictive performance over the random
forest and support vector machine-based models. The 10-fold CV performance of the proposed DM models
for IDAC prediction in benzene and cyclohexane on the test sets was better than that of the IM
predictive model.

Table 1 The optimal hyperparameter combinations for the proposed prediction models of the
IDAC-benzene and IDAC-cyclohexane

Hyperparameters     Range         IDAC-benzene   IDAC-cyclohexane
Hidden size         [300, 3000]   1200           1300
Depth               [2, 7]        6              6
Dropout             [0, 0.4]      0.0            0.0
Number of layers    [1, 5]        3              3

Table 2 The 10-fold cross-validation performance of the indirect method (IM) and direct method (DM)
predictive models

Model                                     10-fold CV MAE    10-fold CV MSE    10-fold CV R2
(a) IM32 (FNN-based model)
    IDAC-benzene                          0.1216 ± 0.0140   0.0720 ± 0.0163   0.8651 ± 0.0338
    IDAC-cyclohexane                      0.1755 ± 0.0180   0.1435 ± 0.0341   0.9123 ± 0.0198
(b) DM (FNN-based model)
    IDAC-benzene                          0.1146 ± 0.0108   0.0506 ± 0.0084   0.9036 ± 0.0128
(c) Random forest-based model
    IDAC-benzene                          0.2224 ± 0.0198   0.1581 ± 0.0221   0.6985 ± 0.0314
(d) Support vector machine-based model
    IDAC-benzene                          0.2516 ± 0.0164   0.1456 ± 0.0206   0.7213 ± 0.0389

In addition to the above-mentioned statistical analysis, five molecules in the external validation
dataset were utilized as examples to evaluate the ability of the proposed predictive models to
discriminate stereoisomers and structural isomers and to deal with complex molecules. In this work,
the external validation dataset consisted of N,N-diethylaniline (a complex compound), p-xylene and
o-xylene (structural isomers), and cis-3-hexene and trans-3-hexene (cis/trans isomers). The
heteroatomic nitrogen in N,N-diethylaniline has an inducing effect on the delocalized π electron
system of the aromatic ring, which could lead to a poor prediction performance by some quantitative
structure–property relationship (QSPR) models.31 o-Xylene and p-xylene are a pair of structural
isomers, and trans-3-hexene and cis-3-hexene are a pair of cis/trans isomers. The predictive
performance of the IM and DM models is tabulated in Table 3. Regarding N,N-diethylaniline, the
proposed DM models can achieve a better predictive performance than the IM models. Regarding the
structural and cis/trans isomers, both the IM and DM models have a satisfactory ability to
differentiate isomers. Additionally, we visualized the chemical space of the training and external
dataset by projecting the Morgan fingerprints (radius = 2, 1024 bits) of the molecules onto the 2D
space (as shown in Fig. 4) via the t-SNE approach.57 As shown in Fig. 4, the five external data were
not very similar to the well-represented molecules in the training dataset. Moreover, the five
external data were scattered in different regions of the chemistry space of the training dataset.
Therefore, the proposed DM models had decent IDAC predictive performance and good generalization
ability.

Table 3 The prediction performance of the indirect method (IM) and direct method (DM) models on the
external validation dataset

                        IDAC-cyclohexane                        IDAC-benzene
Compound name           QM-derived value   IM       DM         QM-derived value   IM        DM
N,N-Diethylaniline      0.3215             0.2398   0.3028     −0.1047            −0.0745   −0.1033
p-Xylene                0.2230             0.2194   0.2295     0.0008             0.0109    −0.0004
o-Xylene                0.2654             0.2683   0.2472     0.0007             −0.0037   0.0014
cis-3-Hexene            0.0550             0.0505   0.0463     0.1860             0.1847    0.2072
trans-3-Hexene          0.0457             0.0475   0.0505     0.2325             0.2165    0.2183

Based on the predictive performance analysis mentioned above, the proposed DM models had a better
generalization ability than the IM models. Additionally, the proposed IDAC prediction models can
discriminate isomers, including structural isomers and cis/trans isomers, and can deal with complex
compounds such as hetero-atom compounds.
Fig. 4 The chemical space of the training and external dataset visualized via the t-SNE approach.
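
The Pauta criterion used for data cleaning in Section 4.1 discards values that lie more than three standard deviations from the mean. A minimal single-pass sketch is given below; the function name `pauta_filter`, the use of the sample standard deviation, and the toy data are assumptions for illustration, and the criterion is sometimes applied iteratively instead.

```python
from statistics import fmean, stdev


def pauta_filter(values, k=3.0):
    """Keep only values within k sample standard deviations of the mean.

    A single pass of the three-sigma (Pauta) rule. Using the sample standard
    deviation and applying the rule once are assumptions of this sketch.
    """
    mu = fmean(values)
    sigma = stdev(values)
    return [v for v in values if abs(v - mu) <= k * sigma]


# Toy data: nineteen values near 1.0 and one obvious outlier (hypothetical).
values = [0.9, 1.0, 1.1] * 6 + [1.0, 25.0]
cleaned = pauta_filter(values)
print(len(values), "->", len(cleaned))
```

Note that the rule needs a reasonably large sample to be able to flag anything: with very few points, a single outlier inflates the standard deviation so much that it can never exceed the three-sigma bound.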

5 An interpretable molecular multi-objective optimization model learned from pre-set molecule pairs

5.1 Training data preparation for molecular multi-objective optimization

The ChEMBL dataset58 processed by Olivecrona59 was used to construct the molecule pairs, which were
employed as the training data for solvent molecular multi-objective optimization. There were
1 179 477 compounds in the processed ChEMBL dataset. Each compound was restricted to contain 10–50
root atoms and only had atoms in {H, B, C, N, O, F, Si, P, S, Cl, Br, and I}. The training molecule
pairs were constructed as follows. First, 18 155 molecules were identified from the processed ChEMBL
dataset with root atoms of not more than 12. We adopted the 12 root atom threshold because larger
molecules usually have a higher normal boiling point, and a molecule with a high normal boiling point
is not suitable for use as a solvent to separate benzene and cyclohexane via extractive
distillation. Second, $C_{18\,155}^{2} = 164\,792\,935$ molecule pairs (Mx and My) were constructed
from the 18 155 processed molecules. Third, 1 590 350 molecule pairs had similarities sim(Mx, My) ≥
0.4. The similarity of a molecule pair is measured by the Tanimoto coefficient over 2048-dimension
binary Morgan fingerprints with radius 1. The similarity threshold was adopted because the proposed
molecular optimization model needed training molecule pairs with only one fragment different at one
disconnection site, which can improve the learning efficiency of the molecular optimization model.
Fourth, the DF-GED algorithm was used to extract molecule pairs that had only one fragment different
at one disconnection site; 100 629 molecule pairs were extracted from the 1 590 350 pairs of
molecules. Fifth, among the 100 629 pairs of molecules, we selected the molecule pairs that met the
following property constraints: the selectivity score of My should be improved by at least 20%
compared with Mx in a molecule pair, that is,

$$\frac{S(M_y) - S(M_x)}{S(M_x)} \geq 0.2, \qquad (1)$$

and the solution capacity score of My should also be improved by at least 20% compared with Mx in a
molecule pair, that is,

$$\frac{C(M_y) - C(M_x)}{C(M_x)} \geq 0.2. \qquad (2)$$

As a result, 35 496 molecule pairs (detailed in Table S4 in the ESI†) were identified that can be
used as the training data with the property constraints.

5.2 Development of the multi-objective optimization model of solvent molecules

The proposed multi-objective molecular optimization approach of the solvent molecules extended the
hierarchical generation model to multi-objective molecular optimization by learning the molecule
pairs with improved selectivity and solution capacity.

In the hierarchical encoding process, a molecule can be represented by a hierarchical graph with
three layers,45 i.e., an atom layer, an attachment layer, and a fragment layer, as seen in Fig. 5a.
The details of the fragment extraction approach and the hierarchical encoding method were introduced
in the studies presented by Jin et al.45 and Chen et al.46 In the hierarchical molecular
representation framework, a molecule graph $\mathcal{G}$ can be represented as a set of fragments
$\mathcal{F}_1, \ldots, \mathcal{F}_n$ and their attachments $\mathcal{A}_1, \ldots, \mathcal{A}_n$.
Each attachment $\mathcal{A}_i$ in this layer denotes a specific attachment configuration of fragment
$\mathcal{F}_i$, including the connection information between $\mathcal{F}_i$ and one of its neighbor
fragments. In the atom layer, a molecule can be depicted as a graph
$\mathcal{G}_x = (\mathcal{A}_x, \mathcal{B}_x)$, where $\mathcal{A}_x$ and $\mathcal{B}_x$ represent
the atoms and corresponding bonds in Mx. In the attachment layer, molecule $\mathcal{G}_x$ is
constituted by a series of fragments $\mathcal{F}_1, \ldots, \mathcal{F}_n$ extracted from Mx. In the
fragment layer, a molecule Mx is represented as a tree-constructed graph $\mathcal{T}_x$. The
tree-constructed representation can be depicted as $\mathcal{T}_x = (\mathcal{V}_x, \mathcal{E}_x)$,44
where all the fragments in Mx are extracted as nodes in $\mathcal{V}_x$; nodes with the same atoms are
connected with edges in $\mathcal{E}_x$. The encoder encodes the molecule pairs (Mx, My) as graphs
($\mathcal{G}_x$ and $\mathcal{G}_y$) using message passing networks, and as tree-constructed graphs
($\mathcal{T}_x$ and $\mathcal{T}_y$) using tree message passing networks.

In the hierarchical decoding process, the decoder conducts a series of modification operations that
optimize Mx into My, as seen in Fig. 5b. The details of the hierarchical decoding method are
introduced in the studies presented by Jin et al.45 and Chen et al.46 First, the decoder performs
disconnection attachment prediction (DAP) to find an attachment $\mathcal{A}_d$ in $\mathcal{T}_x$ as
the disconnection site. Second, at the neighbors of $\mathcal{A}_d$, the decoder performs
fragment-removing prediction (FRP) to remove fragments attached to $\mathcal{A}_d$. Third, an
intermediate representation (IMR) for the remaining scaffold
$(\mathcal{G}^{*(0)}, \mathcal{T}^{*(0)})$ is produced after the fragment removal operation. Fourth,
over $(\mathcal{G}^{*(0)}, \mathcal{T}^{*(0)})$, the decoder conducts new fragment attachment (NFA)
prediction iteratively to optimize Mx into My. The optimal graph edit paths can be identified by the
DF-GED algorithm.60

By learning from molecule pairs with improved selectivity and solution capacity (the training molecule
pairs), the hierarchical molecular multi-objective optimization model can realize the multi-objective
optimization of the solvent molecules, as illustrated in Fig. 5c.

6 Case study of the green solvent multi-objective and multi-scale optimization framework

A case study using extractive distillation to separate aliphatic and aromatic mixtures61 was used to
evaluate the proposed green solvent multi-objective optimization framework. In this


Fig. 5 The schematic diagram of the hierarchical molecule (a) encoder, (b) decoder, and (c) multi-objective optimization process.
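The property constraints of eqn (1) and (2) amount to a relative-improvement filter over candidate molecule pairs. The following is a minimal sketch of such a filter, not the authors' implementation: the `Pair` container, the function names, and the three example pairs with their scores are hypothetical, and all scores are assumed to be positive.

```python
from typing import Iterable, List, NamedTuple


class Pair(NamedTuple):
    """Hypothetical container for one candidate pair (Mx, My) and its scores."""
    mx: str      # identifier of the molecule to be optimized
    my: str      # identifier of the optimized molecule
    s_x: float   # selectivity score of Mx
    s_y: float   # selectivity score of My
    c_x: float   # solution capacity score of Mx
    c_y: float   # solution capacity score of My


def relative_gain(before: float, after: float) -> float:
    """Relative improvement (after - before) / before, as in eqn (1) and (2)."""
    return (after - before) / before


def select_pairs(pairs: Iterable[Pair], min_gain: float = 0.2) -> List[Pair]:
    """Keep pairs whose selectivity AND capacity both improve by >= min_gain."""
    return [
        p for p in pairs
        if relative_gain(p.s_x, p.s_y) >= min_gain
        and relative_gain(p.c_x, p.c_y) >= min_gain
    ]


# Illustrative (made-up) scores, chosen away from the 20% boundary.
candidates = [
    Pair("Mx1", "My1", 2.0, 2.6, 0.50, 0.65),  # +30% / +30%: kept
    Pair("Mx2", "My2", 2.0, 2.6, 0.50, 0.55),  # capacity only +10%: dropped
    Pair("Mx3", "My3", 2.0, 2.2, 0.50, 0.70),  # selectivity only +10%: dropped
]
kept = select_pairs(candidates)
print([p.my for p in kept])
```

Both conditions are combined with a logical AND, which matches the requirement that the selectivity and the solution capacity of My are each at least 20% larger than those of Mx.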
work, the aromatic/aliphatic mixtures were simplified as mixtures of cyclohexane (A)/benzene (B).62 The green extractive distillation solvent multi-objective optimization framework can be decomposed into three steps, i.e., molecular multi-objective optimization, property constraints, and process constraints, as introduced in Section 2.

6.1 The extractive distillation solvent multi-objective optimization

In this step, as the inputs of the molecular multi-objective optimization model, industrial extractive solvent molecules that need to be optimized should be first identified. Five widely employed extractive distillation solvents for separating the mixtures of cyclohexane (A) and benzene (B) are listed in Table 4 based on extensive literature research.61,63–66 However, all these solvents have some drawbacks, such as toxicity or ecological hazard. The toxicity and ecological information of these solvents (with experimentally measured or predicted properties) is available in the Syntelly database.67

Table 4 Five commonly utilized extractive distillation solvents for separating benzene from the mixtures of benzene/cyclohexane as inputs of the molecular multi-objective optimization model

Names       Drawbacks            Ref.
Furfural    Toxicity             61
Sulfolane   Ecological hazard    63
DMSO        Toxicity             64
DMF         Ecological hazard    65
NMP         Ecological hazard    66

Taking the five common industrial solvent molecules as inputs of the molecular multi-objective optimization model, 20 optimized solvent molecules are generated for every single widely used solvent (as seen in Fig. 4a–e) via the trained molecular multi-objective optimization model introduced in Section 4. Accordingly, 100 optimized solvent molecules are generated as tabulated in Table S5 in the ESI.†

6.2 EH&S property constraints

In this step, the 100 optimized solvent molecules are screened using EH&S properties. In terms of environmental properties,

three ecological indicators are taken into account, i.e., the bioconcentration factor, the 40 h Tetrahymena pyriformis IGC50, and the 48 h Daphnia magna LC50. If a solvent negatively affects the environment, it will be marked in red, as shown in Fig. 6. The health properties can be quantified by the rat oral dosage (LD50). The threshold value for toxicity is 2000 mg kg−1. If the rat oral dosage of a given solvent is 500 mg kg−1, it will negatively affect health and will be marked in red, as shown in Fig. 6. Safety can be quantified using the flash point. For a given solvent, the higher its flash point, the better for storage security. In this work, if the flash point is above 280 K,36 it will positively impact the storage security and it will be marked in green, as shown in Fig. 6. All the EH&S information can be collected from the Syntelly database.67 As a result, 10 solvent molecules remain after screening by the EH&S property constraints and are displayed in Table 5.

6.3 Process constraints

In this step, process operation conditions are quantified by the normal melting point and normal boiling point. The melting point of the solvent should be below 310 K (ref. 36) to ensure that it is in the liquid state at the operating temperature. The boiling point of the solvent should be below 580 K (ref. 36) for relatively economical separation energy consumption. There are 3 solvent molecules (i.e., a18, a19, and d19, whose names are 4-methyl furfural, 5-methyl furfural, and 2-hexanone, respectively) remaining after the operation condition screening. Detailed information on the normal melting point and boiling point of the 10 solvent molecules after EH&S property screening is provided in Table 5.

To further screen solvents that would make the extractive distillation process feasible, the residue curve analyses of the 3 screened solvents were conducted and the results are shown in
Fig. 6 EH&S property information on the 100 optimized solvent molecules with the proposed molecular multi-objective optimization model. Tox,
Eco, and FP are the abbreviations for toxicity, ecology, and flash point. The green color denotes positive (or good) properties for EH&S constraints,
and red means negative. The red dotted boxes mark all solvent molecules with three positive EH&S properties.
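The operation-condition screening of Section 6.3 can be reproduced with a few lines of code. The sketch below applies the two stated limits (melting point below 310 K, boiling point below 580 K) to the melting and boiling points listed in Table 5; the dictionary layout and the function name are illustrative choices, not part of the original work.

```python
# Limits stated in Section 6.3 (both taken from ref. 36 in the paper).
MAX_MELTING_POINT_K = 310.0
MAX_BOILING_POINT_K = 580.0

# (melting point / K, boiling point / K) for the 10 solvents of Table 5.
table5 = {
    "a17": (413.15, 550.15),
    "a18": (303.15, 455.15),
    "a19": (293.15, 460.15),
    "b2": (523.15, 578.15),
    "b15": (473.15, 564.15),
    "b20": (473.15, 547.15),
    "d2": (461.15, 546.15),
    "d19": (217.65, 400.75),
    "e7": (430.15, 570.15),
    "e16": (430.15, 559.15),
}


def passes_process_constraints(melting_k: float, boiling_k: float) -> bool:
    """True if the solvent is liquid at operating temperature and not too heavy."""
    return melting_k < MAX_MELTING_POINT_K and boiling_k < MAX_BOILING_POINT_K


survivors = [
    name for name, (tm, tb) in table5.items()
    if passes_process_constraints(tm, tb)
]
print(survivors)
```

With the Table 5 values, only a18, a19, and d19 pass, in agreement with the three solvents reported in the text. All ten candidates satisfy the boiling-point limit, so the melting-point limit is the discriminating constraint here.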

Fig. 7. According to a review by Gerbaud et al.,52 the combined analysis of residue curve (RC) maps and the univolatility line can help evaluate whether or not a solvent is suitable for mixture separation via extractive distillation.68 As illustrated in the RC maps, every single curve originates from the azeotrope point and terminates in the pure component. Additionally, there is one distillation region for each of the three RC maps. In the residue curve map, A or B is a saddle point of the distillation region and cannot be obtained by azeotropic distillation. On the other hand, the univolatility line splits the ternary diagram into two volatility order regions for all three solvents. When the solvent is fed at a different location from the main feed, the extractive distillation process makes it possible to obtain the most volatile component of the volatility order region in which the solvent is located.52 This is the case for cyclohexane with the 3 green candidate solvents. Therefore, it is possible to separate the benzene/cyclohexane mixtures as pure products, first by removing cyclohexane from the extractive distillation column, then by recovering benzene as a distillate from the regeneration column where a high-purity solvent is obtained at the bottom and then recycled to the extractive distillation column. The intersection point xp of the isovolatility curve with the triangle edge largely determines the minimum usage of the solvent.52,69,70 The lower the xp, the less the amount of solvent required. As we can see, the mole amount of 2-hexanone used is more than that of 4-methyl furfural and 5-methyl furfural. The results of the combined residue curve and univolatility analyses can further prove that the proposed IDAC predictive models can achieve reliable and accurate prediction performance.

Table 5 The melting point and boiling point information of the 10 solvent molecules after EH&S properties screening

Names(a)   SMILES                      Melting point/K   Boiling point/K
a17        O=Cc1ccc(O)c(O)c1           413.15            550.15
a18        Cc1coc(C=O)c1               303.15            455.15
a19        Cc1ccc(C=O)o1               293.15            460.15
b2         CSc1cc(=O)[nH]c(=O)[nH]1    523.15            578.15
b15        O=S1(=O)C=C2NCNC2C1         473.15            564.15
b20        O=S1(=O)C=C2NC=NC2C1        473.15            547.15
d2         O=C(O)CCC(=O)O              461.15            546.15
d19        CCCCC(C)=O                  217.65            400.75
e7         O=C(O)CC1CCC(=O)N1          430.15            570.15
e16        O=C(O)CC1CC(CS)NC1=O        430.15            559.15

(a) The names correspond to the serial numbers in Fig. 6.

6.4 Energy consumption analysis

The reboiler heat duties of the extraction column (QE) and regeneration column (QR) for the five widely employed solvents and three candidate green solvents are summarized in Table 6. The detailed operation conditions of the eight solvents are tabulated in Table S6.†

Additionally, the information on the rat oral LD50 and bioconcentration factor is tabulated in Table 6. The results indicate that there is a trade-off between energy consumption and sustainable performance (such as EH&S properties), where a decrease in energy consumption usually comes at the expense of sustainability. The toxicity of 4-methyl furfural and 5-methyl furfural is reduced by about 95% compared with furfural. The bioconcentration factor of 2-hexanone is reduced by about 62% compared with DMF. Policies worldwide are moving the application of chemical separation processes in the direction of green chemistry.6 It is worth noting that the reboiler temperatures of the extraction and regeneration columns of 2-hexanone are lower than 150 °C. However, the reboiler temperatures of the extraction and regeneration columns of 4-methyl furfural and 5-methyl furfural are both higher than 150 °C. This means that the reboilers using 2-hexanone can use low pressure steam while the reboilers using the other two solvents need to use medium pressure steam.

6.5 Analysis based on knowledge of the chemistry domain

To make a more intuitive observation, the optimization processes of the three candidate green solvents are shown in
Fig. 7 The residue curve maps of (1) 4-methyl furfural, (2) 5-methyl furfural, and (3) 2-hexanone in the cyclohexane (A)/benzene (B) mixtures.

Table 6 The reboiler heat duties of the extraction column (QE) and regeneration column (QR) based on five widely used extractive solvents and three candidate green solvents

Names               QE (kW)    QR (kW)    QE + QR (kW)   Rat oral LD50 (mg kg−1)   Bioconcentration factor (L kg−1)
Furfural            919.22     1084.73    2003.96        129                       28 500
Sulfolane           339.56     2217.88    2557.45        3202                      51 000
DMSO                1309.15    1178.30    2487.45        1820                      48 900
DMF                 793.33     1804.74    2598.07        2964                      93 400
NMP                 1360.86    1289.77    2650.63        4254                      82 100
4-Methyl furfural   1374.95    1319.23    2694.18        2404                      23 500
5-Methyl furfural   1365.32    1276.51    2641.82        2405                      25 300
2-Hexanone          1490.44    1953.50    3443.94        2490                      35 300

The total stages of the extractive and regeneration columns are 50 and 40, respectively. A higher rat oral LD50 of a solvent indicates a lower toxicity. A higher bioconcentration factor of a solvent indicates a greater harm to the ecology.
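The percentage reductions quoted in Section 6.4 can be checked against the Table 6 values. The sketch below is one plausible reading rather than the authors' stated procedure: it assumes that toxicity is taken as inversely proportional to the rat oral LD50, so that the reduction is 1 − LD50(reference)/LD50(new), while the bioconcentration factor is compared directly. The function names are illustrative.

```python
# Values copied from Table 6.
ld50_mg_per_kg = {
    "furfural": 129.0,
    "4-methyl furfural": 2404.0,
    "5-methyl furfural": 2405.0,
}
bcf_l_per_kg = {"DMF": 93400.0, "2-hexanone": 35300.0}


def toxicity_reduction(ld50_reference: float, ld50_new: float) -> float:
    """Fractional toxicity reduction, ASSUMING toxicity is proportional to 1/LD50."""
    return 1.0 - ld50_reference / ld50_new


def direct_reduction(reference: float, new: float) -> float:
    """Fractional reduction of a quantity where lower values are better."""
    return (reference - new) / reference


tox_4mf = toxicity_reduction(ld50_mg_per_kg["furfural"],
                             ld50_mg_per_kg["4-methyl furfural"])
tox_5mf = toxicity_reduction(ld50_mg_per_kg["furfural"],
                             ld50_mg_per_kg["5-methyl furfural"])
bcf_hex = direct_reduction(bcf_l_per_kg["DMF"], bcf_l_per_kg["2-hexanone"])

print(f"4-methyl furfural vs furfural: {tox_4mf:.1%}")
print(f"5-methyl furfural vs furfural: {tox_5mf:.1%}")
print(f"2-hexanone vs DMF (BCF):       {bcf_hex:.1%}")
```

Under these assumptions the results round to 95% for both methyl furfurals and to 62% for 2-hexanone, matching the figures given in the text.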
Fig. 8 Visualization of the optimization processes of (a) furfural to 4-methyl furfural, (b) furfural to 5-methyl furfural, and (c) DMF to 2-hexanone.
The asterisks (*) represent the new fragment attachment (NFA) sites.
Fig. 8. Among the three solvents, 4-methyl furfural and 5-methyl furfural are derivatives of furfural. Interestingly, the branching of methyl to the furan ring could significantly reduce the toxicity of furfural. This could be due to the steric effect resulting from the aromatic ring substitution. The oral dosages of 4-methyl furfural, 5-methyl furfural, and furfural to rats are 2404, 2405, and 129 mg kg−1 (the higher the better), respectively. 2-Hexanone is obtained by optimizing the structure of DMF. The dialkylation of the carbonyl carbon in DMF can not only improve the selectivity and solution capacity but also reduce the ecological hazards. This is because the amide in DMF plays a pivotal role in the growth and metabolism of microorganisms and can ensure that microbes get enough protein and other important metabolites, thus promoting their growth and reproduction, which could have a negative impact on the environment.

In summary, 4-methyl furfural, 5-methyl furfural, and 2-hexanone can be used as candidate green solvents to separate mixtures of cyclohexane and benzene with extractive distillation. In this study, to evaluate the validity of the green solvent multi-objective optimization framework, only 20 molecules were generated from every widely used solvent. More candidate green solvents will be identified if more molecules are optimized and generated for every widely used solvent.

6.6 Molecular fragment analysis

To further explore the relationship between the molecular fragments and the optimization processes, the fragments were first extracted from the prepared training molecule pairs shown in Table S4 in the ESI.† The IDACs of these fragments in benzene and cyclohexane are predicted by the proposed IDAC prediction models. The selectivity and solution capacity of these fragments are calculated based on the predicted IDACs of these fragments. The detailed information on these fragments is tabulated in Table S7 in the ESI.† The results of the selectivity and solution capacity of these fragments are shown in Fig. 9. In this figure, molecular fragments with selectivity greater than 3 and solution capacity greater than 0.6 are marked in red. To more intuitively explore the common characteristics between the molecular fragments, the molecular structures of the fragments marked in red are shown in Fig. 9. As we can see, most of these visualized molecular fragments are heteroatom-containing aromatic compounds. From the optimization results shown in Fig. 6, we can also find that


Fig. 9 Visualization of the molecular fragment information of selectivity and solution capacity.
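The fragment screening behind Fig. 9 can be illustrated with a short calculation. The sketch below assumes the definitions commonly used for extractive distillation solvents in the cyclohexane (A)/benzene (B) system, namely selectivity S = γ∞(A)/γ∞(B) and solution capacity C = 1/γ∞(B), where γ∞ is the infinite dilution activity coefficient of the solute in the solvent; the paper's own definitions are given in an earlier section and may differ in detail. The fragment names and IDAC values are made up for illustration, and only the thresholds (selectivity above 3, capacity above 0.6) come from the text.

```python
def selectivity(gamma_inf_a: float, gamma_inf_b: float) -> float:
    """Assumed definition: ratio of the IDACs of cyclohexane (A) and benzene (B)."""
    return gamma_inf_a / gamma_inf_b


def capacity(gamma_inf_b: float) -> float:
    """Assumed definition: reciprocal of the IDAC of benzene (B)."""
    return 1.0 / gamma_inf_b


def is_highlighted(gamma_inf_a: float, gamma_inf_b: float,
                   s_min: float = 3.0, c_min: float = 0.6) -> bool:
    """True if a fragment would be marked in red under the Fig. 9 thresholds."""
    return (selectivity(gamma_inf_a, gamma_inf_b) > s_min
            and capacity(gamma_inf_b) > c_min)


# Hypothetical fragments: (IDAC of cyclohexane, IDAC of benzene).
fragments = {
    "frag_1": (6.0, 1.5),  # S = 4.0, C ~ 0.67: highlighted
    "frag_2": (4.0, 2.0),  # S = 2.0, C = 0.50: not highlighted
    "frag_3": (8.0, 2.0),  # S = 4.0, C = 0.50: fails the capacity threshold
}
highlighted = [name for name, (ga, gb) in fragments.items()
               if is_highlighted(ga, gb)]
print(highlighted)
```

The third fragment shows why both thresholds are needed: a high selectivity alone does not guarantee that enough benzene dissolves in the solvent.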
many optimized molecules are modified with these molecular fragments. However, these fragments can easily lead to toxicity and ecological hazards. Therefore, there appears to be a trade-off between the separation performance (such as selectivity and solution capacity) and sustainable performance (such as EH&S properties) of the solvents. In this study, the proposed green solvent design framework can efficiently balance the trade-off between the separation performance and sustainable performance of the solvents and find green solvents that satisfy multiple constraints.

7 Conclusions

In this study, we propose a molecular multi-objective and multi-scale optimization framework for the design of green solvents fit for extractive distillation that can simultaneously optimize multiple trade-off properties such as selectivity and solution capacity, both related to molecular and process constraints. The molecular multi-objective optimization model relies upon its ability to optimize process properties rather than molecular properties, as in common computer-aided molecular design approaches. The process properties are short-cut properties of the extractive distillation process, namely selectivity and solution capacity, which are evaluated via infinite dilution activity coefficients (IDAC).

A deep hierarchical molecular multi-objective optimization model was developed to learn the optimization path from our pre-set molecule pairs (Mx and My) and generate new solvents by fragment addition or removal. Every pair of molecules in the pre-set molecule pairs had similar molecular structures, but the scores of both selectivity and solution capacity of My were at least 20% larger than those of Mx. To prepare the molecule pairs, an improved deep learning-based IDAC direct prediction model trained over a COSMO-SAC database was developed for calculating the selectivity and solution capacity of the molecule pairs. The IDAC direct predictive model with the ability to discriminate stereoisomers achieved a better prediction performance than the IDAC indirect predictive model. As a result, 35 496 molecule pairs were identified that can be used as training data to train the deep hierarchical molecular multi-objective optimization model. Finally, the proposed IDAC prediction model and molecular multi-objective optimization model were integrated into a green solvent multi-objective and multi-scale optimization framework with EH&S properties and process constraints.

The proposed green solvent multi-objective and multi-scale optimization framework was applied to an extractive distillation process to separate the mixtures of cyclohexane and benzene. The results showed that 4-methyl furfural, 5-methyl furfural, and 2-hexanone can be utilized as candidate green solvents. Among the three solvents, 4-methyl furfural and 5-methyl furfural are derivatives of furfural. Interestingly, the branching of methyl to the furan ring could significantly reduce the toxicity of furfural. This could be due to the steric effect resulting from the aromatic ring substitution. 2-Hexanone was obtained by optimizing the structure of DMF. The dialkylation of the carbonyl carbon in DMF can not only improve the selectivity and solution capacity but also reduce the ecological hazards. This is because amide

compounds play a very important role in the growth and metabolism of microorganisms and help microbes get enough protein and other important metabolites, thus promoting their growth and reproduction, which could have a negative impact on the environment.

Author contributions

Jun Zhang: conceptualization (lead), data curation (lead), formal analysis (lead), methodology (lead), software (lead), validation (lead), writing – original draft (lead), and writing – review and editing (equal). Qin Wang: conceptualization (equal), funding acquisition (equal), methodology (equal), project administration (equal), supervision (equal), and writing – review and editing (equal). Huaqiang Wen: formal analysis (equal), methodology (equal), software (equal), and validation (equal). Vincent Gerbaud: conceptualization (equal), methodology (equal), and writing – review and editing (equal). Saimeng Jin: conceptualization (equal), methodology (equal), and writing – review and editing (equal). Weifeng Shen: conceptualization (equal), funding acquisition (lead), methodology (equal), project administration (lead), supervision (lead), writing – original draft (equal), and writing – review and editing (lead).

Data availability

The data that support the findings of this study are available in the ESI of this article on https://zenodo.org/records/10097726.†

Conflicts of interest

There are no conflicts of interest to declare.

Acknowledgements

The authors acknowledge the financial support provided by the National Natural Science Foundation for Excellent Young Scientists of China (No. 22122802); the National Natural Science Foundation of China (No. 22278044); the Chongqing Science Foundation for Distinguished Young Scholars (No. CSTB2022NSCQ-JQX0021); the Chongqing Innovation Support Key Program for Returned Overseas Chinese Scholars (No. cx2023002); and the Research Foundation of Chongqing University of Science and Technology (No. ckrc2019006).

References

1 J. C. Fromer and C. W. Coley, Computer-aided multi-objective optimization in small molecule discovery, Patterns, 2023, 4, 100678.
2 X. C. Ma, Q. Zhang, C. He, Q. L. Chen and B. J. Zhang, Computer-aided naphtha liquid–liquid extraction: Molecular reconstruction, sustainable solvent design and multiscale process optimization, Fuel, 2023, 334, 126651.
3 S. Chai, Z. Song, T. Zhou, L. Zhang and Z. Qi, Computer-aided molecular design of solvents for chemical separation processes, Curr. Opin. Chem. Eng., 2022, 35, 100732.
4 A. Doolin, R. G. Charles, C. D. Castro, R. G. Rodriguez and M. L. Davies, Sustainable solvent selection for the manufacture of methylammonium lead triiodide (MAPbI3) perovskite solar cells, Green Chem., 2021, 23, 2471–2486.
5 J. H. Clark, Green chemistry: Challenges and opportunities, Green Chem., 1999, 1, 1–8.
6 J. H. Clark, Green chemistry: Today (and tomorrow), Green Chem., 2006, 8, 17–21.
7 J. Y. Ten, Z. H. Liew, X. Y. Oh, M. H. Hassim and N. Chemmangattuvalappil, Computer-aided molecular design of optimal sustainable solvent for liquid–liquid extraction, Process Integr. Optim. Sustain, 2021, 5, 269–284.
8 Y. S. Lee, A. Galindo, G. Jackson and C. S. Adjiman, Enabling the direct solution of challenging computer-aided molecular and process design problems: Chemical absorption of carbon dioxide, Comput. Chem. Eng., 2023, 174, 108204.
9 I. Rodriguez-Donis, S. Thiebaud-Roux, S. Lavoine and V. Gerbaud, Computer-aided product design of alternative solvents based on phase equilibrium synergism in mixtures, C. R. Chim., 2018, 21, 606–621.
10 M. Korichi, V. Gerbaud, P. Floquet, A. H. Meniai, S. Nacef and X. Joulia, Computer aided aroma design I – Molecular knowledge framework, Chem. Eng. Process., 2008, 47, 1902–1911.
11 H. Sun, A universal molecular descriptor system for prediction of logP, logS, logBB, and absorption, J. Chem. Inf. Comput. Sci., 2004, 44, 748–757.
12 A. Fredenslund, R. L. Jones and J. M. Prausnitz, Group-contribution estimation of activity coefficients in nonideal liquid mixtures, AIChE J., 1975, 21, 1086–1099.
13 T. J. Sheldon, M. Folić and C. S. Adjiman, Solvent design using a quantum mechanical continuum solvation model, Ind. Eng. Chem. Res., 2006, 45, 1128–1140.
14 J. G. Rittig, K. B. Hicham, A. M. Schweidtmann, M. Dahmen and A. Mitsos, Graph neural networks for temperature-dependent activity coefficient prediction of solutes in ionic liquids, Comput. Chem. Eng., 2023, 171, 108153.
15 Z. Wang, Y. Su, S. Jin, X. Zhang and J. H. Clark, A novel unambiguous strategy of molecular feature extraction in machine learning assisted predictive models for environmental properties, Green Chem., 2020, 22, 3867–3876.
16 Z. Wang, Y. Su, W. Shen, S. Jin, J. H. Clark, J. Ren and X. Zhang, Predictive deep learning models for environmental properties: the direct calculation of octanol–water partition coefficients from molecular graphs, Green Chem., 2019, 21, 4555–4565.
17 T. Zhou, K. McBride, S. Linke, Z. Song and K. Sundmacher, Computer-aided solvent selection and design for efficient chemical processes, Comput. Chem. Eng., 2020, 27, 35–44.

18 R. Gani, Group contribution-based property estimation methods: advances and perspectives, Curr. Opin. Chem. Eng., 2019, 23, 184–196.
19 F. Eckert and A. Klamt, Fast solvent screening via quantum chemistry: COSMO–RS approach, AIChE J., 2002, 48, 369–385.
20 A. Klamt and F. Eckert, COSMO-RS: A novel and efficient method for the a priori prediction of thermophysical data of liquids, Fluid Phase Equilib., 2000, 172, 43–72.
21 S.-T. Lin, Quantum mechanical approaches to the prediction of phase equilibria: solvation thermodynamics and group contribution methods, University of Delaware, 2001.
22 S.-T. Lin and S. I. Sandler, A priori phase equilibrium prediction from a segment contribution solvation model, Ind. Eng. Chem. Res., 2002, 41, 899–913.
23 I. H. Bel, E. Mickoleit, C.-M. Hsieh, S.-T. Lin, J. Vrabec, C. Breitkopf and A. Jäger, A benchmark open-source implementation of COSMO-SAC, J. Chem. Theory Comput., 2020, 16, 2635–2646.
24 Q. Liu, L. Zhang, K. Tang, L. Liu, J. Du, Q. Meng and R. Gani, Machine learning-based atom contribution method for the prediction of surface charge density profiles and solvent design, AIChE J., 2021, 67, e17110.
25 T. Mu, J. Rarey and J. Gmehling, Group contribution prediction of surface charge density distribution of molecules for COSMO-SAC, AIChE J., 2009, 55, 3298–3300.
26 E. Mullins, R. Oldland, Y. A. Liu, S. Wang, S. I. Sandler, C.-C. Chen, M. Zwolak and K. C. Seavey, Sigma-profile database for using COSMO-based thermodynamic methods, Ind. Eng. Chem. Res., 2006, 45, 4389–4415.
27 Y. Su, Z. Wang, S. Jin, W. Shen, J. Ren and M. R. Eden, An architecture of deep learning in QSPR modeling for the prediction of critical properties using molecular signatures, AIChE J., 2019, 65, e16678.
28 J. Zhang, Q. Wang, Y. Su, S. Jin, J. Ren, M. Eden and W. Shen, An accurate and interpretable deep learning model for environmental properties prediction using hybrid molecular representations, AIChE J., 2022, 68, e17634.
29 F. Jirasek, R. A. Alves, J. Damay, R. A. Vandermeulen, R. Bamler, M. Bortz, S. Mandt, M. Kloft and H. Hasse, Machine learning in thermodynamics: Prediction of activity coefficients by matrix completion, J. Phys. Chem. Lett., 2020, 11, 981–985.
30 G. Chen, Z. Song, Z. Qi and K. Sundmacher, Neural recommender system for the activity coefficient prediction and UNIFAC model extension of ionic liquid–solute systems, AIChE J., 2021, 67, e17171.
31 G. Chen, Z. Song and Z. Qi, Transformer-convolutional neural network for surface charge density profile prediction: Enabling high-throughput solvent screening with COSMO-SAC, Chem. Eng. Sci., 2021, 246, 117002.
32 J. Zhang, Q. Wang and W. Shen, Message-passing neural network based multi-task deep-learning framework for COSMO-SAC based σ-profile and VCOSMO prediction, Chem. Eng. Sci., 2022, 254, 117624.
33 R. Gani and E. Brignole, Molecular design of solvents for liquid extraction based on UNIFAC, Fluid Phase Equilib., 1983, 13, 331–340.
34 T. Zhou, Z. Song, X. Zhang, R. Gani and K. Sundmacher, Optimal solvent design for extractive distillation processes: A multiobjective optimization-based hierarchical framework, Ind. Eng. Chem. Res., 2019, 58, 5777–5786.
35 L. Zhang, J. Pang, Y. Zhuang, L. Liu, J. Du and Z. Yuan, Integrated solvent-process design methodology based on COSMO-SAC and quantum mechanics for TMQ (2,2,4-trimethyl-1,2-H-dihydroquinoline) production, Chem. Eng. Sci., 2020, 226, 115894.
36 S. Chai, E. Li, L. Zhang, J. Du and Q. Meng, Crystallization solvent design based on a new quantitative prediction model of crystal morphology, AIChE J., 2021, e17499.
37 J. Heintz, J. P. Belaud, N. Pandya, M. T. D. Santos and V. Gerbaud, Computer aided product design tool for sustainable product development, Comput. Chem. Eng., 2014, 71, 362–376.
38 A. S. Alshehri, R. Gani and F. You, Deep learning and knowledge-based methods for computer-aided molecular design – toward a unified approach: State-of-the-art and future directions, Comput. Chem. Eng., 2020, 141, 107005.
39 A. Graves, Generating sequences with recurrent neural networks, arXiv, 2013, preprint, arXiv:1308.0850.
40 V. Mnih, K. Kavukcuoglu, D. Silver, A. Graves, I. Antonoglou, D. Wierstra and M. Riedmiller, Playing atari with deep reinforcement learning, arXiv, 2013, preprint, arXiv:1312.5602.
41 D. P. Kingma and M. Welling, Auto-encoding variational bayes, arXiv, 2013, preprint, arXiv:1312.6114.
42 B. Sanchez-Lengeling and A. Aspuru-Guzik, Inverse molecular design using machine learning: Generative models for matter engineering, Science, 2018, 361, 360–365.
43 A. S. Alshehri and F. You, Deep learning to catalyze inverse molecular design, Chem. Eng. J., 2022, 444, 136669.
44 W. Jin, R. Barzilay and T. Jaakkola, Junction tree variational autoencoder for molecular graph generation, arXiv, 2018, preprint, arXiv:1802.04364, arXiv.org e-Print archive.
45 W. Jin, R. Barzilay and T. Jaakkola, Hierarchical generation of molecular graphs using structural motifs, arXiv, 2020, preprint, arXiv:2002.03230v2.
46 Z. Chen, M. R. Min, S. Parthasarathy and X. Ning, A deep generative model for molecule optimization via one fragment modification, Nat. Mach. Intell., 2021, 3, 1040–1049.
47 J. Wang, C.-Y. Hsieh, M. Wang, X. Wang, Z. Wu, D. Jiang, B. Liao, X. Zhang, B. Yang and Q. He, Multi-constraint molecular generation based on conditional transformer, knowledge distillation and reinforcement learning, Nat. Mach. Intell., 2021, 3, 914–922.
48 D. Polykovskiy, A. Zhebrak, B. Sanchez-Lengeling, S. Golovanov and A. Zhavoronkov, Molecular sets (MOSES): A benchmarking platform for molecular generation models, Front. Pharmacol., 2020, 11, 565644.
49 J. Scheffczyk, P. Schäfer, L. Fleitmann, J. Thien, C. Redepenning, K. Leonhard, W. Marquardt and

A. Bardow, COSMO-CAMPD: A framework for integrated design of molecules and processes based on COSMO-RS, Mol. Syst. Des. Eng., 2018, 3, 645–657.
50 L. Polte, L. Raßpe-Lange, F. Latz, A. Jupke and K. Leonhard, COSMO-CAMPED–solvent design for an extraction distillation considering molecular, process, equipment, and economic optimization, Chem. Ing. Tech., 2023, 95, 416–426.
51 S. Kossack, K. Kraemer, R. Gani and W. Marquardt, A systematic synthesis framework for extractive distillation processes, Chem. Eng. Res. Des., 2008, 86, 781–792.
52 V. Gerbaud, I. Rodriguez-Donis, L. Hegely, P. Lang, F. Denes and X. Q. You, Review of extractive distillation. Process design, operation, optimization and control, Chem. Eng. Res. Des., 2019, 141, 229–271.
53 R. Fingerhut, W.-L. Chen, A. Schedemann, W. Cordes, J. Rarey, C.-M. Hsieh, J. Vrabec and S.-T. Lin, Comprehensive assessment of COSMO-SAC models for predictions of fluid-phase equilibria, Ind. Eng. Chem. Res., 2017, 56, 9868–9884.
54 L. Li, Z. Wen and Z. Wang, Outlier detection and correction during the process of groundwater lever monitoring base on Pauta criterion with self-learning and smooth processing, in Theory, Methodology, Tools and Applications for Modeling and Simulation of Complex Systems, Springer, Singapore, 2016.
55 J. Zhang, Q. Wang, M. Eden and W. Shen, A deep learning-based framework towards inverse green solvent design for extractive distillation with multi-index constraints, Comput. Chem. Eng., 2023, 177, 108335.
56 J. Zhang, Q. Wang and W. Shen, Hyper-parameter optimization of multiple machine learning algorithms for molecular property prediction using hyperopt library, Chin. J. Chem. Eng., 2022, 52, 115–125.
57 D. S. Karlov, S. Sosnin, I. V. Tetko and M. V. Fedorov, Chemical space exploration guided by deep neural networks, RSC Adv., 2019, 5151–5157.
58 A. Gaulton, A. Hersey, M. Nowotka, A. P. Bento, J. Chambers, D. Mendez, P. Mutowo, F. Atkinson, L. J. Bellis and E. Cibrián-Uhalte, The ChEMBL database in 2017, Nucleic Acids Res., 2017, 45, D945–D954.
59 M. Olivecrona, T. Blaschke, O. Engkvist and H. Chen, Molecular de-novo design through deep reinforcement learning, J. Cheminf., 2017, 9, 1–14.
60 Z. Abu-Aisheh, R. Raveaux, J. Y. Ramel and P. Martineau, An exact graph edit distance algorithm for solving pattern recognition problems, International Conference on Pattern Recognition Applications & Methods, PRT, Setubal, 2015, 1, 271–278.
61 L. Sun, Q. Wang, L. Li, J. Zhai and Y. Liu, Design and control of extractive dividing wall column for separating benzene/cyclohexane mixtures, Ind. Eng. Chem. Res., 2014, 53, 8120–8131.
62 Q. Wang, J. Y. Chen, M. Pan, C. He, C. C. He, B. J. Zhang and Q. L. Chen, A new sulfolane aromatic extractive distillation process and optimization for better energy utilization, Chem. Eng. Process., 2018, 128, 80–95.
63 L. Li, Y. Tu, L. Sun, Y. Hou, M. Zhu, L. Guo, Q. Li and Y. Tian, Enhanced efficient extractive distillation by combining heat-integrated technology and intermediate heating, Ind. Eng. Chem. Res., 2016, 55, 8837–8847.
64 F. M. Lee, Use of organic sulfones as the extractive distillation solvent for aromatics recovery, Ind. Eng. Chem. Process. Des. Dev., 1986, 25, 949–957.
65 M. K. Praharaj, A. Satapathy, P. Mishra and S. Mishra, Ultrasonic analysis of intermolecular interaction in the mixtures of benzene with N,N-dimethylformamide and cyclohexane at different temperatures, J. Chem. Pharm. Res., 2013, 5, 49–56.
66 C. Yang, Z. Liu, H. Lai and P. Ma, Thermodynamic properties of binary mixtures of N-methyl-2-pyrrolidinone with cyclohexane, benzene, toluene at (303.15 to 353.15) K and atmospheric pressure, J. Chem. Thermodyn., 2007, 39, 28–38.
67 Syntelly: Better than chemists can do., https://syntelly.com, (accessed 11 Sep., 2023).
68 W. Shen, L. Dong, S. Wei, J. Li, H. Benyounes, X. You and V. Gerbaud, Systematic design of an extractive distillation for maximum-boiling azeotropes with heavy entrainers, AIChE J., 2015, 61, 3898–3910.
69 J. Gu, X. You, C. Tao, L. Jun and G. Vincent, Energy-saving reduced-pressure extractive distillation with heat integration for separating the biazeotropic ternary mixture tetrahydrofuran–methanol–water, Ind. Eng. Chem. Res., 2018, 57, 13498–13510.
70 A. Yang, W. Shen, S. A. Wei, L. Dong, J. Li and V. Gerbaud, Design and control of pressure-swing distillation for separating ternary systems with three binary minimum azeotropes, AIChE J., 2019, 65, 1281–1293.

This journal is © The Royal Society of Chemistry 2023 Green Chem.

Compound name        QM-derived value   IM       DM       QM-derived value   IM        DM
N,N-Diethylaniline   0.3215             0.2398   0.3028   −0.1047            −0.0745   −0.1033
p-Xylene             0.2230             0.2194   0.2295   0.0008             0.0109    −0.0004

As a result, 35 496 molecule pairs (detailed in Table S4 in

Sulfolane Ecological hazard 63

Furfural             919.22   1084.73   2003.96    129   28 500
Sulfolane            339.56   2217.88   2557.45   3202   51 000
DMSO                1309.15   1178.30   2487.45   1820   48 900
DMF                  793.33   1804.74   2598.07   2964   93 400
NMP                 1360.86   1289.77   2650.63   4254   82 100
4-Methyl furfural   1374.95   1319.23   2694.18   2404   23 500
5-Methyl furfural   1365.32   1276.51   2641.82   2405   25 300
2-Hexanone          1490.44   1953.50   3443.94   2490   35 300
