
Computer-aided molecular design: An introduction and review of tools, applications, and solution techniques

Nick D. Austin (1), Nikolaos V. Sahinidis (1), and Daniel W. Trahan (2)

(1) Carnegie Mellon University, Pittsburgh, PA, USA
(2) The Dow Chemical Company, Freeport, TX, USA

arXiv:1701.03978v1 [cs.CE] 15 Jan 2017

Abstract
This article provides an introduction to and review of the field of computer-aided molecular design (CAMD).
It is intended to be approachable for the absolute beginner as well as useful to the seasoned CAMD
practitioner. We begin by discussing various quantitative structure-property relationships (QSPRs) which
have been demonstrated to work well with CAMD problems. The methods discussed in this article are
(1) group contribution methods, (2) topological indices, and (3) signature descriptors. Next, we present
general optimization formulations for various forms of the CAMD problem. Common design constraints are
discussed and structural feasibility constraints are provided for the three types of QSPRs addressed. We then
detail useful techniques for approaching CAMD optimization problems, including decomposition methods,
heuristic approaches, and mathematical programming strategies. Finally, we discuss many applications that
have been addressed using CAMD.
Keywords: Computer-aided mixture design; Computer-aided molecular design; Integrated product and
process design; Group contribution; Topological indices; Signature descriptors

1 Introduction

The application of chemistry to manipulate the natural world has its earliest examples in metalworking and pottery [134], with some pottery artifacts reported to be as old as 20,000 years [167]. Chemical products have since played an important role in history and left an indelible mark on the way we live and work. Early history included basic incendiary fuels, perfumes, and soap as some of the first widespread uses of chemicals. The modern age has witnessed an unprecedented expansion of chemical products, including pesticides, fuels for transportation and electricity, pharmaceuticals, plastics, and a broad array of industrial and consumer products. True to these historical trends, few things are as pervasive as chemical products in 21st century life, and there are an ever-increasing number of new chemical applications which require specialized compounds.

The process of determining new and suitable chemicals for a certain application can be generally termed chemical product design [34]. Chemical product design has long been a laborious, trial-and-error procedure, limited often by a fixed amount of chemical, time, and financial resources. Design efforts are often high-throughput and tend to focus on a small class of compounds or structural analogues of known chemicals. Accordingly, the so-called design space (the set of unique molecular structures considered) is often quite small for these product design problems, especially considering the massive design space of all possible chemical structures.

It is then clear that to keep pace with the growing demand for new chemical products and to adequately explore the full chemical design space, other approaches must be considered. Fortunately, the availability and efficiency of computational resources make these design problems more tenable than ever before. Noteworthy among computational

© 2017. This manuscript version is made available under the CC-BY-NC-ND 4.0 license
http://creativecommons.org/licenses/by-nc-nd/4.0/. The formal publication of this article
is available at http://dx.doi.org/10.1016/j.cherd.2016.10.014.

approaches is the field of computer-aided molecular design (CAMD), which leverages the simplicity of semi-empirical quantitative structure-property relationships (QSPRs) in conjunction with fast and efficient numerical optimization algorithms.

CAMD has its roots in the 1980s, although the general use of computers in chemistry pre-dates this by a few decades. Stated formally, the CAMD problem concerns designing one or more optimal molecular structures for a certain application. CAMD combines molecular modeling techniques, thermodynamics, and numerical optimization to design good or optimal molecular structures, many of them completely novel. Advances in chemical modeling in the last few decades have greatly benefited CAMD, and practitioners are now capable of relating chemical structures to properties at several levels of accuracy (molecular mechanics, semi-empirical, ab initio). Though CAMD often uses semi-empirical modeling techniques for their simplicity and efficiency, new approaches incorporating more accurate methods are emerging. Modern combinatorial optimization techniques are also essential for CAMD, enabling optimization over staggeringly large design spaces which would otherwise be inaccessible (using enumeration algorithms, for example).

This article is intended to provide a description of popular QSPRs used in CAMD, the CAMD problem itself, and several solution approaches to its various forms. There exist several CAMD reviews [52, 126] that provide a thorough coverage of many relevant applications and milestones of CAMD and are good resources for the interested reader. While also providing a review of CAMD literature, the current article first motivates the field from the ground up. In this way, this review is intended to be accessible to readers of all levels as it presents many essential CAMD tools and techniques in detail. We also provide a discussion of various techniques and a review of CAMD applications, so this document will also be of interest to the more advanced CAMD practitioner. We begin in Section 2 by detailing three popular classes of QSPRs which are often used in CAMD: (1) group contribution methods; (2) topological indices; and (3) signature descriptors. Next, in Section 3, we present the CAMD problem from a mathematical programming perspective, discussing various classes of the single-molecule design problem as well as CAMD problems considering mixtures of molecules and those involving the simultaneous design of a chemical product and the process it is a part of. In this section, several other important design considerations are presented, including a few important constraints to ensure practical solutions as well as the chemical feasibility of the designed structures. In Section 4, various solution techniques for the CAMD problem are discussed, including mathematical optimization strategies, decomposition methods, and heuristic approaches. Finally, in Section 5, we provide a diverse though non-exhaustive review of applications of CAMD problems.

2 Popular types of QSPRs in CAMD

The CAMD problem attempts to choose optimal (or simply good) molecules for some purpose from the space of theoretically possible chemical structures. At first glance, the CAMD problem must consider a very abstract chemical design space of atoms, bonds, aromaticity, structural isomers, electronic effects, etc. Though many of these features are certainly what give molecules their specific properties and chemical functionality, they are difficult to build into any type of optimization scheme. This is primarily because there is no immediately available relationship between an arbitrary chemical structure and its performance or suitability regarding a specific application. In order to rank different structures and choose an optimal one, we must have some efficient way to quantify the properties and performance of each structure.

A second issue is the sheer size of the chemical search space. At the time of writing this article, the CAS registry [27] reports over 115 million unique organic and inorganic structures. This number only represents compounds which have been synthesized and cataloged, and it is already far too large for every structure to be considered in any type of trial-and-error design scheme. This number is also only a fraction of the theoretically possible chemical space, which some estimates indicate may contain more than $10^{60}$ unique molecules for small, drug-like structures [13]. Even with very efficient ways to estimate the performance of a certain structure, screening these structures using an enumeration strategy is far beyond current computational capacity. For this reason, we also need to relate the chemical space to a space that can be utilized for combinatorial optimization, allowing us to design over the massive search space far more efficiently.

CAMD practitioners have relied on semi-empirical quantitative structure-property relationships (QSPRs) to address both of these issues. First, many semi-empirical methods delineate a clear connection between the abstract chemical space and

a more practical space of quantitative properties. These methods are also often simple and can be applied to estimate properties very efficiently. Second, many of these methods break molecular structure into sub-molecular collections of atoms and bonds. These molecular sub-structures are assumed to dictate a molecule's properties. Using these types of representations of the molecular space, combinatorial optimization can be directly applied to molecular design problems.

2.1 Group-contribution methods

The most commonly used QSPRs in CAMD are group contribution (GC) methods. These work under the assumption that a molecule's properties can be predicted by the number of occurrences of various molecular sub-structures called groups. For example, we may think to represent the simple molecule propanol as a combination of the groups CH3-, -CH2-, and -OH. In this case, the dashes (-) represent bonds to other groups. In its group representation, propanol would no longer be thought of as the connected alcohol molecule, but rather as some collection of its constituent groups. The group representation of propanol is shown in Fig. 1.

Figure 1: Propanol represented by its groups. [Graphic: n-propanol drawn as one CH3- group, two -CH2- groups, and one -OH group.]

Being QSPRs, group contribution methods translate the group representation of a molecular structure into an estimate for some property $P$. To do this, group contribution methods define a vector $n$ that represents the number of occurrences of each of the groups. Assuming we only have the three groups shown above, propanol's $n$ vector would look like $n = [1, 2, 1]$, where the entries in this vector represent the number of occurrences of the groups CH3-, -CH2-, and -OH, respectively. Each of these groups $g$ would also be associated with a coefficient $c_g$ which quantifies its effect on, or contribution to, a particular property $P$. Properties are calculated as follows:

\[ P = \sum_{g} c_g n_g \tag{1} \]

The vector of coefficients $c$ comes from regression of the property $P$ over a large dataset of different molecules. To regress these parameters, the identity of all of the groups must be specified a priori. Returning to our example, one can easily imagine different sets of groups being used to describe propanol. For example, the groups CH3-, -CH2-, and -CH2OH also completely account for the atoms in propanol, and these may provide a better fit for the regression problem. For this reason, different group contribution methods to estimate different properties usually do not have consistent sets of groups. Finally, we note that the vector $n$ is generally much longer in practice, as many group contribution methods contain 50-100 groups. Many group contribution methods make the additional assumption that groups cannot overlap, which means that $n$ is typically a sparse vector. A pictorial example of the usage of group contribution methods is given in Fig. 2. In this example, we apply a hypothetical GC method to a hypothetical molecule. We show how a molecular structure is decomposed into its constituent groups and provide a count of each of these groups. These counts constitute the elements of the vector $n$. The $n$ vector is paired with a hypothetical $c$ vector, and an example property is calculated.

One of the earliest examples of GC methods is from Benson and Buss [10], who are considered to be the originators of so-called group increment theory. Group increment theory, or Benson group increment theory (BGIT), is analogous to GC methods, but these terms may be more common in the physical chemistry literature. In the original 1958 paper [10], Benson and Buss proposed a simple group additivity scheme for the prediction of bond dissociation energies. Benson et al. [11] extended this work to account for a greater diversity of groups and to estimate heat capacities. Additional work from Cohen and Benson includes estimating heats of formation with group increment theory [31]. A large number of additional efforts have used Benson-like increments to estimate the same thermophysical properties, a very small sample of which are [36, 76, 144].

Another very popular GC method was devised by Joback and Reid [78]. This method extended the group increment idea to model many different properties with the same set of groups. The Joback and Reid model also included functional transformations for the original group increment summations. These altered the group contribution definition to the following:

\[ P = f\left( \sum_{g} c_g n_g \right) \tag{2} \]
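As a minimal sketch of Eqs. (1) and (2), the following Python snippet evaluates a group contribution estimate for propanol from its group count vector $n = [1, 2, 1]$. The coefficient values and the exponential transformation $f$ are hypothetical placeholders chosen purely for illustration; they are not taken from any published GC method.

```python
import math

# Group counts for propanol, using the three groups discussed above.
n = {"CH3": 1, "CH2": 2, "OH": 1}

# Hypothetical regression coefficients c_g (illustrative values only).
c = {"CH3": 0.5, "CH2": 0.25, "OH": 1.5}


def gc_linear(counts, coeffs):
    """Eq. (1): P = sum over groups g of c_g * n_g."""
    return sum(coeffs[g] * counts[g] for g in counts)


def gc_transformed(counts, coeffs, f):
    """Eq. (2): P = f(sum_g c_g * n_g) for a user-supplied function f."""
    return f(gc_linear(counts, coeffs))


p_linear = gc_linear(n, c)              # 0.5*1 + 0.25*2 + 1.5*1
p_exp = gc_transformed(n, c, math.exp)  # f = exp is an assumption
print(p_linear, p_exp)
```

In a real method the dictionary of coefficients would hold one regressed value per group in the method's group set, and most entries of the count vector would be zero for any given molecule.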

Figure 2: Example usage of group contribution methods. [Graphic: an original structure is shown next to its group representation and the number of occurrences of each group (1, 2, 4, 1, and 1).]

Estimating properties:
Group composition vector: $n = [0, \ldots, 1, \ldots, 2, \ldots, 4, \ldots, 1, \ldots, 1, \ldots]$
Coefficient vector (example): $c = [\ldots, 3.2, \ldots, -2.4, \ldots, 0.6, \ldots, 1.2, \ldots, 2.3, \ldots]$
Property estimate: $P = \sum_{g} c_g n_g = 3.2(1) - 2.4(2) + 0.6(4) + 1.2(1) + 2.3(1)$

In Eq. (2), $f$ represents some function of the inner product of the vectors $c$ and $n$. These functions $f$ appear in many group contribution methods and are important when predicted properties are not simple linear functions of the number of groups in a structure.

Perhaps the most widely-used GC method in CAMD is that of Marrero and Gani [113, 114]. This method, like the method of Constantinou and Gani [32] that predated it, provides another extension to the general form of GC methods in that it introduces multiple levels of groups to better capture proximity effects, meaning the effect of two or more groups which are close to one another in a molecular structure. As many GC methods are limited to groups of just a few atoms and bonds, many cannot differentiate between structures with different connectivity. The Marrero-Gani method, called the GC+ method, uses as a first-order approximation a normal group contribution method, where the groups belong to a set of primary groups $F$. An additional set of groups $S$ contains slightly larger sub-structures. Finally, a set of groups $T$ accounts for large groups and overarching molecular structural features. Unlike groups in the set $F$, groups in the sets $S$ and $T$ are allowed to overlap with each other. The hierarchical depiction of molecules in the GC+ method provides a much clearer picture of a molecule. The general form of the GC+ estimates is shown below:

\[ P = f\left( \sum_{g \in F} c_g n_g + \sum_{g \in S} c_g n_g + \sum_{g \in T} c_g n_g \right) \tag{3} \]

Group contribution methods have also often incorporated interaction terms [122, 123, 124, 125, 99, 140]. These are ways to include additional terms to account for the simultaneous presence of two (same or different) groups in a particular structure. For example, in predicting toxicity, one group in a molecule may lead to a simple metabolic pathway and therefore make many structures containing that group non-toxic. If that structure were also to have another group which is normally quite toxic, a GC method without interaction terms may not predict toxicity well. This would be because both groups would have an additive effect, meaning that there would be one highly non-toxic contribution and one highly toxic contribution. As a result, the molecule may be predicted to have an average toxicity when, in reality, the presence of the non-toxic group should outweigh the toxic group. Introducing an interaction term for these two groups accounts for the situation of their co-occurrence. In this example, this interaction term would likely remove whatever toxicity value was predicted by the toxic group. For more discussion of interaction terms in predicting toxicity, readers are referred to [115]. Interaction terms generally take the form

\[ I_{g,g'} = f_I(n_g, n_{g'}) \tag{4} \]
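The following is a minimal sketch of how Eqs. (3) and (4) might be evaluated together. All group names and coefficient values are hypothetical, $f$ is assumed to be the identity, $f_I$ is assumed to be multiplication, and adding the interaction term linearly with its own coefficient is an assumption made for this illustration rather than the form of any particular published method.

```python
def level_sum(counts, coeffs):
    """Inner sum of Eq. (3) for one level of groups: sum_g c_g * n_g."""
    return sum(coeffs[g] * n_g for g, n_g in counts.items())


def interaction(n_g, n_h):
    """Eq. (4) with f_I assumed to be multiplication."""
    return n_g * n_h


# Hypothetical group counts at the three GC+ levels F, S, and T.
counts_F = {"A": 2, "B": 1}
counts_S = {"AB": 1}
counts_T = {}

# Hypothetical coefficients for each level (illustrative values only).
coeffs_F = {"A": 1.0, "B": 4.0}
coeffs_S = {"AB": 0.5}
coeffs_T = {}

# Eq. (3) with f assumed to be the identity function.
p_additive = (level_sum(counts_F, coeffs_F)
              + level_sum(counts_S, coeffs_S)
              + level_sum(counts_T, coeffs_T))

# Assumed linear correction for the co-occurrence of groups A and B.
c_AB = -1.5
p_with_interaction = p_additive + c_AB * interaction(counts_F["A"], counts_F["B"])

print(p_additive, p_with_interaction)
```

With these made-up numbers the interaction term lowers the purely additive estimate, which mirrors the toxicity example above in which the co-occurrence of two groups cancels part of one group's contribution.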

In Eq. (4), $f_I$ usually represents multiplication, but can also represent other functions. Group contribution methods can also include some idea of structural features [122, 123, 124, 125, 113, 114]. These account for larger effects, typically at the molecule scale, such as aromatic ring substitution, cis/trans isomerism, aliphatic chain lengths, etc. These can typically be implemented as large groups but sometimes require special considerations. Table 1 provides several properties typically used in CAMD problems and a few GC methods to estimate them.

Strengths of GC methods. GC methods are useful in that they are very intuitive to use. They represent a chemical structure in terms of its functional components, very analogously to how chemists compare and analyze structures. GC methods are also able to easily represent a large and diverse chemical space as the groups can be combined in many different ways to produce a large variety of different structures. This is especially useful from a CAMD perspective. Finally, GC methods are easily translated into the mathematical formulations of CAMD problems as the inclusion and count of the groups (the vector $n$) are easily represented in the context of mathematical optimization.

Weaknesses of GC methods. There are a few shortcomings of modern GC methods. One is that many GC methods are unable to distinguish isomers from one another. As isomers can have very different properties, this represents a gap in the predictive power of GC methods. We note that some GC methods such as the GC+ methods are able to distinguish many isomers due to the inclusion of large groups. A second issue with GC methods is the lack of consistency in groups used to predict various properties. Though this has no major effect on estimating these properties for a given structure, it becomes problematic for mathematical formulations of the CAMD problem. Finally, GC methods require specifying the set of groups prior to regressing the GC coefficients. Though many GC methods are quite accurate, there is no guarantee that the set of groups used best captures the property they model. Using different groups can sometimes drastically alter the predictive power of a GC model.

2.2 Topological indices

Chemical graph theory [15] is a field which became very influential in the 1970s and has since been used to produce a large number of QSPRs. The basic idea of chemical graph theory is that the atoms and bonds which constitute a molecule can be thought of as nodes and edges in a graph. In general, we use $G = (V, E)$ to define a graph $G$, with vertex set $V$ and edge set $E$. Using this depiction of molecular structures, various properties of that graph, referred to here as topological indices, can be used as descriptors in QSPR models. More specifically, various topological indices are paired with regression coefficients and used to estimate properties in a similar way to GC methods.

Topological indices (TIs) can take many forms. They are defined as some function of the nodes and edges in a chemical graph, and one can easily see that there are a large number of possible functions even just considering standard molecular graph properties like degree counts for nodes, connectivity, atomic types, etc. One of the first topological indices used in chemical graph theory is the Wiener index [166]. The Wiener index attempts to describe the total distance between all atoms in the graph, as given by $d(v, v')$, the graph theoretic distance between all pairs of vertices $v$ and $v'$. The Wiener index $W(G)$ is defined as:

\[ W(G) = \frac{1}{2} \sum_{v, v'} d(v, v') \tag{5} \]

While the Wiener index describes a graph in terms of its distances, another important consideration is how a graph is connected. To address this, an important class of topological indices called connectivity indices (CIs) was developed. Connectivity indices are widely used in CAMD and have been shown to be useful in QSPR applications [44]. The first connectivity indices were developed by Randic [142], who used these indices to account for the degree of branching in alkanes and to model enthalpy of fusion and vapor pressure. Randic defined an edge index to be:

\[ CI_E(v, v') = \frac{1}{\sqrt{\delta_v \delta_{v'}}} \tag{6} \]

where $v$ and $v'$ are two connected vertices in the chemical graph. This means that the atoms which correspond to $v$ and $v'$ are connected by a chemical bond. Furthermore, $\delta_v$ and $\delta_{v'}$ are the degrees of nodes $v$ and $v'$. In the study of Randic, these degrees signified the number of bonds a particular atom had to non-hydrogen atoms (i.e., the number of atomic neighbors in the hydrogen-suppressed graph), but they are sometimes defined differently for other connectivity indices. The connectivity index of the entire molecule (graph) was then given by

\[ {}^{1}\chi = \sum_{\{v, v'\} \in E} CI_E(v, v') = \sum_{\{v, v'\} \in E} \frac{1}{\sqrt{\delta_v \delta_{v'}}} \tag{7} \]
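To make Eqs. (5) to (7) concrete, the sketch below computes the Wiener index and the first-order Randic connectivity index of a hydrogen-suppressed graph using only the Python standard library. The example graph is 2,2,3-trimethylbutane with an atom numbering chosen for this illustration, and the helper names are assumptions rather than functions from any chemistry package.

```python
from collections import deque
from math import sqrt

# Hydrogen-suppressed graph of 2,2,3-trimethylbutane (assumed numbering).
edges = [(1, 2), (2, 3), (2, 4), (4, 5), (4, 6), (4, 7)]

adjacency = {}
for u, v in edges:
    adjacency.setdefault(u, set()).add(v)
    adjacency.setdefault(v, set()).add(u)


def distances_from(source):
    """Breadth-first search giving graph-theoretic distances d(source, v)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adjacency[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def wiener_index():
    """Eq. (5): half the sum of d(v, v') over all ordered vertex pairs."""
    total = sum(sum(distances_from(v).values()) for v in adjacency)
    return total // 2


def edge_index(u, v):
    """Eq. (6): 1 / sqrt(delta_u * delta_v) using graph degrees."""
    return 1.0 / sqrt(len(adjacency[u]) * len(adjacency[v]))


def randic_index():
    """Eq. (7): sum of the edge indices over all edges."""
    return sum(edge_index(u, v) for u, v in edges)


print(wiener_index())            # 42
print(round(randic_index(), 3))  # 2.943
```

Breadth-first search is sufficient here because every bond counts as a distance of one; a weighted graph would need a different shortest-path routine.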

Table 1: Sample of available GC methods for predicting various properties of pure compounds

Property | GC methods
------------------------------------ | ------------------------------------
Aqueous solubility | Marrero et al. [114], Klopman [100]
Boiling point | Joback and Reid [78], Stein et al. [157], Nannoolal et al. [122], Marrero et al. [113]
Bond dissociation energy | Benson et al. [10]
Critical pressure | Jalowka et al. [76], Joback and Reid [78], Klincewicz et al. [99], Nannoolal et al. [123], Marrero et al. [113]
Critical temperature | Jalowka et al. [76], Joback and Reid [78], Klincewicz et al. [99], Nannoolal et al. [123], Marrero et al. [113]
Critical volume | Klincewicz et al. [99], Nannoolal et al. [123], Marrero et al. [113]
Enthalpy of formation | Cohen et al. [31], Benson [9], Domalski et al. [36], Roganov et al. [144], Joback and Reid [78], Marrero et al. [113]
Enthalpy of fusion | Joback and Reid [78], Marrero et al. [113]
Enthalpy of vaporization | Roganov et al. [144], Joback and Reid [78], Marrero et al. [113], Ceriani et al. [24]
LC50 | Martin et al. [115] (fathead minnow)
Melting point | Joback and Reid [78], Marrero et al. [113]
Gibbs energy of formation | Cohen et al. [31], Benson [9], Domalski et al. [36], Roganov et al. [144], Joback and Reid [78], Marrero et al. [113]
Heat capacity | Benson et al. [11], Benson [9], Domalski et al. [36], Joback and Reid [78], Kolska et al. [103], Ceriani et al. [24]
Octanol/water partition coefficient | Marrero et al. [114], Platts et al. [140], Klopman et al. [101]
Vapor pressure | Nannoolal et al. [124]
Viscosity | Joback and Reid [78], Sastri et al. [149], Ceriani et al. [23], Cao et al. [22], Nannoolal et al. [125]

In Eq. (7), $E$ is the edge set of the graph. A calculation for the edge connectivity indices for a simple alkane is shown in Fig. 3. The index ${}^{1}\chi$ is the so-called first-order Randic connectivity index. An even simpler connectivity index exists which does not account for bonding at all. This is called the zeroth-order connectivity index, and is given below:

\[ {}^{0}\chi = \sum_{v \in V} \frac{1}{\sqrt{\delta_v}} \tag{8} \]

Kier et al. [89], Hall et al. [65], and Murray et al. [121] were the first to apply these connectivity indices as descriptors in QSPR models. These models are typically linear in the descriptors, but many variations exist. An important step forward came from Kier et al. [90] who introduced higher-order connectivity indices. In general, these are defined for an index of order $i$ as:

\[ {}^{i}\chi = \sum_{\{v_1, v_2, \ldots\} \in V_C^i} \frac{1}{\sqrt{\prod_{v \in \{v_1, v_2, \ldots\}} \delta_v}} \tag{9} \]

where $V_C^i$ is the set of all sets of $i$ connected vertices.

Kier and Hall [87] developed additional connectivity indices to account for heteroatoms. These modified connectivity indices distinguished atoms by their valence electrons, leading to a new value for vertex degree, $\delta^V$, where the superscript indicates that valence is considered. For second period elements, Kier and Hall [87] define $\delta_v^V$ for a vertex $v$ as

\[ \delta_v^V = Z_v^V - h_v \tag{10} \]

where $Z_v^V$ indicates the number of valence electrons for atom/vertex $v$ and $h_v$ is the number of hydrogens attached to $v$. For atoms in the third period and beyond, the main difference from this perspective is the number of core electrons. These are accounted for in the following:

\[ \delta_v^V = \frac{Z_v^V - h_v}{Z_v - Z_v^V - 1} \tag{11} \]

where $Z_v$ is the atomic number of atom $v$. These modified $\delta$ values define analogous connectivity indices ${}^{i}\chi^V$.

The connectivity index can be thought to

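Figure 3: Randic edge indices for a simple alkane. [Graphic: a branched alkane with atoms numbered 1 to 7 and two edges labeled e1 and e2.]

Calculating $CI_E(v, v')$:
Atom number ($v$): 1, 2, 3, 4, 5, 6, 7
$\delta_v$: 1, 3, 1, 4, 1, 1, 1

$CI_E$ for two edges:
$CI_E(e_1) = CI_E(1, 2) = 1/\sqrt{1 \cdot 3} = 0.577$
$CI_E(e_2) = CI_E(4, 6) = 1/\sqrt{4 \cdot 1} = 0.500$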

primarily capture vertex adjacency and the local neighborhoods of every atom in a molecule. As such, it represents how atoms are connected and perhaps not the ensemble molecular structure and shape. Another topological index often used in CAMD that aims to address this is the so-called shape index, $\kappa$. The shape index accounts for features of the entire molecular structure as functions of the underlying graph architecture and counts of graph substructures. The main parameter in the calculation of $\kappa$ is the number of paths in the chemical graph of a certain length. We define ${}^{i}P$ to be the number of paths of length $i$ in a particular chemical graph.

Like other topological indices, the shape index maps this information onto a single value. There are three often used ${}^{i}\kappa$ values which are defined as:

\[ {}^{1}\kappa = \frac{2({}^{1}P_{max})({}^{1}P_{min})}{({}^{1}P)^2} \tag{12} \]

\[ {}^{2}\kappa = \frac{2({}^{2}P_{max})({}^{2}P_{min})}{({}^{2}P)^2} \tag{13} \]

\[ {}^{3}\kappa = \frac{4({}^{3}P_{max})({}^{3}P_{min})}{({}^{3}P)^2} \tag{14} \]

with ${}^{i}P_{max}$ and ${}^{i}P_{min}$ representing the maximum and minimum possible number of paths of length $i$ for a hypothetical molecule with an equivalent number of atoms. ${}^{i}P_{max}$ and ${}^{i}P_{min}$ can be easily derived from graph theoretic arguments. The formal expressions for these can be found in [63]. Several ${}^{i}\kappa$ values for an example molecule are calculated in Fig. 4.

In Table 2, we list a few references for QSPR models using topological indices. There are a great many examples of such models, and we note that this table is not meant to be exhaustive. For a more complete list of available topological-indices-based QSPRs, the reader is directed to the book of Devillers and Balaban [35].

Table 2 provides some idea of the diversity of applications of QSPR development with topological indices. We note that this table is inclined towards properties common in CAMD problems, so the properties listed may be more relevant to chemical engineering applications. However, we provide a few examples of the many applications of topological indices to modeling biological, environmental, and pharmacological properties. There have been many efforts to build models related to pharmaceutical properties [7, 45], so only a small subset is listed here. One additional interesting application of connectivity indices is in the prediction of coefficients for group contribution methods where the group is missing (the group is not in the descriptor space) [56].

Strengths of TIs. One of the main advantages of TIs is that they can discriminate between very similar structures, often in cases where GC methods cannot (e.g., isomers). This provides a more holistic picture of the molecule and can be very useful for certain design problems. For example, this may have potential applications for CAMD in areas where a structural feature of the to-be-designed compound is fixed a priori. Furthermore, since many TIs are a function of the entire graph, TIs reflect the entire nature of the molecular structure. This can have advantages over GC methods, which assume that each group provides a contribution independently of other groups in the structure (this is offset somewhat by GC interaction functions). Finally, TIs have been extensively applied to modeling pharmacological properties. The quality and volume of this literature mean that TIs are well suited to many pharmaceutically relevant CAMD problems.

Weaknesses of TIs. Though topological indices have been widely applied to QSPR, there are only limited examples in CAMD. Topological indices are usually not as generally applicable as GC methods, meaning that TI-based QSPRs are often restricted to a certain class of chemicals. For design purposes,

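Figure 4: Getting ${}^{i}P$ and calculating ${}^{i}\kappa$ for an example molecule. [Graphic: the example molecule is drawn four times to highlight its atoms and its paths of length 1, 2, and 3.]

Number of atoms: ${}^{0}P = 6$
Paths of length 1: ${}^{1}P = 5$
Paths of length 2: ${}^{2}P = 6$
Paths of length 3: ${}^{3}P = 4$

Calculating $\kappa$ values:
${}^{1}\kappa = 2({}^{1}P_{max})({}^{1}P_{min}) / ({}^{1}P)^2 = 2(30)(5) / 5^2 = 12$
${}^{2}\kappa = 2({}^{2}P_{max})({}^{2}P_{min}) / ({}^{2}P)^2 = 2(10)(4) / 6^2 = 2.22$
${}^{3}\kappa = 4({}^{3}P_{max})({}^{3}P_{min}) / ({}^{3}P)^2 = 4(4)(3) / 4^2 = 3$

Other structures (drawings not recoverable), listed as ${}^{0}P$, ${}^{1}P$, ${}^{2}P$, ${}^{3}P$:
6, 6, 6, 6
6, 5, 7, 3
6, 5, 4, 3
6, 7, 11, 13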

this is problematic as it means that TI-based design problems can only consider that particular subset of the chemical search space. Furthermore, TIs represent graph-theoretic properties of the chemical graph, and many of these properties are not always readily understandable from a chemical perspective (although Randic and Zupan [143] have offered some interpretations of several TIs). Finally, TIs are more difficult to incorporate into CAMD than GC methods. We will discuss TI-based CAMD in a subsequent section, but such CAMD problems can sometimes face combinatorial difficulties and have only been demonstrated thus far on small design problems.

2.3 Signature descriptors

In a broad sense, GC methods capture the important subsets of atoms in a molecule while TIs rely on some function of the chemical graph. One QSPR method which has been shown to capture aspects of both GC- and TI-based methods is signature descriptors (SD). Signature descriptors are far younger than the other methods discussed, originating in 2002 from Visco et al. [163] and 2003 from Faulon et al. [47]. Like TIs, they conceive of chemical structures as the chemical graph. Rather than ascribe various values to a complete molecular graph, SDs retain all of the structural and connectivity information for every atom in a molecule.

Analogously to TIs, SDs define the chemical graph to be $G = (V, E)$. Standard SD methods also incorporate node coloring for every node $v$ via a coloring function $c_G(v)$ and the colors of each node $C_v$. This change is reflected in the slightly altered definition of the chemical graph as $G = (V, E, C, c_G)$. These node colorings are intended to distinguish between different atoms as well as different types of the same atom. For example, it may be beneficial for the model to differentiate between an oxygen with two single bonds and an oxygen with a double bond. Additionally, one may want to distinguish hydrogens by what atom they are attached to, aromatic carbons from non-aromatic ones, atoms attached to aromatic rings, and many other chemical features. We note that the colorings of the nodes are one of the only subjective parts of SD models, and different coloring schemes can have significant effects on the performance of the models.

One important class of signature descriptors is known as atomic signatures. Given a certain atom in a chemical graph, its atomic signature represents all of the atoms that are within a certain distance, or height, from it. Varying the value of this distance gives rise to different atomic signatures. In the simplest case, with this distance set to 0, the atomic signature of an atom is simply that atom, colored in keeping with the coloring definition. More formally, for an atom (vertex) $v$, its atomic signature of height

Table 2: Sample of available TI-based methods for predicting various properties of pure compounds

Property | TI method
------------------------------------ | ------------------------------------
Anti-inflammatory activity | Gupta et al. [61], Bajaj [5]
Aqueous diffusion coefficient | Schramke [151]
Aqueous solubility | Hall et al. [65], Katritzky et al. [85]
Biodegradability | Boethling [12]
Blood-brain barrier partition coefficients | Rose et al. [145]
Boiling point | Hall et al. [65], Hosoya [73], Hall et al. [64], Galvez et al. [51]
Critical temperature | Hall et al. [64]
Density | Kier et al. [90], Estrada [43], Katritzky et al. [84]
Enthalpy of formation | Mercader [118]
Enthalpy of fusion | Gharagheizi et al. [59]
Enthalpy of vaporization | Galvez et al. [51]
Flash point | Patel et al. [135]
Heat capacity | Yao et al. [169]
LC50 (fathead minnow) | Hall et al. [66, 67], Basak et al. [8]
Melting point | Katritzky et al. [84]
Nonspecific local anesthetic activity | Kier et al. [89], Katritzky et al. [84]
Octanol/water partition coefficient | Murray et al. [121]
π-electron energy (C-C bonds) | Hosoya et al. [74]
Refractive index | Katritzky et al. [84]
Vapor pressure | Katritzky et al. [85]
Viscosity | Kauffman et al. [86]
Water-air partition coefficient | Katritzky et al. [85]

0, ${}^{0}\sigma$, is given by:

    {}^{0}\sigma(v) = c_G(v)    (15)

Of course, these atomic signatures of height 0 are not a robust set of descriptors. Considering higher values for height provides a more detailed picture of an atom's environment. In general, the atomic signature of height $i \ge 1$ for a vertex v is defined as:

    {}^{i}\sigma(v) = G_S(V^i, E^i, C^i, c_G)    (16)

where $G_S$ defines a subgraph of G that contains all vertices and bonds such that the distance between v and any vertex in $G_S$ is at most i. Thus, an atomic signature of height one for a hypothetical atom v defines v and every atom bonded to v as well as all connecting edges. A height two atomic signature defines v, every atom bonded to v, and every atom bonded to those atoms bonded to v along with all necessary edges. Increasing the height of an atomic signature can thus be thought to add a layer of connected atoms. In graph-theoretic terms, increasing the i value is equivalent to adding a layer to a breadth-first-search. A pictorial explanation is provided in Fig. 5. In this example, we assume that all atoms are colored by aromaticity and sp hybridization and that hydrogens are colored by the atom they are attached to.

The atomic signatures can be thought of as a descriptor space for a molecular structure. A property, P, of a molecule can be estimated by all of its atomic signatures of up to a particular height. This QSPR has a familiar form:

    P = \sum_{i} \sum_{d \in D^i} c_d \, {}^{i}\alpha_G(d)    (17)

where d is the index of the set of all atomic signatures and $D^i$ is the set of atomic SDs of height i. $c_d$ is a regression coefficient accounting for the contribution of each atomic signature to a certain property. ${}^{i}\alpha_G(d)$ represents the number of occurrences of atomic signature d. Using the atomic signatures, signatures of the entire molecule can also be generated.

A major advantage of signature descriptors is that

they can be manipulated via simple functions to represent groups from GC methods as well as various TIs. This means that the large number of QSPRs derived from GC methods and TI-based methods are accessible using SDs. For this reason, we do not provide a table of QSPRs using SDs because they can be (and often are) used to calculate properties via GC- and TI-based methods. A few examples of converting SDs to groups and TIs are given in [47].

Strengths of SDs. One of the main advantages of signature descriptors is that they have a small inherent bias as compared to GC methods and TIs. The only bias introduced in these descriptors is the choice of an atomic coloring scheme, or the choice of what defines a different type of the same atom. Furthermore, the inclusion of every atomic signature in SD-based QSPRs means there are no theoretical restrictions on the descriptor space defined for a molecular structure. There are also a variety of more modern signature descriptors capable of distinguishing stereoisomers. In the case of stereoisomers, certain SDs have far greater discriminative power than GC methods or TIs, which typically cannot differentiate stereoisomers. Furthermore, the equivalence of SDs to many TIs and groups for GC methods means that SDs can be used directly with these TI and GC QSPR models. This makes a large library of QSPR models accessible to SDs.

Weaknesses of SDs. Many QSPRs using SDs use all of the available atomic signatures of up to a certain height. This can quickly become an issue with the predictive power of SDs as models without sufficient training data may be overfit. A second concern with SDs is that the coloring scheme of the descriptors must be specified before the descriptors are used for QSPRs and CAMD problems. It is likely that some coloring schemes provide better results than others, and the best coloring scheme may not always be easy to determine. Finally, atomic signature descriptors always discriminate between identical atoms in different structural environments. This may again lead to issues of overfitting or not capturing the true chemical behavior of the system as sometimes it is better to model identical atoms with a general descriptor which is independent of the atom's environment (although this can be captured to some degree with SDs of lower heights).

3 CAMD as an optimization problem
The various types of structure-property relationships discussed above can quickly and often accurately estimate properties from a structure. The application of QSPR techniques in this direction (predicting properties from structures) defines what is known as the forward problem, and is what QSPR techniques are generally intended for. CAMD can broadly be thought to consider the reverse problem, or the problem of predicting structures from properties. At first glance, there is no immediately obvious way of relating properties to a specific molecular structure. One issue is that there are so many structures to consider. A reasonable approach to the CAMD problem should be able to consider a large diversity of structures without running into significant computational difficulties. Another issue involves the structural feasibility. Solutions to the CAMD problem must also be sensible molecular structures, meaning CAMD should produce structures that do not violate any inherent laws of chemical bonding. A final issue involves consistency. All of the QSPRs discussed (though non-overlapping GC methods are an exception) require that if certain features are present, so must be other features. For example, it would be unreasonable to design a structure that has four paths of length three but no paths of length two. In the case of GC methods or SDs, an example erroneous solution would contain one occurrence of the carboxylic acid (-C(=O)OH) group/signature and no occurrences of the carbonyl (-C(=O)-) group/signature. Note that this example assumes that overlap is allowed between these groups/signatures (although this is generally the case with signatures).

Mathematical optimization is the key to addressing all of these issues. Before introducing the problem in a general optimization formulation, we define a few important sets and variables. First, we assume that we have a vector of properties p and a property value $p_k$ for each property k. The vector n encapsulates relevant structural information of the designed molecules and is dependent on the type of QSPR chosen. In the case of GC methods, the value $n_d$ would represent the number of occurrences of each group d. For TIs, the n vector may represent the number of topological features d (edges, paths of certain lengths, etc.) from which the TIs would be calculated. For SDs, this n vector usually represents counts of various atomic signature descriptors d. The function f then transforms this structural information into a property estimate using the appropriate QSPR relationship. In general, the

Figure 5: Atomic signature descriptors for a carbon atom in an example molecule
[Each row of the figure shows the molecular representation, the tree representation, and the resulting atomic signature descriptor. Height 0 descriptor: C2. Height 1 descriptor: C2(N3)(O2)(C3). Height 2 descriptor: C2(N3(C2)(HA))(O2)(C3(C2)(aC)(C3)).]
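The layered construction shown in Figure 5 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the three-atom graph (the heavy atoms of ethanol), the element-only coloring, the string encoding of the tree, and the regression coefficients are all hypothetical choices, not values or conventions taken from the cited signature literature. The recursion adds one breadth-first layer per unit of height and never steps straight back to the parent atom.

```python
from collections import Counter

# Hypothetical heavy-atom graph of ethanol: C0 - C1 - O2 (hydrogens omitted).
# Colors are plain element symbols here; richer colorings are possible.
COLORS = {0: "C", 1: "C", 2: "O"}
BONDS = {0: [1], 1: [0, 2], 2: [1]}


def atomic_signature(v, height, parent=None):
    """Return a canonical string for the tree of atoms within `height` bonds of v."""
    if height == 0:
        return COLORS[v]
    # Each level adds one layer of neighbors, excluding the atom we came from.
    children = sorted(
        atomic_signature(u, height - 1, parent=v)
        for u in BONDS[v]
        if u != parent
    )
    return COLORS[v] + "".join(f"({c})" for c in children)


def signature_counts(height):
    """Count the occurrences of every atomic signature of the given height."""
    return Counter(atomic_signature(v, height) for v in COLORS)


def estimate_property(coefficients, max_height):
    """Linear QSPR in the spirit of (17): coefficient times occurrence count."""
    total = 0.0
    for h in range(max_height + 1):
        for sig, count in signature_counts(h).items():
            total += coefficients.get((h, sig), 0.0) * count
    return total


if __name__ == "__main__":
    print(atomic_signature(1, 1))   # signature of the central carbon, height 1
    print(signature_counts(1))
```

Sorting the child strings makes the encoding independent of the order in which neighbors are stored, which is what allows identical environments to be counted together.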

CAMD problem can then be expressed as:

    \min_{n} \; C(n, p)                           (18)
    \text{s.t.} \;\; p = f(n)                     (19)
    h_1(p, n) \le 0                               (20)
    h_2(p, n) = 0                                 (21)
    s_1(n) \le 0                                  (22)
    s_2(n) = 0                                    (23)
    p_k^L \le p_k \le p_k^U \quad \forall k       (24)
    n_d^L \le n_d \le n_d^U \quad \forall d       (25)

In the above, (19) involves QSPR functions f which estimate a vector of properties p from attributes such as group counts, graph topological features, or atomic signatures. (20)-(21) define general functions, $h_1$ and $h_2$, representing inequality and equality constraints on property values, desired structural features, process conditions, and a variety of other possibilities. For example, the presence of certain groups or structural features may necessitate changing a chemical process to accommodate these structures. These constraints can also eliminate certain groups/topological features/signatures from the solution space or require that they appear a certain number of times. The functions $h_1$ and $h_2$ can also account for design considerations such as system thermodynamics, cost, and a variety of system-specific interactions between n and p. (22) and (23) define inequality and equality constraints which ensure structural feasibility. More specifically, the functions $s_1$ and $s_2$ determine if the vector n is consistent with a molecular structure which can actually exist. These constraints prevent erroneous structures from being considered, eliminating compounds that violate atomic valences, are disjoint, have an incorrect number of aromatic atoms, etc. (24) and (25) set bounds on property values and n, respectively. Each property k is bounded below by $p_k^L$ and above by $p_k^U$. Similarly, $n_d^L$ and $n_d^U$ define lower and upper bounds for $n_d$. Finally, (18) is a general objective function for the CAMD problem. C(n, p) can define a number of possible functions. These functions quantify the performance of a specific molecule based on its properties p and perhaps its descriptors n. Of course, this C function can either be minimized or maximized depending on the design problem.
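To make the roles of (18)-(25) concrete, the following minimal sketch solves a tiny instance by exhaustive enumeration. Everything numerical in it is an assumption made for illustration: the three groups, their contributions to a single made-up property, the bounds, and the choice of a simple valence balance for acyclic molecules as the structural constraint $s_2$. Practical CAMD problems are far too large for enumeration and rely on the solution techniques discussed later in the article.

```python
from itertools import product

# Hypothetical data: groups, valences, and made-up linear contributions.
GROUPS = ("CH3", "CH2", "OH")
VALENCE = {"CH3": 1, "CH2": 2, "OH": 1}
CONTRIB = {"CH3": 20.0, "CH2": 25.0, "OH": 90.0}
N_BOUNDS = {"CH3": (0, 3), "CH2": (0, 3), "OH": (0, 3)}  # descriptor bounds (25)
P_BOUNDS = (150.0, 200.0)                                 # property bounds (24)


def f(n):
    """QSPR (19): a linear group contribution estimate of one property."""
    return sum(CONTRIB[g] * n[g] for g in GROUPS)


def s2(n):
    """Structural feasibility (23); assumed here to be an acyclic valence balance."""
    return sum((2 - VALENCE[g]) * n[g] for g in GROUPS) - 2


def cost(n, p):
    """Objective (18): maximize the property by minimizing its negative."""
    return -p


def solve():
    """Enumerate all n within bounds and keep the best feasible candidate."""
    best = None
    ranges = [range(N_BOUNDS[g][0], N_BOUNDS[g][1] + 1) for g in GROUPS]
    for counts in product(*ranges):
        n = dict(zip(GROUPS, counts))
        if s2(n) != 0:
            continue
        p = f(n)
        if not (P_BOUNDS[0] <= p <= P_BOUNDS[1]):
            continue
        if best is None or cost(n, p) < best[0]:
            best = (cost(n, p), n, p)
    return best


if __name__ == "__main__":
    print(solve())
```

Replacing `cost` with a constant turns the same loop into a pure feasibility search that simply collects every candidate passing the constraints.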

3.1 Classes of the CAMD problem

3.1.1 Single molecule design

The single molecule design problem is the problem of determining a single, optimal structure for a particular purpose. For these problems, the structure of the compound is the only design consideration, meaning that the variables represented by n above are the only degrees of freedom in the problem. Furthermore, it is assumed that there exists some ranking criterion with which to determine which structures are better than others. To make this section as general as possible, we assume that the ranking criterion can either be applied during the optimization procedure as the objective function or afterwards, evaluating the performance of each of a pool of candidate molecules. Though many classifications of these single molecule design problems are possible, we suggest three basic forms: (1) determining all feasible structures; (2) using an objective function which directly quantifies a molecule's performance; and (3) designing a structure with properties as close as possible to certain property targets.

The feasibility problem. We begin by describing the case where there is no objective function used in the optimization problem. More formally, this is equivalent to solving the above CAMD formulation with the objective function equal to a constant. The solutions to this problem represent all molecules which satisfy the functions $h_1$ and $h_2$, are chemically sound structures, and do not violate the property or descriptor constraints. This is a useful type of problem to solve in CAMD when the performance function C is not accessible or reliable. It may be the case that C requires a complex simulation or experimental work. The performance function may also be inaccurate or the designer may not know exactly which properties to optimize. In these cases, these CAMD problems leverage the power of optimization to reduce the large number of possible compounds to a more manageable number. This smaller pool of compounds can then be investigated using high-order models or experiments.

Feasibility problems are sometimes difficult to distinguish from other types of problems. In many cases, a feasibility problem is solved at one of the beginning stages of the problem, and all of the feasible structures are then evaluated based on some ranking criterion. For these types of problems, there is no clear line between a feasibility problem and a problem with an objective function. Usually, the molecules designed by these problems are ultimately ranked, so many feasibility problems actually have an implicit objective function. A second complication is that many CAMD methodologies are designed to solve both feasibility problems and problems with objectives. We provide a few examples of feasibility problems here for GC methods [77], TIs [88, 91], and SDs [29]. We leave a more detailed discussion for the applications section of this article. A graphical example of the feasibility problem is given in Fig. 6.

Exact relationship between structures and performance. In many cases, it is possible to model a molecule's performance directly as a function of its structure and properties. This is the most natural and common form of the CAMD objective because the performance function used to rank molecular structures is what is minimized or maximized. Using the exact performance function as the objective C(n, p) guarantees that the solution to the CAMD problem is the optimal molecule for the application, at least as judged by the provided performance function. We also note that if the function C(n, p) or any of the constraints is non-convex, the molecule returned by a local solver may only represent a local optimum. A global optimization algorithm is needed to guarantee the best molecule for these non-convex problems.

An exact representation of the objective has inspired many numerical optimization strategies for solving the CAMD problem. Numerical optimization strategies are particularly advantageous in these cases because there is an exact algebraic relationship between the descriptor space n and the performance function C. This enables the use of modern, state-of-the-art optimization techniques. Several of these will be discussed in an upcoming section.

There are many examples in the CAMD literature which fall into this category. Again, we will postpone the discussion of these topics until later sections of this document. We provide a small (and non-exhaustive) selection of relevant references in the meanwhile to offer some insight to the interested reader: [127, 154, 146] (GC), [20, 152] (TIs), and [28] (SDs). A graphical example of this type of CAMD problem is provided in Fig. 7.

Minimizing distance to property targets. Another class of single molecule design problems concerns finding a molecule with properties as close as possible to target values. These types of formulations have been used extensively in CAMD, often to design alternatives to molecules being used in practice. These molecules may need to be substituted for reasons of environmental friendliness, cost, or availability. Many other types of problems can be addressed when the ideal properties of a molecule are known. However, if these ideal properties do not

Figure 6: Pictorial representation of the problem of finding all feasible molecules
[Three panels plotted as Property 2 versus Property 1: "Feasible property space"; "Only certain points correspond to feasible molecules" (with a region marked "No feasible molecule with these properties"); "Feasible points can be used to make structures".]

Figure 7: Pictorial representation of the problem of finding an optimal molecule
[Three panels plotted as Property 2 versus Property 1: "We have some objective function of properties" (objective contours); "Finding the optimal point" (optimum marked); "We obtain the best molecule or a ranking of the top solutions" (structures labeled by rank).]

represent a structure that is physically realistic, these approaches face some difficulty.

In general, these types of problems define the objective to be:

    C(p) = \sum_{k} w_k \left( p_k^T - p_k \right)^2    (26)

where $p_k^T$ represents a target for property k and $w_k$ is a weight for the distance from the estimated property value to its corresponding property target. The weighted squared 2-norm shown above is the most common distance function in CAMD, but many other distance functions are possible.

Figure 8: Pictorial representation of the problem of finding a feasible molecule with predicted properties as close as possible to target values
[One panel plotted as Property 2 versus Property 1: "Finding the point closest to a target point", with the target properties, the closest molecule, and the next closest molecule marked.]

We present a few demonstrative examples of CAMD with property target objective functions. Matsuda et al. [116] used groups to design ionic liquids based on conductivity and viscosity targets. Siddhaye et al. [153] addressed this problem with topological indices to design pharmaceutical products. Brown et al. [17] used signature descriptors to design polymers with specific properties. A graphical example of this type of CAMD problem is

given in Fig. 8.

3.1.2 Mixture design

Real-world applications often demand a product with specifically tailored properties. This sometimes necessitates utilizing a mixture of compounds as no single compound possesses all of the necessary properties. Using CAMD techniques to simultaneously design two or more compounds for use in a blend/mixture is referred to as the mixture design problem. We note here that we alter a definition given in [4] for consistency with the prevailing definition of mixture design in the literature. All applications classified under mixture design can be assumed to design two or more structures simultaneously.

While the single-molecule design is difficult in many cases, the mixture design problem is much harder. The difficulties come from several sources: (1) the descriptor variables must now represent the descriptors of every unknown component in the mixture; (2) mixture properties must be calculated and included in the problem; (3) non-ideal mixture behavior must be considered in the form of thermodynamic relationships, activity coefficient models, or equations of state; (4) the design of mixtures also requires a determination of the amount of each component, so mole fractions must be considered. We provide a pictorial representation of the mixture design problem in Fig. 9. In this problem, we define the variable $x_i$ to represent the mole fraction of unknown component i. Furthermore, $q_j$ will represent the mixture property j. The formulation of the mixture design problem is similar to the single-molecule design problem. The objective function is altered to now include mixture properties:

    \min_{n, x} \; C(n, p, q)                     (27)

A few additional constraints are also necessary:

    q = g(x, n, p)                                (28)
    \sum_{i} x_i = 1                              (29)
    h_1(p, q, n) \le 0                            (30)
    h_2(p, q, n) = 0                              (31)
    q_j^L \le q_j \le q_j^U \quad \forall j       (32)

In the above, (30) and (31) represent modified constraints from the previous formulation to include the mixture property variables q. Eq. (28) represents simple mixing functions or complex thermodynamic relationships which relate individual component property variables p, descriptor variables n, and mole fractions x to the mixture property variables q. In many cases, some of these q values represent activity coefficients. Eq. (29) ensures that mole fractions sum to 1. Finally, (32) places upper ($q_j^U$) and lower ($q_j^L$) bounds on mixture properties. We provide a few mixture design citations [54, 33, 19] and leave the rest for the applications section.

3.1.3 Integrated process and product design

Though many CAMD endeavors design products with the ultimate goal of being incorporated into an industrial process, few have explicitly considered the relationship between a particular structure and a process. This is especially important as process performance is typically very sensitive to the molecule(s) chosen. These problems are especially challenging from an optimization point of view because there is no easily discernible algebraic relationship between the descriptor variables n and process variables. This problem also requires the introduction of process variables $\omega_w$, with w defining the index over the process variables. To present the formulation for integrated process and product design, we modify the single-molecule design problem for simplicity.

First, the objective function now reflects process variables:

    \min_{n, \omega} \; C(n, p, \omega)                          (33)

A few additional constraints then account for the inclusion of process variables:

    h_1(p, \omega, n) \le 0                                      (34)
    h_2(p, \omega, n) = 0                                        (35)
    \omega_w^L \le \omega_w \le \omega_w^U \quad \forall w       (36)

In the above, (34) and (35) represent modified constraints from the previous formulation to include the process variables $\omega$. (36) is introduced to place upper ($\omega_w^U$) and lower ($\omega_w^L$) bounds on process variables. There is a similar formulation defined for process/product design problems which also consider mixtures. Again, we provide a small selection of references [41, 129, 82] and leave the remaining discussion for the applications section. One approach to the process design problem is illustrated in Fig. 10.
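As an illustration of how mole fractions enter the mixture formulation (27)-(32), the sketch below searches binary mixtures of three hypothetical pure components on a coarse mole-fraction grid. The component data, the cost-based objective, and in particular the ideal linear mixing rule standing in for g are assumptions made for this example only; as noted above, realistic problems generally require activity coefficient models or equations of state.

```python
from itertools import combinations

# Hypothetical pure-component data: (property value, unit cost).
CANDIDATES = {"A": (10.0, 1.0), "B": (30.0, 3.0), "C": (50.0, 2.0)}
Q_BOUNDS = (28.0, 32.0)  # bounds on the mixture property, as in (32)
TOL = 1e-9


def mixture_property(x1, p1, p2):
    """Mixing rule g in (28); ideal linear mixing is assumed for illustration."""
    return x1 * p1 + (1.0 - x1) * p2


def solve(grid_points=11):
    """Return (cost, component pair, x1, q) for the cheapest feasible binary mixture."""
    best = None
    for (name1, (p1, c1)), (name2, (p2, c2)) in combinations(CANDIDATES.items(), 2):
        for k in range(grid_points):
            x1 = k / (grid_points - 1)  # x2 = 1 - x1, so (29) holds by construction
            q = mixture_property(x1, p1, p2)
            if not (Q_BOUNDS[0] - TOL <= q <= Q_BOUNDS[1] + TOL):
                continue
            cost = x1 * c1 + (1.0 - x1) * c2
            if best is None or cost < best[0] - TOL:
                best = (cost, (name1, name2), x1, q)
    return best


if __name__ == "__main__":
    print(solve())
```

In this toy instance no single component is both cheap and within the property window, so the search settles on a blend, which is the situation that motivates mixture design in the first place.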

Figure 9: Pictorial representation of the mixture design problem
[Three panels: "Mixture properties usually require activity coefficients" (molecular descriptor variables and mole fraction variables); "Feasible points represent mixtures" (Property 2 versus Property 1); "Determining optimal mole fractions" (objective value versus mole fraction, with the optimal mole fraction marked).]

Figure 10: A decomposition approach to the integrated product/process design problem
[Three panels plotted as Property 2 versus Property 1: "Optimize process, keeping molecular properties as continuous variables" (ideal objective); "Find molecule(s) as close as possible to ideal property point"; "Re-optimize process, using the properties of the closest molecule" (actual objective).]

3.2 Common design features in the form of constraints

These constraints define any process criteria, design necessities, thermodynamic conditions, and any other features which may be important for a particular design problem. As these constraints reflect the diversity of applications of CAMD, we cannot go into detail about all of them here. Rather, we present a summary of design features and conditions which occur most commonly in CAMD.

A very common design feature in CAMD limits the number of groups which can occur in the solution. For example, constraints of this variety can place a limit on the number of molecular descriptors d:

    n_d \le n_d^U                                 (37)

or a limit on the total number of descriptors:

    \sum_{d} n_d \le N^U                          (38)

where $N^U$ represents the maximum number of structural features (groups, atoms, descriptors, topological features, etc.) in the designed molecule. Many design problems also set a lower bound on this summation, $N^L$, which is typically equal to 2. This ensures that more than one descriptor appears in the solution. Finally, some design problems seek solutions which are analogues within a given family of chemical structures. In this case, it may be necessary to include the descriptor variables corresponding to this structural family:

    n_d^F \le n_d \quad \forall d \in D^{FIX}     (39)

where $D^{FIX}$ defines the set of descriptors which must occur in the solution and $n_d^F$ represents the number of descriptors d which must occur for the structural family to be produced. This constraint is typically part of (25), where other descriptors may have lower bounds for other reasons.

Properties are also bounded similarly in equation (24). These constraints ensure that no process conditions, environmental regulations, toxicity

thresholds, etc. are violated with the designed structures.

Mixture design problems or single component design problems involving mixtures typically have a number of common constraints. Most notably, these problems often require some idea of the activities of the chemical species in the mixture and thus necessitate incorporating an activity coefficient model. There are three such models often used in CAMD:

1. UNIFAC. UNIFAC [50] is a group contribution variant of the UNIQUAC equation [2] and has been used extensively in CAMD. UNIFAC is simple, accurate, and easy to incorporate into CAMD problems due to its use of groups. The original UNIFAC may lack accuracy outside of standard temperature and pressure ranges, but many extensions and re-parameterizations exist to address these cases.

2. SAFT. SAFT [25] is an accurate equation of state that is applicable in many temperature and pressure domains. It is gaining popularity in CAMD and several group contribution methods [159, 108, 109, 137, 133] have already been developed to estimate the SAFT parameters necessary for its use.

3. COSMO-RS and -SAC. COSMO-RS [95, 97] and COSMO-SAC [107] are two post-processing methods for the COSMO solvation model [96]. Unlike other approaches, these COSMO-based models use electronic surface charge distributions that are calculated at the quantum chemistry level. These methods are now more amenable to CAMD due to the development of various COSMO-based group contribution methods [119, 120, 3].

3.3 Forms of the structural feasibility constraints

The constraints defined above in (22) and (23) ensure that the structural descriptors chosen with the variables $n_d$ are consistent with all other structural features such that they can be assembled to create a chemically feasible molecule. The three different types of QSPRs given above require different sets of these constraints, so this discussion will be divided into three parts, beginning with group contribution methods.

3.3.1 Connectivity constraints for GC methods

If using GC methods to solve CAMD problems, the variable $n_d$ typically represents the number of occurrences of the group d in the solution. Many structural constraints in CAMD also consider the valence of each group, $\nu_d$, which is simply the number of bonds a group requires to satisfy its valence electron requirements. For example, the group CH1(Cl) requires 2 external bonds ($\nu_{CH1(Cl)} = 2$) as two of carbon's four required bonds are accounted for by hydrogen and chlorine atoms. One of the most widely-used rules for structural feasibility using valences comes from Odele and Macchietto [127]:

    \sum_{d} (2 - \nu_d) \, n_d = 2m              (40)

where m is defined by:

    m = \begin{cases} 1, & \text{if compound is acyclic} \\ 0, & \text{if compound is monocyclic} \\ -1, & \text{if compound is bicyclic} \end{cases}    (41)

These constraints alone, while appropriate for most molecules, still allow for certain sets of groups which cannot be joined to form feasible molecules. For example, a solution of 1 >CH0< ($\nu_{CH0} = 4$) group and 2 -Br ($\nu_{Br} = 1$) groups is feasible for the above constraints with m = 0. To address these situations, Odele and Macchietto also define a constraint to ensure that there are enough groups to meet the valence requirements of every group. This means that to include a group in the solution with a valence requirement of 4, there must also be at least 4 additional groups. This is captured in

    \sum_{d'} n_{d'} \ge n_d (\nu_d - 1) + 2 \quad \forall d    (42)

where d' is also an index over descriptors. These constraints can be generalized with integer variables $N_R^{arom}$ and $N_R^{ali}$, which represent the number of aromatic rings and the number of aliphatic rings, respectively. Furthermore, we introduce $G^{ali}$ and $G^{arom}$ to be the sets of groups which have any open valences which are aliphatic and aromatic, respectively. We define these valence counts to be $\nu^{ali}$ for aliphatics and $\nu^{arom}$ for aromatics. Note that the same group can appear in both sets. We also define the parameter $A_d^{arom}$ to represent the number of aromatic atoms in a group d and the parameter $\beta_d$ to represent the number of available aliphatic attachment points for every aromatic atom

Figure 11: An example of connectivity constraints for GC methods, considering an acyclic, aliphatic molecule
[The example molecule is built from one NH group, one CH group, two CH3 groups, and one Br group.]

Group                         NH    CH    CH3   Br
Valence ($\nu_d^{ali}$)        2     3     1     1
$n_d$                          1     1     2     1
$n_d (2 - \nu_d^{ali})$        0    -1     2     1

Valence balance: $\sum_d (2 - \nu_d^{ali}) n_d = 2$, which equals $2 - 2 N_R^{ali} = 2$ (satisfied).
Sufficient number of groups for satisfying valence, with $\sum_d n_d = 5$: NH requires 3, CH requires 4, CH3 requires 2, Br requires 2 (all satisfied).
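The feasibility check worked through in Figure 11 can be reproduced with a short sketch. The function names and the dictionary encoding of group counts and valences are hypothetical conveniences; the two tests implemented are the valence balance (40) and the sufficient-groups rule (42) for a molecule without aromatic groups.

```python
def valence_balance(counts, valence, m):
    """Check sum_d (2 - v_d) n_d == 2m; m = 1 for an acyclic molecule."""
    return sum((2 - valence[g]) * n for g, n in counts.items()) == 2 * m


def enough_groups(counts, valence):
    """Check sum_d' n_d' >= n_d (v_d - 1) + 2 for every group d."""
    total = sum(counts.values())
    return all(total >= n * (valence[g] - 1) + 2 for g, n in counts.items())


def is_feasible(counts, valence, m):
    """Combine both Odele-Macchietto style checks."""
    return valence_balance(counts, valence, m) and enough_groups(counts, valence)


if __name__ == "__main__":
    # Group counts and valences as read from the Figure 11 example.
    figure_counts = {"NH": 1, "CH": 1, "CH3": 2, "Br": 1}
    figure_valence = {"NH": 2, "CH": 3, "CH3": 1, "Br": 1}
    print(is_feasible(figure_counts, figure_valence, m=1))

    # One >C< group and two -Br groups: balanced for m = 0 but too few groups.
    bad_counts = {"C": 1, "Br": 2}
    bad_valence = {"C": 4, "Br": 1}
    print(is_feasible(bad_counts, bad_valence, m=0))
```

The second case is the counterexample from the text: the valence balance alone accepts it with m = 0, and only the sufficient-groups rule rejects it.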

in group d. Below we provide a slightly more general formulation considering aromatic rings:

    \sum_{d \in G^{ali}} (2 - \nu_d^{ali}) \, n_d = 2 - 2 N_R^{ali} + 2 \sum_{d \in G^{arom}} \beta_d n_d - 2 N_R^{arom}    (43)
    \sum_{d \in G^{arom}} (2 - \nu_d^{arom}) \, n_d = 0                                                                      (44)
    \sum_{d \in G^{arom}} A_d^{arom} n_d = 6 N_R^{arom}                                                                      (45)
    \sum_{d' \in G^{ali}} n_{d'} \ge n_d (\nu_d^{ali} - 1) + 2 \quad \forall d                                               (46)
    \sum_{d' \in G^{arom}} n_{d'} \ge n_d (\nu_d^{arom} - 1) + 2 \quad \forall d                                             (47)

We note that in this example, the aromatic rings are assumed to be benzylic and not attached to another aromatic ring (biphenyls and fused aromatics). An example of these basic structural constraints for GC methods is given in Fig. 11. A few simple extensions allow these constraints to account for all aromatic rings. These modified Odele-Macchietto constraints work in cases where all groups are bonded to each other with only single or aromatic bonds. More complex connectivity constraints, accounting for cases where groups may be double or triple bonded to one another, are given in [146].

3.3.2 Connectivity constraints for TI methods

Constraints for TI-based CAMD work in a similar fashion to constraints for GC-based methods. Many CAMD approaches using TIs do not account for topological features explicitly, meaning that there are usually no variables signifying the number of paths, Randic indices, etc. These values are instead calculated as a function of the molecular graph, which is typically represented in the form of an adjacency matrix A. One of the most general adjacency matrices for TI applications is defined by Camarda and Maranas [20]. The matrix they defined is 3-dimensional with dimensions corresponding to all of the nodes in the molecular graph on two axes and all the possible bond types on another axis. The possible bond types given by Camarda and Maranas are b = {1, 2, 3} for single, double, and triple bonds, respectively. An entry, $a_{v,v',b}$, of the adjacency matrix is defined by:

    a_{v,v',b} = \begin{cases} 1, & \text{if vertex } v \text{ is connected to vertex } v' \text{ with a bond of type } b \\ 0, & \text{otherwise} \end{cases}    (48)

Furthermore, there are assumed to be L different types of nodes, where each of these nodes l has a valence value $\nu_l$. Camarda and Maranas appeal to the formulation of Raman and Maranas [141] and define

Figure 12: An example of connectivity constraints for TI-based CAMD
[The example molecule has six vertices, with vertex 5 an oxygen atom. Single bonds join vertices 1-2, 3-4, 4-5, and 5-6; a double bond joins vertices 2-4.]

Single bonds, $a_{v,v',b=1}$:        Double bonds, $a_{v,v',b=2}$:
      v1 v2 v3 v4 v5 v6                     v1 v2 v3 v4 v5 v6
v1     0  1  0  0  0  0               v1     0  0  0  0  0  0
v2     1  0  0  0  0  0               v2     0  0  0  1  0  0
v3     0  0  0  1  0  0               v3     0  0  0  0  0  0
v4     0  0  1  0  1  0               v4     0  1  0  0  0  0
v5     0  0  0  1  0  1               v5     0  0  0  0  0  0
v6     0  0  0  0  1  0               v6     0  0  0  0  0  0

Valence constraint for v4: a node of type (=CH0<) requires 2 single bonds and 1 double bond. Single bonds: $\sum_{v'=1}^{3} a_{v',4,1} + \sum_{v'=5}^{6} a_{4,v',1} = 1 + 1 = 2$ (satisfied). Double bonds: $\sum_{v'=1}^{3} a_{v',4,2} + \sum_{v'=5}^{6} a_{4,v',2} = 1 + 0 = 1$ (satisfied).
Connected graph constraint for v4: the vertex appears in the solution as a non-dummy type ($\sum_{l \ne dummy} y_{4,l} = 1$), and the sum of its connections to lower-indexed vertices is $\sum_{v=1}^{3} \sum_{b} a_{v,4,b} = (0 + 0 + 1) + (0 + 1 + 0) = 2 \ge 1$ (satisfied).
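The checks illustrated in Figure 12 can be sketched with a sparse encoding of the adjacency information. The dictionary layout, the function names, and the per-vertex table of required bonds are hypothetical choices made for this example; storing each bond once under a key (v, v') with v < v' mirrors the use of the upper triangle of the matrix and also means that at most one bond type can be recorded per pair of vertices.

```python
# Bonds of the six-vertex example: key (v, v') with v < v', value = bond type b.
BONDS = {(1, 2): 1, (2, 4): 2, (3, 4): 1, (4, 5): 1, (5, 6): 1}

# Assumed number of bonds of each type required by each vertex: {vertex: {b: count}}.
REQUIRED = {
    1: {1: 1},         # CH3-
    2: {1: 1, 2: 1},   # =CH-
    3: {1: 1},         # CH3-
    4: {1: 2, 2: 1},   # =C<
    5: {1: 2},         # -O-
    6: {1: 1},         # CH3-
}


def bond_count(bonds, v, b):
    """Number of bonds of type b that touch vertex v."""
    return sum(1 for (u, w), t in bonds.items() if t == b and v in (u, w))


def valences_satisfied(bonds, required):
    """Every vertex must have exactly the required number of bonds of each type."""
    return all(
        bond_count(bonds, v, b) == required[v].get(b, 0)
        for v in required
        for b in (1, 2, 3)
    )


def lower_connections(bonds, v):
    """Number of bonds from v to vertices with a smaller index."""
    return sum(1 for (u, w) in bonds if w == v)


if __name__ == "__main__":
    print(bond_count(BONDS, 4, 1), bond_count(BONDS, 4, 2))  # bonds at vertex 4
    print(valences_satisfied(BONDS, REQUIRED))
    print(lower_connections(BONDS, 4))
```

The three printed quantities correspond to the single-bond and double-bond counts at vertex 4, the valence check over all vertices, and the count of lower-indexed neighbors of vertex 4 shown in the figure.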

a binary variable $y_{v,l}$ to be

    y_{v,l} = \begin{cases} 1, & \text{if vertex } v \text{ is a node of type } l \\ 0, & \text{otherwise} \end{cases}    (49)

Using these definitions, they define a value $\nu_v$ which assigns the correct valence value based on the type of node:

    \nu_v = \sum_{l} \nu_l \, y_{v,l}             (50)

The 0th-order connectivity index can then be defined:

    {}^{0}\chi = \sum_{v} \sum_{l} \frac{y_{v,l}}{\sqrt{\nu_v}}    (51)

The 1st-order connectivity index is defined similarly:

    {}^{1}\chi = \sum_{v} \sum_{v' > v} \frac{\sum_{b} a_{v,v',b}}{\sqrt{\nu_v \nu_{v'}}}    (52)

Other TIs calculated in this fashion are provided in [141, 20, 21] as well as a linearization of these relationships.

Finally, the structural feasibility constraints using this formulation begin first with an assignment constraint to ensure that each available vertex can only be assigned to one type:

    \sum_{l} y_{v,l} = 1 \quad \forall v          (53)

Note that not all of the vertices are used. The Camarda and Maranas formulation defines dummy atom types, so that the optimal molecule may have fewer than the number of allowed vertices. Additionally, only one bond type is possible between any two vertices v and v'. This is captured in:

    \sum_{b} a_{v,v',b} \le 1 \quad \forall v, v'    (54)

The valence-balance constraints again use $\nu_{l,b}$, which now signifies the number of external bonds of type b required by a node of type l. This constraint ensures that if a vertex v requires two single bonds, there are exactly two $a_{v,v',b=1}$ which are equal to 1. Note that the constraint is written so that only the upper right triangle of the connectivity matrix is needed. An equivalent constraint is used for b = 2. These constraints are provided below:

    \sum_{v'=1}^{v-1} a_{v',v,b} + \sum_{v'=v+1}^{V} a_{v,v',b} = \sum_{l} \nu_{l,b} \, y_{v,l} \quad \forall v, b    (55)

Finally, a constraint is used to enforce that all vertices are connected. Specifically, this constraint ensures

that for any (non-dummy) vertex $v'$, there is at least one attached vertex with a smaller index. In this
way, the formation of two disjoint subgraphs is prevented. The constraint is provided below:

$$ \sum_{v=1}^{v'-1} \sum_b a_{v,v',b} \ge \sum_{l \ne \text{dummy}} y_{v',l} \quad \forall\, v' \in \{2, \ldots, V\} \tag{56} $$

The above captures the majority of necessary constraints for a CAMD problem. Others are given in [20].
We provide an example of these structural feasibility constraints in Fig. 12.

3.3.3 Connectivity constraints for SD methods

Though several approaches to these constraints exist, perhaps the most intuitive for CAMD problems
comes from Chemmangattuvalappil et al. [28]. These constraints are more complicated than in previous
cases due to the fact that atomic signature descriptors are intended to overlap. The main difficulty
arises from the need for consistency among descriptors. For example, the simple molecule of propane
should have a height-1 atomic signature of $C^2(C^1)(C^1)$ which corresponds to the central atom. In this
case, the coloring scheme (represented by the superscripts) indicates how many non-hydrogen bonds an
atom has. Given the occurrence of one height-1 atomic signature of $C^2(C^1)(C^1)$, we would also expect
two atomic signatures of the form $C^1(C^2)$ to exist. These would correspond to the height-1 signatures
of the terminal carbons. Furthermore, the existence of the height-1 signature $C^2(C^1)(C^1)$ also requires
the presence of three height-0 signatures corresponding to the atoms that make it up: $C^1$, $C^1$, and
$C^2$. This is not always necessary as some QSPRs using SDs only use atomic signatures of a certain
height. In cases where multiple heights are used, this consistency among different heights must be
considered.

First, Chemmangattuvalappil et al. [28] define an index $d$ over the set of all of the atomic signature
descriptors, $D$. There are also different subsets of atomic signature descriptors corresponding to
atomic signatures whose parent node forms one external bond ($S_1$), two external bonds ($S_2$), three
external bonds ($S_3$), and four external bonds ($S_4$). Furthermore, there are subsets to define
signatures whose parent node has one double bond ($BD_1$), two double bonds ($BD_2$), and one triple
bond ($BT_1$).

To ensure that the graph produced is feasible, the authors use the handshaking lemma, the graph
property that the sum of the degrees of every node is equal to twice the number of edges
($\sum_v \delta(v) = 2|E|$). From graph theory, we also know that the number of edges in a connected graph
is equal to the number of vertices minus 1 plus the dimension of the cycle space, $|R|$, which is the
number of rings containing no other rings. $|R|$ is usually intuitive from a chemical perspective and
usually simply means the number of rings (e.g. naphthalene has two rings because the outer cycle of all
carbons does not count, and norbornane has two rings because the cyclohexane ring is bridged and does
not count). In equation form, this relationship is $|E| = |V| - 1 + |R|$. This implies that
$\sum_v \delta(v) = 2(|V| - 1 + |R|)$ and the first feasibility constraint follows:

$$ \sum_{d \in S_1} n_d + 2 \sum_{d \in S_2} n_d + 3 \sum_{d \in S_3} n_d + 4 \sum_{d \in S_4} n_d = 2 \left[ \left( \sum_{d \in D} n_d + \frac{1}{2} \sum_{d \in BD_1} n_d + \sum_{d \in BD_2} n_d + \sum_{d \in BT_1} n_d \right) - 1 + N_R^{\mathrm{arom}} + N_R^{\mathrm{ali}} \right] \tag{57} $$

We note that this feasibility constraint only requires checking signatures of a single height. In other
words, if SDs of heights 0, 1, and 2 are used, it is only necessary to check this constraint for SDs of
one (say, height-1) of these heights.

Next, the authors define a coloring sequence to be a directed connection from a parent node to one of
its children. For example, propane's central atom would have a coloring sequence of $C^2 \to C^1$ which
represents the connection from the middle atom to one of the terminal carbons. To produce a set of
equivalent signatures, we would expect to find as many $C^1 \to C^2$ coloring sequences as there are
$C^2 \to C^1$ coloring sequences. This must be true for every signature height. These considerations can
be formulated as linear constraints using the variables n. Considering $\eta_{d,h,o,o'}$ to be the number
of coloring sequences $o \to o'$ for an atomic signature descriptor $d$ and height $h$, we can define
the constraint:

$$ \sum_d \eta_{d,h,o,o'} \, n_d = \sum_d \eta_{d,h,o',o} \, n_d \quad \forall\, o, o', h \tag{58} $$

For coloring sequences that go from one coloring to an identical coloring, we expect that the total
number of these coloring sequences must be divisible by two. More formally, for a coloring sequence
which goes from coloring $o$ to coloring $o$ ($o \to o$), the following must

apply:

$$ \sum_d \eta_{d,h,o,o} \, n_d = 2K \quad \forall\, h, o \tag{59} $$

where $K$ is a non-negative integer. Finally, the authors introduce a constraint to ensure the
connectivity of the graph. For an atomic signature at a certain height > 0, there must be a root
(parent) atom and children atoms. In some cases, there will be multiple children of the same coloring.
To produce a connectable, consistent set of signature descriptors, each of those children atoms must
appear as a root atom in another signature. This is captured by enforcing there to be at least as many
signatures with a certain coloring of root (parent) atom as there are atoms with that coloring in child
nodes. In constraint form, this is:

$$ \sum_{d \in C_{o \to o'}} \nu_{d,o'} \, n_d \le \sum_{d \in S_{o' \to o}} n_d \tag{60} $$

The set $C_{o \to o'}$ is the set of all signatures that have parent nodes of coloring $o$ and multiple
($\nu_{d,o'} > 1$) child nodes of coloring $o'$. Additionally, $\nu_{d,o'}$ only includes vertices which
have a degree greater than or equal to their parents. $S_{o' \to o}$ is the set of signatures with a
parent to child coloring sequence of $o' \to o$. Additional information on this approach can be found
in [28]. Alternative approaches to the constraints and the CAMD problem using SDs are given in [29]
and [165]. A graphical example of some of these constraints is provided in Fig. 13.

4 Techniques for solving the molecular design problem

4.1 Generate-and-test methods

Many QSPRs, especially those often used in CAMD, are simple functions which require little
computational effort to evaluate. Methods such as those discussed above are able to provide property
estimates for millions of structures in a matter of minutes. For this reason, many CAMD approaches
have applied QSPR models in the forward direction, generating a large number of candidate structures
and then evaluating every property of interest for every molecule. This approach, known as
generate-and-test, has its merits in that it can often optimize over a pool of molecules without
solving a potentially difficult optimization problem. This is especially advantageous if the pool of
potential molecules has been reduced to a practical number, either by considering a small problem or by
one of many knowledge-based reduction procedures [33, 70, 71]. Of course, problems with a large design
space are not solved efficiently with this approach. In these cases, optimization methods have a
distinct advantage over generate-and-test procedures.

Generate-and-test algorithms are fairly intuitive. The primary requirements are consideration of every
possible structure and low redundancy. To that effect, several generate-and-test algorithms exist for
groups [71, 77], topological indices [92, 68], and signature descriptors [46].

4.2 Decomposition methods

Many CAMD problems are characterized by a large combinatorial space of potential descriptors,
challenging non-linearities in thermodynamic or process models, and/or high sensitivity of the
objective to process and descriptor variables. As a result, many CAMD problems are too difficult to be
addressed directly as optimization problems and must instead be solved as a series of optimization
subproblems. Typically, these subproblems successively apply increasingly difficult constraints from
the original problem, reducing the feasible set of molecules upon the solution of each subproblem. This
step-wise reduction in problem space makes many CAMD problems significantly easier. Alternatively, many
decomposition approaches approximate a set of constraints with a lower-bounding surrogate (in the case
of minimization), devise a subproblem to generate feasible points, and then iterate between these
subproblems until the upper and lower bounds on the objective function are sufficiently close.

Decomposition techniques can be broadly divided into the three categories of CAMD problems to which
they are most often applied. These are discussed below.

4.2.1 Decomposition in single molecule design

The most common technique in single molecule design involves systematically reducing the space of
feasible molecules by applying increasingly stringent constraints. For example, referring to the
molecular design formulation shown in the previous section, some techniques (e.g. [71]) first apply the
structural feasibility constraints, (23) and (22), resulting in a set of all structurally feasible n
vectors. This set of n vectors represents a significantly smaller feasible region than that of the
entire space defined

by the descriptors. These n vectors can be assembled into feasible structures, and this set of
structures can be evaluated one-by-one based on the remaining constraints and objective function or
used in another optimization problem.

This approach works best if the number of feasible structures is reduced to a reasonably small number
before difficult constraints are evaluated or the remaining optimization subproblem is solved. For this
reason, decomposition methods have seen the most success in cases where there are few possible
descriptors, the constraints are very tight, or the problem involves minimizing a distance to property
targets. In cases where many n vectors are possible, decomposition methods should be paired with
optimization methods rather than generate-and-test methods.

Figure 13: An example of structural constraints using SDs for a simple alkane

Atomic signature (height-1)   n_d   Degree of parent   Number of coloring sequences (h = 1)
                                                       C1→C3  C1→C4  C3→C1  C3→C4  C4→C1  C4→C3
C^3(C^1)(C^1)(C^4)             1          3              0      0      2      1      0      0
C^1(C^3)                       2          1              1      0      0      0      0      0
C^4(C^1)(C^1)(C^1)(C^3)        1          4              0      0      0      0      3      1
C^1(C^4)                       3          1              0      1      0      0      0      0

Checking structural feasibility constraints:
Coloring sequence balance (58): 1→3 and 3→1: (1)(2) = (2)(1); 1→4 and 4→1: (1)(3) = (3)(1);
3→4 and 4→3: (1)(1) = (1)(1).
Handshaking constraint (57): 5 + 2(0) + 3(1) + 4(1) = 2[(7 + ½(0) + 0 + 0) − 1 + 0 + 0], i.e. 12 = 12.

4.2.2 Decomposition in mixture design

One of the more challenging problems in CAMD, the mixture design problem benefits from decomposition.
Early work from Gani and Fredenslund [54] proposed a decomposition algorithm based on the prior work of
Klein et al. [98] to decouple the mixture design problem into several single-component molecular design
problems. Each of these solutions could then be investigated as a potential mixture component. Many
other approaches have followed suit [82, 33, 4], relying on the efficiency of single-molecule CAMD
techniques to quickly solve single-molecule subproblems and optimization techniques to optimize over
the space of the mole fractions.
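As a rough illustration of the decomposition idea in Sections 4.2.1 and 4.2.2, the following Python sketch first screens single molecules and then searches over mole fractions. Everything specific in it is an assumption made for illustration: the group set, the contribution values, the property bounds, the linear mixing rule, and the grid search (which stands in for a proper optimization over mole fractions). The structural check is the standard acyclic valence-balance rule, used here as a stand-in for the structural feasibility constraints referenced in the text.

```python
from itertools import product

# Hypothetical groups: name -> (valence, toy property contribution)
GROUPS = {"CH3": (1, 10.0), "CH2": (2, 6.0), "OH": (1, 25.0)}
MAX_COUNT = 4  # assumed upper bound on each group count


def structurally_feasible(n):
    """Acyclic valence balance: sum_j (2 - v_j) * n_j == 2."""
    return sum((2 - GROUPS[g][0]) * c for g, c in n.items()) == 2


def toy_property(n):
    """Linear group-contribution estimate (illustrative numbers only)."""
    return sum(GROUPS[g][1] * c for g, c in n.items())


def stage1(lo, hi):
    """Enumerate group-count vectors; keep feasible ones within property bounds."""
    names = list(GROUPS)
    keep = []
    for counts in product(range(MAX_COUNT + 1), repeat=len(names)):
        n = dict(zip(names, counts))
        if structurally_feasible(n) and lo <= toy_property(n) <= hi:
            keep.append(n)
    return keep


def stage2(candidates, target, steps=20):
    """For each pair, grid-search the mole fraction x under a linear mixing rule."""
    best = None
    for i, a in enumerate(candidates):
        for b in candidates[i + 1:]:
            for k in range(steps + 1):
                x = k / steps
                mix = x * toy_property(a) + (1 - x) * toy_property(b)
                err = abs(mix - target)
                if best is None or err < best[0]:
                    best = (err, a, b, x)
    return best


candidates = stage1(lo=30.0, hi=60.0)    # single-molecule subproblem
best = stage2(candidates, target=45.0)   # mole-fraction subproblem
```

The first stage shrinks the design space using only cheap structural and property checks, so the more expensive mixture evaluation in the second stage runs over a short list of components rather than over every possible group combination.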

4.2.3 Decomposition in integrated product/process design

Integrated product/process design problems are also often decomposed. A popular method relies on
optimizing the process variables while allowing for any possible properties of the products. This is a
relaxation of the original problem and represents a lower bound (in the case of minimization) on the
objective. Solving this problem is far easier and generates a set of ideal properties for an optimal
molecule to have. Using these ideal properties as targets, a molecular design problem can be solved to
determine the molecule that is closest in properties to these ideal values. Finally, the process design
problem can be updated with actual property values of the closest structure and then re-optimized. This
technique always produces a feasible value of the objective, though it may not always find the globally
optimal structure(s) and process variable values. This two-stage approach has been used by Eden et
al. [41] and Bardow et al. [6]. The latter group has used this approach throughout the literature,
referring to it as the continuous molecular targeting approach (CoMT-CAMD).

Other approaches optimize first in the space of the molecular structures before investigating process
performance. Papadopoulos and Linke [129] designed molecules based on a multi-objective optimization
problem, determining the best potential structures based on a number of criteria. This resulted in a
Pareto-optimal set of structures, each of which could be tested in the context of the process design
problem. Approaches similar in nature [82, 4] have also optimized in the space of structures and
evaluated each structure/mixture as an input to process design problems.

Other approaches solve the problem iteratively. For example, Buxton et al. [19] proposed iteratively
solving two subproblems: one to identify process conditions and the other to determine molecular
structures. A newer approach from Gopinath et al. [60] uses the outer approximation algorithm [37] to
treat the product/process design problem as two subproblems. In this approach, one subproblem solves
the process problem for a fixed molecule and the other determines another candidate molecule.
Approaches like the two just discussed work by generating upper and lower bounds on the objective by
solving the subproblems. When the lower and upper bounds are within a certain tolerance, the algorithms
terminate.

4.3 Mathematical optimization methods

In some cases, CAMD problems can be addressed straightforwardly with optimization techniques. In
others, optimization approaches become tenable with a slight alteration to the formulation or by
exploiting the problem structure. To reiterate an earlier statement, optimization approaches are best
suited to problems with many possible descriptors or challenging non-linearities or non-convexities in
molecular, mixture, or process models. In cases where there are few possible descriptors, the design
space can often be enumerated efficiently using generate-and-test methods. Before discussing a few
relevant techniques, we note that there is a large degree of overlap between the categories of
decomposition methods and mathematical optimization methods. Both techniques, however, merit an
independent discussion as both play a critical role in the solution of CAMD problems.

Many techniques in single-molecule design are able to first solve feasibility optimization problems
based on property and structural feasibility constraints. This also qualifies as a decomposition method
as the objective function is evaluated afterwards or structure ranking is presumed to be done by expert
analysis. Odele and Macchietto [127] introduced a general optimization formulation for solving
molecular design problems. The problem discussed in their paper, the single molecule design problem,
can be addressed very efficiently with optimization techniques, even if the descriptor space is large.

Duvedi and Achenie [38] applied outer approximation [37] to the formulation given by Odele and
Macchietto with a few slight alterations. A variant of this approach was also used by Churi and
Achenie [30], who considered connectivity of the structures. In both cases, the problem was formulated
as a mixed-integer nonlinear program (MINLP), which was solved with an outer approximation algorithm.
Though efficient for many types of problems, the outer approximation algorithms used in these cases are
not guaranteed to find globally optimal solutions if the problem is non-convex.

Sinha et al. [154] and Sahinidis et al. [146] applied the branch-and-bound algorithm to these problems.
This required deriving underestimators (linear in many cases) of the constraints and objective
function. Solving the problem with underestimators resulted in a lower bound on the objective (in the
case of minimization). Then, variables in the problem corresponding to descriptor counts, physical

property values, etc. would be divided into two regions, or branched on. This approach allowed
optimization strategies to quickly disregard areas of the search space which would not lead to optimal
solutions. Furthermore, upon convergence, the branch-and-bound algorithm guarantees a globally optimal
solution, at least to within a user-specified tolerance.

The problem of designing molecules to match property targets is far easier. This problem can be solved
as the mixed-integer quadratic program (MIQP) shown above or as a mixed-integer linear program (MILP),
where some transformations of the QSPR functions must be done before optimization [146]. This MILP has
an altered objective function

$$ \min_{n,x} \sum_k w_k \left( d_k^+ + d_k^- \right) \tag{61} $$

where $d_k^+$ and $d_k^-$ are the positive and negative deviations between the target property values
and the estimated values. Both $d_k^+$ and $d_k^-$ are modeled as non-negative continuous variables.
They require one additional constraint to calculate:

$$ d_k^+ - d_k^- = p_k - p_k^T \quad \forall\, k \tag{62} $$

Samudra and Sahinidis [147] used this observation to decompose the molecular design problem into three
basic steps: (1) identification of optimal sets of groups, n; (2) structure generation; and
(3) application of higher-order models to feasible structures. Zhang et al. [170] extended step (1) of
this methodology to account for higher-order groups by introducing variables to account for the
connectivity. Maranas [111] proposed linearization of difficult property models to allow for the
efficient use of MILP techniques. Camarda and Maranas [20] convexified non-linear terms in their CAMD
formulation, simplifying the location of globally optimal structures.

4.4 Heuristics

Though mathematical programming approaches are very useful, there are many CAMD problems which are too
high-dimensional or non-linear to be practically considered by mathematical optimization approaches.
Problems of this type may attempt to design very large, structurally diverse molecules, necessitating a
large search space with many possible combinations of descriptors. Alternatively, these problems may
involve a complex process design component, which may require a difficult simulation for each feasible
structure. Perhaps the thermodynamic functions and mole fractions optimization must be decoupled from
the molecular design problem, meaning that each feasible structure must be investigated in a difficult
mole fractions optimization subproblem.

Heuristic approaches to complex optimization problems typically apply high-level selection strategies
to generate a series of trial points, evaluating the objective for each trial point and determining a
new trial point based on the value of the objective as well as the history of function evaluations.
These algorithms terminate either by converging to a point of optimality or by exceeding some
user-defined limit (time, iterations, etc.). In the context of CAMD problems, most heuristic
optimization approaches have optimized in the space of molecular structures. Practically speaking, this
involves generating a molecular structure based on certain design specifications (descriptors, property
targets, etc.). Each structure is fixed as an input to the entire CAMD problem, and the objective value
is determined. A new structure (or series of structures) is chosen for evaluation based on the
relationship between structures/descriptors/properties and the objective value.

We note that one of the most important considerations for using heuristics for CAMD problems is the
translation of molecular structures into a particular encoding. In simple cases, the encoding can
represent the number of descriptors used in the molecules. In other cases, the descriptors may be
translated into a binary string or assigned a value that is a function of the descriptors.

4.4.1 Genetic algorithms

Genetic algorithms (GAs) are a class of heuristic algorithms roughly based around the idea of natural
selection. More specifically, GAs evaluate the performance of each member of a certain pool of
solutions called a generation. The members of the generation are then combined, taking some features
from certain members (the parents) and some from others. The likelihood of being chosen for this
reproduction process is based on each solution's rank, with preference given to better solutions. In
their application to CAMD problems, GAs typically work in the space of molecular structures. Each
structure in a generation is evaluated, features from the best-performing molecules are passed on to
the next generation, and the process continues until convergence is achieved.
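The loop just described can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the method of any cited work: molecules are encoded as group-count vectors, the property model is a made-up linear group-contribution sum, and the contribution values, target, population size, and mutation rate are all hypothetical. Structural feasibility checks are left out for brevity, and the best member is carried forward unchanged (elitism) so the best fitness never gets worse.

```python
import random

GROUP_CONTRIB = [10.0, 6.0, 25.0]  # hypothetical contribution per group type
TARGET = 53.0                      # hypothetical property target
MAX_COUNT = 5                      # assumed upper bound on each group count


def fitness(n):
    """Lower is better: absolute deviation of the toy property from the target."""
    return abs(sum(c * k for c, k in zip(GROUP_CONTRIB, n)) - TARGET)


def select(ranked, rng):
    """Rank-based selection: better-ranked members get larger weights."""
    weights = [len(ranked) - i for i in range(len(ranked))]
    return rng.choices(ranked, weights=weights, k=1)[0]


def crossover(p1, p2, rng):
    """One-point crossover of two parent encodings."""
    cut = rng.randint(1, len(p1) - 1)
    return p1[:cut] + p2[cut:]


def mutate(n, rng, rate=0.2):
    """Randomly reset individual group counts with probability `rate`."""
    return [rng.randint(0, MAX_COUNT) if rng.random() < rate else c for c in n]


def run_ga(pop_size=20, generations=40, seed=0):
    rng = random.Random(seed)
    pop = [[rng.randint(0, MAX_COUNT) for _ in GROUP_CONTRIB]
           for _ in range(pop_size)]
    history = []  # best fitness seen at the start of each generation
    for _ in range(generations):
        ranked = sorted(pop, key=fitness)
        history.append(fitness(ranked[0]))
        children = [ranked[0]]  # elitism: keep the current best unchanged
        while len(children) < pop_size:
            child = crossover(select(ranked, rng), select(ranked, rng), rng)
            children.append(mutate(child, rng))
        pop = children
    return min(pop, key=fitness), history


best, history = run_ga()
```

In a real CAMD setting the fitness function would wrap the full property and process models, and the genetic operators would need to preserve (or repair) structural feasibility of the encoded molecules.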

Venkatasubramanian et al. [162] introduced GAs to CAMD problems. They encoded molecular structures as a
string of their substituent groups and applied genetic operations to the parent strings. van Dyk and
Nieuwoudt [40] proposed an encoding based on the UNIFAC [50] groups. Xu and Diwekar [168] developed a
GC-based GA which encoded structures based on various group identities. Herring and Eden [72] applied
GAs in conjunction with signature descriptors to design structures based on property targets. Zhou et
al. [172] applied GAs to solvent design problems for two-phase reactions and to process design problems
for gas absorption [173]. Scheffczyk et al. [150] also applied GAs to liquid-liquid extraction problems
based on COSMO-RS thermodynamics.

4.4.2 Tabu search

Tabu search (TS) algorithms also work by first proposing a pool of initial solutions. These solutions
are altered by one of many operations to produce slightly altered solutions. This process is repeated
so long as the altered molecules do not appear in a tabu list, i.e., a list of solutions forbidden from
consideration based on various factors. These factors can include: frequency of occurrence (to ensure
the same solutions are not always visited), infeasibility or low objective values (to allow the
algorithm to determine good solutions), and many others. Using this tabu list, the TS algorithm
maintains some memory of previous solutions, which can offer some advantages in CAMD problems.
Naturally, the analogue of TS in CAMD problems maintains a solution set of molecular structures and
alters these provided they are not on the tabu list of forbidden structures. One important feature of
TS algorithms is that the tabu list is often dynamic, meaning that solutions once forbidden by the tabu
list may be acceptable during a later generation.

Tabu search has only recently been applied to CAMD problems. For example, Chavali et al. [26] and Lin
et al. [106] applied TS to the design of transition metal catalysts. Another example comes from McLeese
et al. [117], who considered ionic liquid design.

4.4.3 Other methods

Several other heuristics have been applied to CAMD problems. For example, Gebreslassie and
Diwekar [58] solved liquid-liquid extraction problems with a modified ant colony optimization (ACO)
algorithm. Like other heuristics, ACO works by proposing a pool of solutions (molecules). These
solutions are assigned a certain weight, and good solutions attract other solutions in their direction,
meaning that bad solutions become more like good solutions in properties, descriptors, or otherwise.
Similarly, bad solutions discourage other solutions from becoming similar to them.

Simulated annealing is another heuristic to produce solutions for the CAMD problem. It works by
altering a given solution by randomizing the descriptors that define its encoding. If the new solution
is better than the previous, it is accepted and becomes the current solution. If the new solution is
worse but still within the bounds of an error function, it is also accepted. As the algorithm proceeds,
the error function becomes more and more stringent, meaning that worse solutions are less likely to be
accepted. Examples of this algorithm in CAMD come from Ourique and Telles [128] and Marcoulaki and
Kokossis [112].

5 Applications of CAMD

5.1 Single molecule design

To begin, we mention a few approaches for single molecule design which qualify as solving the
feasibility problem. As discussed earlier, it is sometimes difficult to make the distinction between
this category of problems (feasibility problems) and those that have some ranking criteria. The simple
reason for this is that many methodologies are designed to do both. Nonetheless, this type of
application is presented here to underscore its importance. A few noteworthy examples come from
Joback [77] and Joback and Stephanopoulos [79, 80], who describe techniques to solve the feasibility
problem using groups; Kier and co-workers [88, 91, 92], who solved this problem using topological
indices; and Churchwell et al. [29], who described techniques to solve these types of problems with
signature descriptors. All of these studies, though not directly applying optimization, use many
techniques to reduce the massive size of the chemical search space. Many techniques described in
subsequent sections can solve these feasibility problems with optimization approaches. These methods
typically have an explicit objective function and thus fit better into a different category.

The next category of applications concerns problems with explicit objective functions. The many
objective functions of this category reflect the

diversity of the CAMD field as a whole. Some of the first contributions in CAMD came from Gani and
Brignole [53] and Brignole et al. [16], who designed extraction solvents based on solvent power and
selectivity. Though Gani and Brignole did not use optimization in this approach, they significantly
reduced the number of feasible molecular structures based on arguments like limiting groups, placing
restrictions on various properties, and investigating solubility curves. In this way, they narrowed the
massive chemical design space to a few possible molecules which could all be considered directly by
solving the forward problem. A similar approach was used by Macchietto et al. [110] and Joback [77].

Early efforts at optimization in the CAMD problem come from Gani et al. [55] and Knight and
McRae [102]. Odele and Macchietto [127] introduced a general numerical optimization approach to the
CAMD problem, formulating several solvent design problems as optimization problems over the space of
groups. Since then, many CAMD efforts have focused on the design of solvents. For example, Harper et
al. [71] proposed a multi-level approach to design separation solvents, reducing the number of feasible
molecules in each of several sub-problems. Sinha et al. [154] applied global optimization techniques to
the design of a blanket wash solvent. Karunanithi et al. [82] developed a decomposition algorithm to
address difficult optimization problems, applying it to the design of liquid-liquid extraction solvents
and crystallization solvents [83]. Subsequent investigations of this problem came from Samudra and
Sahinidis [147] and Austin et al. [4].

Many efforts in CAMD have been applied towards industrial processes. One of the most popular
application areas is in designing solvents for liquid-liquid extraction. Gani et al. [55] proposed the
design of a solvent for the separation of water and acetic acid. Harper et al. [71] and Harper and
Gani [70] sought to find a replacement for toluene in the separation of phenol and water. A few other
examples include: Marcoulaki and Kokossis [112], Xu and Diwekar [168], Scheffczyk et al. [150], and
Gebreslassie and Diwekar [58].

Extractive distillation is another important industrial process which is often studied using CAMD. Many
of the approaches investigated focused on separations in general and have also been applied to
liquid-liquid extraction. Two such studies are Harper et al. [71] and Gani et al. [55]. van Dyk and
Nieuwoudt [40] applied simulated annealing to the design of solvents for the separation of five binary
pairs of common industrial compounds.

Another emerging area of study in CAMD is the design of solvents to optimize reaction properties. Wang
and Achenie [164] made one of the first efforts to solve this problem, designing solvents to promote
ethanol fermentation and subsequent extraction. Gani et al. [57] proposed a rules-based strategy for
the selection of solvents for several common reactions and a pharmaceutical example. Folic et
al. [48, 49] designed novel solvents to maximize the reaction rate constant of an $S_N2$ reaction. They
applied GC methods to estimate the parameters of the solvatochromic equation of Abraham et al. [1] and
then used this equation to predict rate constants. Struebing et al. [158] investigated the same type of
reactions, now using quantum chemical calculations for their solvents and the solvatochromic equation
as a surrogate model. Zhou et al. [171] used COSMO-RS [95] thermodynamics in conjunction with CAMD
techniques to design solvents to maximize reaction selectivity. Austin et al. [3] designed solvents to
maximize a reaction rate using COSMO-RS thermodynamics and projecting the original problem onto a
lower-dimensional space.

Ionic liquid design is another application area for this subset of CAMD problems. For example,
Karunanithi and Mehrkesh [81] used decomposition techniques to design ionic liquids for electrical
conductivity, heat transfer, liquid-liquid separations, and solubility. McLeese et al. [117] designed
ionic liquids using a tabu search algorithm, which they also showed produced a globally optimal
solution for their test problem. Matsuda et al. [116] designed ionic liquids based on conductivity and
viscosity targets. These case studies used a small number of possible groups and could thus be
alternatively addressed via exhaustive enumeration.

A number of approaches have also investigated pharmaceutical applications of CAMD. We note here that a
good review already exists on the topic of pharmaceutical solvents [69]. Chemmangattuvalappil et
al. [28] investigated a drug modification problem using molecular signatures. Their approach, discussed
above, required a fairly large number of linear constraints to ensure a consistent set of descriptors.
They applied this approach to the alteration of alkyl substituents on a fungicidal compound. Siddhaye
et al. [153] designed pharmaceutical products, focusing on molecules likely to be the active
pharmaceutical ingredient (API). Leveraging the power of connectivity indices, they were able to
address a few pharmaceutically relevant case studies, including producing a penicillin derivative with
specified properties. Churchwell et al. [29] used signature descriptors to design peptide inhibitors to
leukocyte functional antigen-1 (LFA-1) and its ligand intercellular adhesion molecule-1 (ICAM-1).

Many CAMD efforts have also been focused on the design of alternative refrigerants. For example, Gani
et al. [55] designed refrigerants based on a few important properties. Joback [77] first considered the
problem of finding a replacement refrigerant for Freon-12. Duvedi and Achenie [38, 39] and Churi and
Achenie [30] looked at this same problem, focusing on heat capacity and heat of vaporization as the
properties to optimize. Marcoulaki and Kokossis [112] also designed a replacement for Freon-12 using a
simulated annealing approach. Sahinidis et al. [146] investigated the same replacement problem using a
global optimization approach with modified structural constraints and an improved CAMD formulation.
Samudra and Sahinidis [148] designed heat transfer fluids for refrigeration systems.

Polymer design has been investigated extensively using CAMD techniques. Many of these approaches
focused on designing a polymer with certain physical properties. Venkatasubramanian et al. [161, 162]
applied genetic algorithms to the problem, designing polymers to approximate target values in various

given here can likely be altered to design a mixed product for an arbitrary application.

Klein et al. [98] and Gani and Fredenslund [54] first considered the mixture design problem, solving a
few example problems involving solubilities and compounds which form azeotropes. Vaidyanathan and
El-Halwagi [160] designed blends of polymers, relying on simple mixing rules for algebraic simplicity.
Vaidyanathan and El-Halwagi also designed single polymers. Duvedi and Achenie [39] developed an MINLP
formulation for mixture design, using an equation of state to estimate some mixture properties and
relatively simple mixing rules to estimate others. They applied the methodology to the design of
refrigerant blends.

Buxton et al. [19] proposed a decomposition technique to solve mixture design problems. They produced
solvent blends which would reduce the environmental impact of an industrial process. Sinha et al. [155]
solved mixture selection problems as an MINLP, choosing the best combination of solvents from a given
list. This approach was able to select an optimal single solvent and mixture of solvents for use as
cleaning agents in the lithographic printing
property categories like glass transition temperature, industry.
bulk modulus, heat capacity, density, and others. Karunanithi et al. [83] used a decomposition
Maranas [111] approaches the problem in a similar technique to first reduce the mixture design problem
way, minimizing distances to target values. Unlike to the set of all feasible individual components.
Venkatasubramanian et al., Maranas used a mixed- Each possible mixture was then used to evaluate
integer linear program (MILP) formulation and the objective function. This approach was used
mathematical optimization techniques to solve the for the design of a crystallization solvent and anti-
problem. Camarda and Maranas [20] addressed solvent. Conte et al. [33] proposed a task-based
the design problem using topological indices and a decomposition algorithm, which was applied to the
mathematical optimization formulation. Eslick et design of paint solvents blends and insect repellent
al. [42] also used topological indices but designed solvent blends. Austin et al. [4] addressed the
molecules with a tabu search algorithm rather than mixture design problem in the reduced-order space
a mathematical optimization approach. Brown of each individual components properties, employing
et al. [17] considered the problem with signature derivative-free optimization methods to optimize over
descriptors. Pavurala and Achenie [136] used an the lower-dimensional space. This was applied to
outer-approximation approach to design polymers to reproduce the crystallization solvent design problems
aid in oral drug delivery. of Karunanithi et al. [83].
This section presents a selection of main
applications of single-molecule design problems. For
5.3 Integrated process and product design
a more comprehensive list, see Table 3.
Though many CAMD endeavors design products
5.2 Mixture design with the ultimate goal of being incorporated into
an industrial process, few have explicitly considered
The mixture design problem is a difficult variant of the relationship between a particular structure and
the single-molecule design problem. As a result, there a process. This can be problematic as many
are far fewer examples of applications considering a recent efforts have observed some sensitivity between
mixed product. Though we divide by application product descriptor variables and process variables.
in this section, we emphasize that many of these To overcome this issue, various approaches have
techniques are generalizable. Many of the references considered the product and process design problems
simultaneously.
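Several of the integrated approaches reviewed in this section, such as the decomposition methodology of Karunanithi et al. [82], screen candidates with inexpensive property bounds before a process model is evaluated on the survivors. The following is a minimal Python sketch of that screen-then-evaluate pattern only. The candidate molecules, property names, bounds, second-stage check, and cost function are all hypothetical placeholders and are not taken from any of the cited studies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    # All fields are hypothetical, illustrative property estimates.
    name: str
    boiling_point_K: float
    solubility_param: float


def screen_by_bounds(candidates, bounds):
    """Stage 1: discard candidates that violate simple property bounds."""
    return [
        c for c in candidates
        if all(lo <= getattr(c, prop) <= hi for prop, (lo, hi) in bounds.items())
    ]


def apply_constraint_stages(candidates, stages):
    """Stage 2: apply more complicated checks, one stage at a time."""
    for check in stages:
        candidates = [c for c in candidates if check(c)]
    return candidates


def select_with_process_model(candidates, process_cost):
    """Stage 3: evaluate the process model only on the surviving candidates."""
    return min(candidates, key=process_cost) if candidates else None


def demo():
    candidates = [
        Candidate("A", 350.0, 18.0),
        Candidate("B", 420.0, 19.5),  # violates the boiling point bound
        Candidate("C", 380.0, 25.0),  # violates the solubility parameter bound
        Candidate("D", 390.0, 20.0),
        Candidate("E", 345.0, 20.5),  # passes the bounds, fails the second stage
    ]
    bounds = {"boiling_point_K": (340.0, 400.0), "solubility_param": (17.0, 21.0)}
    # Hypothetical stand-in for a more complicated constraint.
    stages = [lambda c: c.boiling_point_K - 10.0 * c.solubility_param > 165.0]
    # Hypothetical stand-in for an expensive process model.
    def process_cost(c):
        return abs(c.boiling_point_K - 385.0) + 2.0 * abs(c.solubility_param - 19.0)

    screened = screen_by_bounds(candidates, bounds)
    staged = apply_constraint_stages(screened, stages)
    best = select_with_process_model(staged, process_cost)
    return [c.name for c in screened], [c.name for c in staged], best.name


if __name__ == "__main__":
    print(demo())
```

The point of ordering the stages this way is that the costly evaluation runs on the smallest possible pool; in a real application each stage would be replaced by property models and a process simulation.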

Table 3: Summary of CAMD applications and the methodologies used in each case

Application | References
Antigen inhibition activity | Churchwell et al. (sd,d,o) [29]
Biodiesel additives | Hada et al. (gc,d,gt) [62]
CO2 capture | Gani et al. (gc,d,gt,m) [55], Bardow et al. (sel,d,p) [6], Pereira et al. (gc,o,p) [138], Stavrou et al. (sel,d,o,p) [156], Burger et al. (gc,o,p) [18], Lampe et al. (gc,d,o,p) [105], Gopinath et al. (gc,d,o,p) [60]
Crystallization solvents | Karunanithi et al. (gc,d,gt,p,m) [83], Samudra and Sahinidis (gc,d,o) [147], Austin et al. (gc,d,o,p,m) [4]
Extractive distillation | Harper et al. (gc,d,gt) [71], Gani et al. (gc,d,gt) [55], Papadopoulos and Linke (gc,d,h,p) [131], van Dyk and Nieuwoudt (gc,d,h) [40]
Extractive fermentation | Wang and Achenie (gc,d,o) [164], Papadopoulos and Linke (gc,d,h,p) [129]
Gas absorption | Odele and Macchietto (gc,d,o) [127], Buxton et al. (gc,d,o,m,p) [19], Papadopoulos and Linke (gc,d,h,p) [130, 131], Bommareddy et al. (gc,d,gt,p) [14], Zhou et al. (gc,d,h,p) [173]
HIV-1 protease inhibition activity | Visco et al. (sd,d,o) [163]
Ionic liquids design | Matsuda et al. (gc,gt) [116], McLeese et al. (ti,h) [117], Karunanithi and Mehrkesh (gc,d,h) [81]
Liquid-liquid extraction | Gani and Brignole (gc,d,gt) [53], Brignole et al. (gc,d,gt) [16], Odele and Macchietto (gc,d,o) [127], Marcoulaki and Kokossis (gc,d,h) [112], Harper et al. (gc,d,gt) [71], Harper and Gani (gc,d,gt) [70], Gani et al. (gc,d,gt) [55], Karunanithi et al. (gc,d,gt,p) [82], Austin et al. (gc,d,o,p,m) [3], Ourique and Telles (gc,h) [128], Kim and Diwekar (gc,d,h,p) [93], Papadopoulos and Linke (gc,d,h,p) [130], Xu and Diwekar (gc,d,h) [168], Scheffczyk et al. (d,o) [150], Gebreslassie and Diwekar (gc,h) [58]
Organic Rankine cycle fluids | Papadopoulos et al. (gc,d,h,p) [132], Lampe et al. (sel,d,p) [104], Lampe et al. (gc,d,o,p) [105]
Pharmaceutical products | Siddhaye et al. (ti,o) [153]
Polymer design | Venkatasubramanian et al. (gc,h) [161, 162], Maranas (gc,o) [111], Vaidyanathan and El-Halwagi (gc,o,m) [160], Camarda and Maranas (ti,o) [20], Brown et al. (sd,o) [17], Eslick et al. (ti,h) [42], Pavurala and Achenie (gc,d,o) [136], Zhang et al. (gc,o) [170]
Reactions solvents | Wang and Achenie (gc,d,o) [164], Gani et al. (sel,d) [57], Folic et al. (gc,o) [48, 49], Struebing et al. (gc,d,o) [158], Zhou et al. (gc,o) [171], Austin et al. (gc,d,o,m) [3], Zhou et al. (gc,d,h) [172]
Refrigerant design | Joback (gc,d,gt) [77], Gani et al. (gc,d,gt) [55], Churi and Achenie (gc,d,o) [30], Duvedi and Achenie (gc,d,o) [38, 39], Marcoulaki and Kokossis (gc,d,h) [112], Sahinidis et al. (gc,o) [146], Ourique and Telles (gc,h) [128], Samudra and Sahinidis (gc,d,o) [148]
Separations (general) | Hostrup et al. (gc,sel,d,o,gt,p) [75]
Solvents for consumer products and industry | Pistikopoulos and Stefanis (gc,o) [139], Buxton et al. (gc,d,o,m,p) [19], Conte et al. (gc,d,gt,m) [33], Sinha et al. (gc,o) [154], Sinha et al. (sel,o) [155], Weis and Visco (sd,o) [165]
Soybean oil products | Camarda and Sunderesan (ti,o) [21]
Structural modifications to a fungicide | Raman and Maranas (ti,o) [141], Chemmangattuvalappil et al. (sd,o) [28]
Transition metal catalyst design | Chavali et al. (ti,h) [26], Lin et al. (ti,h) [106]
VOC recovery | Eden et al. (gc,d,gt,p) [41]

Methodology codes (given in parentheses after each author entry):
gc: Group contribution methods are the main QSPR method used
ti: Topological indices are the main QSPR method used
sd: Signature descriptors are the main QSPR method used
sel: Compounds are selected from a fixed list rather than designed
d: Decomposition methods used
gt: Generate-and-test procedure used
o: Numerical optimization used
h: Heuristical optimization used
m: Mixture design considered
p: Process design considered
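As a small reading aid for Table 3, the methodology codes can be held in a lookup table and used to filter entries. The sketch below is illustrative only: the function names are our own, and the sample rows are limited to the crystallization solvents entries listed in the table.

```python
# Methodology codes from the legend of Table 3.
LEGEND = {
    "gc": "Group contribution methods are the main QSPR method used",
    "ti": "Topological indices are the main QSPR method used",
    "sd": "Signature descriptors are the main QSPR method used",
    "sel": "Compounds are selected from a fixed list rather than designed",
    "d": "Decomposition methods used",
    "gt": "Generate-and-test procedure used",
    "o": "Numerical optimization used",
    "h": "Heuristical optimization used",
    "m": "Mixture design considered",
    "p": "Process design considered",
}

# (authors, codes, reference number) for the crystallization solvents row.
CRYSTALLIZATION = [
    ("Karunanithi et al.", "gc,d,gt,p,m", 83),
    ("Samudra and Sahinidis", "gc,d,o", 147),
    ("Austin et al.", "gc,d,o,p,m", 4),
]


def parse_codes(codes):
    """Split a comma-separated code string such as 'gc,d,o' into a list."""
    return [c.strip() for c in codes.split(",") if c.strip()]


def decode(codes):
    """Translate a code string into the legend descriptions, in order."""
    return [LEGEND[c] for c in parse_codes(codes)]


def references_with(entries, required):
    """Return reference numbers whose codes include every required code."""
    required = set(required)
    return [ref for _, codes, ref in entries if required <= set(parse_codes(codes))]


if __name__ == "__main__":
    print(decode("sd,d,o"))
    print(references_with(CRYSTALLIZATION, ["m", "p"]))
```

For example, asking for entries that consider both mixture design (m) and process design (p) among the crystallization solvents rows returns references [83] and [4].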
Eden et al. [41] solved an integrated process and product design problem to best recover volatile organic compounds from an industrial process, identifying optimal property targets for solvents in a reduced-order space given by clustering methods. Hostrup et al. [75] proposed a general framework for integrated process and product design that was focused on separations. This method relied on reduction of the feasible solution space via thermodynamic arguments and case-specific considerations. Then, molecules were designed based on proximity to property targets for a certain process architecture.

Kim and Diwekar [94] solved liquid-liquid extraction process problems, considering the process performance and designing suitable structures using a heuristic optimization strategy for the generation of solvent structures. Papadopoulos and Linke [129] developed a methodology for considering integrated problems which relied on decomposing the problems into product and process subproblems. Unlike other approaches, Papadopoulos and Linke solved multi-objective optimization problems, determining the Pareto optimal front for solvent properties likely to be related to process performance. Using these Pareto-optimal structures, they could solve the process problems for a much smaller set of possible molecules. They applied this methodology to extractive fermentation [129] and liquid-liquid extraction and gas absorption [130, 131].

Karunanithi et al. [82] proposed a decomposition methodology to solve difficult process design problems. This methodology first filtered out a large number of possible molecular structures based on property bounds. It then applied a few stages of more complicated constraints to the remaining molecules, further reducing the pool of feasible structures. Finally, the process model was applied to each of the molecules which were feasible for all of the constraints. This methodology was applied to the design of a liquid-liquid extraction process [82] and to the design of crystallization solvents [83]. Bommareddy et al. [14] addressed the product/process design problem first in the space of the process, finding ranges of properties for the molecules to be designed. These ranges then represented what was most suitable for a particular process and therefore defined a much smaller search space for the molecular design subproblem.

Bardow et al. [6] proposed the CoMT-CAMD approach to first identify target solvent properties and then select an optimal solvent based on proximity to these ideal solvent properties. This was applied in conjunction with a variant of the SAFT equation of state to design a carbon capture and storage process. Stavrou et al. [156] used the same approach to consider carbon capture problems. Pereira et al. [138] also used SAFT to optimize a process to separate carbon dioxide and methane at high pressures.

Papadopoulos et al. [132] designed fluids for an organic Rankine cycle, considering fluids which fell on the Pareto front of optimal properties for the process. Lampe et al. considered the same problem for fluid selection [104] and fluid design [105]. A summary of the references, categorized by application, is provided in Table 3.

6 Conclusions

The advent of the computational age has drastically impacted the design of chemical products and novel molecules, altering a once intuition-based, trial-and-error practice into a rapid and efficient search through millions of possible structures. The availability and accuracy of QSPRs combined with efficient mathematical programming techniques have extended this capability even further, enabling chemical product designers to investigate a previously unimaginable diversity of chemical structures.

This article has provided background on the QSPRs which often serve as the underpinning of CAMD problems. Each of the three methods (group contribution, topological indices, and signature descriptors) was discussed in detail, and relevant constraints for optimization problems were provided in each case. The CAMD problem was also addressed from the vantage point of mathematical optimization. Various formulations were discussed for a few broad classes of the CAMD problem (single-molecule design, mixture design, integrated product/process design). Solution techniques were discussed to aid in the solution of the often difficult CAMD problem. Finally, we provided a summary of the many design endeavors and applications of the CAMD problem.

The increasing availability of computational resources, efficient optimization algorithms, and accurate QSPRs bodes well for the future of CAMD. CAMD has a long history of proposing improved solutions for many well-known industrial processes as well as designing new products for consumers and optimizing high-impact chemical processes. More recently, there has been a growing number of more exotic modeling and design efforts, concerning ideas such as integrating quantum chemistry techniques, designing transition metal catalysts, and determining optimal structures of pharmaceutical compounds.
The potential applications of CAMD are numerous, and the field is poised to play an integral role in the development of the chemical and biochemical technologies of the not-so-distant future.

References

[1] M. H. Abraham, R. M. Doherty, M. J. Kamlet, J. M. Harris, and R. W. Taft. Linear solvation energy relationships. Part 37. An analysis of contributions of dipolarity–polarisability, nucleophilic assistance, electrophilic assistance, and cavity terms to solvent effects on t-butyl halide solvolysis rates. Journal of the Chemical Society, Perkin Transactions 2, pages 913–920, 1987.

[2] D. S. Abrams and J. M. Prausnitz. Statistical thermodynamics of liquid mixtures: A new expression for the excess Gibbs energy of partly or completely miscible systems. AIChE Journal, 21:116–128, 1975.

[3] N. D. Austin, N. V. Sahinidis, and D. W. Trahan. A COSMO-based approach to computer-aided mixture design. Chemical Engineering Science, 2016. DOI 10.1016/j.ces.2016.05.025.

[4] N. D. Austin, A. P. Samudra, N. V. Sahinidis, and D. W. Trahan. Mixture design using derivative-free optimization in the space of individual component properties. AIChE Journal, 62:1514–1530, 2016.

[5] S. Bajaj, S. S. Sambi, and A. K. Madan. Prediction of anti-inflammatory activity of N-arylanthranilic acids: Computational approach using refined Zagreb indices. Croatica Chemica Acta, 78:165–174, 2005.

[6] A. Bardow, K. Steur, and J. Gross. Continuous-molecular targeting for integrated solvent and process design. Industrial & Engineering Chemistry Research, 49:2834–2840, 2010.

[7] S. C. Basak. Use of molecular complexity indices in predictive pharmacology and toxicology: A QSAR approach. Medical Science Research, 15:605–609, 1987.

[8] S. C. Basak, D. P. Gieschen, and V. R. Magnuson. A quantitative correlation of the LC50 values of esters in Pimephales promelas using physicochemical and topological parameters. Environmental Toxicology and Chemistry, 3:191–199, 1984.

[9] S. W. Benson. New methods for estimating the heats of formation, heat capacities, and entropies of liquids and gases. The Journal of Physical Chemistry A, 103:11481–11485, 1999.

[10] S. W. Benson and J. H. Buss. Additivity rules for the estimation of molecular properties. Thermodynamic properties. The Journal of Chemical Physics, 29:546–572, 1958.

[11] S. W. Benson, F. R. Cruickshank, D. M. Golden, G. R. Haugen, H. E. O'Neal, A. S. Rodgers, R. Shaw, and R. Walsh. Additivity rules for the estimation of thermochemical properties. Chemical Reviews, 69:279–324, 1969.

[12] R. S. Boethling. Application of molecular topology to quantitative structure-biodegradability relationships. Environmental Toxicology and Chemistry, 5:797–806, 1986.

[13] R. S. Bohacek, C. McMartin, and W. C. Guida. The art and practice of structure-based drug design: A molecular modeling perspective. Medicinal Research Reviews, 16:3–50, 1996.

[14] S. Bommareddy, N. G. Chemmangattuvalappil, C. C. Solvason, and M. R. Eden. Simultaneous solution of process and molecular design problems using an algebraic approach. Computers & Chemical Engineering, 34:1481–1486, 2010.

[15] D. Bonchev. Chemical graph theory: Introduction and fundamentals, volume 1. CRC Press, 1991.

[16] E. A. Brignole, S. B. Bottini, and R. Gani. A strategy for the design and selection of solvents for separation processes. Fluid Phase Equilibria, 29:125–132, 1986.

[17] W. M. Brown, S. Martin, M. D. Rintoul, and J.-L. Faulon. Designing novel polymers with targeted properties using the signature molecular descriptor. Journal of Chemical Information and Modeling, 46:826–835, 2006.

[18] J. Burger, V. Papaioannou, S. Gopinath, G. Jackson, A. Galindo, and C. S. Adjiman. A hierarchical method to integrated solvent
and process design of physical CO2 absorption using the SAFT-γ Mie approach. AIChE Journal, 61:3249–3269, 2015.

[19] A. Buxton, A. G. Livingston, and E. N. Pistikopoulos. Optimal design of solvent blends for environmental impact minimization. AIChE Journal, 45:817–843, 1999.

[20] K. V. Camarda and C. D. Maranas. Optimization in polymer design using connectivity indices. Industrial & Engineering Chemistry Research, 38:1884–1892, 1999.

[21] K. V. Camarda and P. Sunderesan. An optimization approach to the design of value-added soybean oil products. Industrial & Engineering Chemistry Research, 44:4361–4367, 2005.

[22] W. Cao, K. Knudsen, A. Fredenslund, and P. Rasmussen. Group-contribution viscosity predictions of liquid mixtures using UNIFAC-VLE parameters. Industrial & Engineering Chemistry Research, 32:2088–2092, 1993.

[23] R. Ceriani, C. B. Goncalves, J. Rabelo, M. Caruso, A. C. C. Cunha, F. W. Cavaleri, E. A. C. Batista, and A. J. A. Meirelles. Group contribution model for predicting viscosity of fatty compounds. Journal of Chemical & Engineering Data, 52:965–972, 2007.

[24] R. Ceriani, R. Gani, and A. J. A. Meirelles. Prediction of heat capacities and heats of vaporization of organic liquids by group contribution methods. Fluid Phase Equilibria, 283:49–55, 2009.

[25] W. G. Chapman, K. E. Gubbins, G. Jackson, and M. Radosz. SAFT: Equation-of-state solution model for associating fluids. Fluid Phase Equilibria, 52:31–38, 1989.

[26] S. Chavali, B. Lin, D. C. Miller, and K. V. Camarda. Environmentally-benign transition metal catalyst design using optimization techniques. Computers & Chemical Engineering, 28:605–611, 2004.

[27] American Chemical Society. CAS Registry. Available at www.cas.org/content/chemical-substances.

[28] N. G. Chemmangattuvalappil, C. C. Solvason, S. Bommareddy, and M. R. Eden. Reverse problem formulation approach to molecular design using property operators based on signature descriptors. Computers & Chemical Engineering, 34:2062–2071, 2010.

[29] C. J. Churchwell, M. D. Rintoul, S. Martin, D. P. Visco, A. Kotu, R. S. Larson, L. O. Sillerud, D. C. Brown, and J.-L. Faulon. The signature molecular descriptor: 3. Inverse-quantitative structure–activity relationship of ICAM-1 inhibitory peptides. Journal of Molecular Graphics and Modelling, 22:263–273, 2004.

[30] N. Churi and L. E. K. Achenie. Novel mathematical programming model for computer aided molecular design. Industrial & Engineering Chemistry Research, 35:3788–3794, 1996.

[31] N. Cohen and S. W. Benson. Estimation of heats of formation of organic compounds by additivity methods. Chemical Reviews, 93:2419–2438, 1993.

[32] L. Constantinou and R. Gani. New group contribution method for estimating properties of pure compounds. AIChE Journal, 40:1697–1710, 1994.

[33] E. Conte, R. Gani, and K. M. Ng. Design of formulated products: A systematic methodology. AIChE Journal, 57:2431–2449, 2011.

[34] E. L. Cussler and G. D. Moggridge. Chemical Product Design. Cambridge University Press, 2nd edition, 2011.

[35] J. Devillers and A. T. Balaban. Topological indices and related descriptors in QSAR and QSPAR. CRC Press, 2000.

[36] E. S. Domalski and E. D. Hearing. Estimation of the thermodynamic properties of hydrocarbons at 298.15 K. Journal of Physical and Chemical Reference Data, 17:1637–1678, 1988.

[37] M. A. Duran and I. E. Grossmann. An outer-approximation algorithm for a class of mixed-integer nonlinear programs. Mathematical Programming, 36:307–339, 1986.
[38] A. P. Duvedi and L. E. K. Achenie. Designing environmentally safe refrigerants using mathematical programming. Chemical Engineering Science, 51:3727–3739, 1996.

[39] A. P. Duvedi and L. E. K. Achenie. On the design of environmentally benign refrigerant mixtures. A mathematical programming approach. Computers & Chemical Engineering, 21:915–923, 1997.

[40] B. Van Dyk and I. Nieuwoudt. Design of solvents for extractive distillation. Industrial & Engineering Chemistry Research, 39:1423–1429, 2000.

[41] M. R. Eden, S. B. Jørgensen, R. Gani, and M. M. El-Halwagi. A novel framework for simultaneous separation process and product design. Chemical Engineering and Processing: Process Intensification, 43:595–608, 2004.

[42] J. C. Eslick, Q. Ye, J. Park, E. M. Topp, P. Spencer, and K. V. Camarda. A computational molecular design framework for crosslinked polymer networks. Computers & Chemical Engineering, 33:954–963, 2009.

[43] E. Estrada. Edge adjacency relationships and a novel topological index related to molecular volume. Journal of Chemical Information and Computer Sciences, 35:31–33, 1995.

[44] E. Estrada and L. Rodríguez. Edge-connectivity indices in QSPR/QSAR studies. 1. Comparison to other topological indices in QSPR studies. Journal of Chemical Information and Computer Sciences, 39:1037–1041, 1999.

[45] E. Estrada and E. Uriarte. Recent advances on the role of topological indices in drug discovery research. Current Medicinal Chemistry, 8:1573–1588, 2001.

[46] J.-L. Faulon, C. J. Churchwell, and D. P. Visco. The signature molecular descriptor. 2. Enumerating molecules from their extended valence sequences. Journal of Chemical Information and Computer Sciences, 43:721–734, 2003.

[47] J.-L. Faulon, D. P. Visco, and R. S. Pophale. The signature molecular descriptor. 1. Using extended valence sequences in QSAR and QSPR studies. Journal of Chemical Information and Computer Sciences, 43:707–720, 2003.

[48] M. Folic, C. S. Adjiman, and E. N. Pistikopoulos. Design of solvents for optimal reaction rate constants. AIChE Journal, 53:1240–1256, 2007.

[49] M. Folic, C. S. Adjiman, and E. N. Pistikopoulos. Computer-aided solvent design for reactions: Maximizing product formation. Industrial & Engineering Chemistry Research, 47:5190–5202, 2008.

[50] A. Fredenslund, R. L. Jones, and J. M. Prausnitz. Group-contribution estimates of activity coefficients in nonideal liquid mixtures. AIChE Journal, 21:1086, 1975.

[51] J. Galvez, R. Garcia, M. T. Salabert, and R. Soler. Charge indexes. New topological descriptors. Journal of Chemical Information and Computer Sciences, 34:520–525, 1994.

[52] R. Gani. Computer-aided methods and tools for chemical product design. Chemical Engineering Research and Design, 82:1494–1504, 2004.

[53] R. Gani and E. A. Brignole. Molecular design of solvents for liquid extraction based on UNIFAC. Fluid Phase Equilibria, 13:331–340, 1983.

[54] R. Gani and A. Fredenslund. Computer-aided molecular and mixture design with specified constraints. Fluid Phase Equilibria, 82:39–46, 1993.

[55] R. Gani, B. Nielsen, and A. Fredenslund. A group contribution approach to computer-aided molecular design. AIChE Journal, 37:1318–1332, 1991.

[56] R. Gani, P. M. Harper, and M. Hostrup. Automatic creation of missing groups through connectivity index for pure-component property prediction. Industrial & Engineering Chemistry Research, 44:7262–7269, 2005.

[57] R. Gani, C. Jimenez-Gonzalez, and D. J. C. Constable. Method for selection of solvents for promotion of organic reactions. Computers & Chemical Engineering, 29:1661–1676, 2005.
[58] B. H. Gebreslassie and U. M. Diwekar. Efficient ant colony optimization for computer aided molecular design: Case study solvent selection problem. Computers & Chemical Engineering, 78:1–9, 2015.

[59] F. Gharagheizi, M. R. S. Gohar, and M. G. Vayeghan. A quantitative structure–property relationship for determination of enthalpy of fusion of pure compounds. Journal of Thermal Analysis and Calorimetry, 109:501–506, 2012.

[60] S. Gopinath, G. Jackson, A. Galindo, and C. S. Adjiman. Outer approximation algorithm with physical domain reduction for computer-aided molecular and separation process design. AIChE Journal, 2016.

[61] S. Gupta, M. Singh, and A. K. Madan. Application of graph theory: Relationship of eccentric connectivity index and Wiener's index with anti-inflammatory activity. Journal of Mathematical Analysis and Applications, 266:259–268, 2002.

[62] S. Hada, C. C. Solvason, and M. R. Eden. Characterization-based molecular design of bio-fuel additives using chemometric and property clustering techniques. Frontiers in Energy Research, 2:20, 2014.

[63] L. H. Hall and L. B. Kier. The molecular connectivity chi indexes and kappa shape indexes in structure-property modeling. Reviews in Computational Chemistry, Volume 2, pages 367–422, 2007.

[64] L. H. Hall and C. T. Story. Boiling point and critical temperature of a heterogeneous data set: QSAR with atom type electrotopological state indices using artificial neural networks. Journal of Chemical Information and Computer Sciences, 36:1004–1014, 1996.

[65] L. H. Hall, L. B. Kier, and W. J. Murray. Molecular connectivity II: Relationship to water solubility and boiling point. Journal of Pharmaceutical Sciences, 64:1974–1977, 1975.

[66] L. H. Hall, E. L. Maynard, and L. B. Kier. QSAR investigation of benzene toxicity to fathead minnow using molecular connectivity. Environmental Toxicology and Chemistry, 8:783–788, 1989.

[67] L. H. Hall, E. L. Maynard, and L. B. Kier. Structure–activity relationship studies on the toxicity of benzene derivatives: III. Predictions and extension to new substituents. Environmental Toxicology and Chemistry, 8:431–436, 1989.

[68] L. H. Hall, R. S. Dailey, and L. B. Kier. Design of molecules from quantitative structure-activity relationship models. 3. Role of higher order path counts: Path 3. Journal of Chemical Information and Computer Sciences, 33:598–603, 1993.

[69] M. Harini, J. Adhikari, and K. Y. Rani. A review of property estimation methods and computational schemes for rational solvent design: A focus on pharmaceuticals. Industrial & Engineering Chemistry Research, 52:6869–6893, 2013.

[70] P. M. Harper and R. Gani. A multi-step and multi-level approach for computer aided molecular design. Computers & Chemical Engineering, 24:677–683, 2000.

[71] P. M. Harper, R. Gani, P. Kolar, and T. Ishikawa. Computer-aided molecular design with combined molecular modeling and group contribution. Fluid Phase Equilibria, 158–160:337–347, 1999.

[72] R. H. Herring and M. R. Eden. Evolutionary algorithm for de novo molecular design with multi-dimensional constraints. Computers & Chemical Engineering, 83:267–277, 2015.

[73] H. Hosoya. Topological index. A newly proposed quantity characterizing the topological nature of structural isomers of saturated hydrocarbons. Bulletin of the Chemical Society of Japan, 44:2332–2339, 1971.

[74] H. Hosoya, K. Hosoi, and I. Gutman. A topological index for the total π-electron energy. Theoretica Chimica Acta, 38:37–47, 1975.

[75] M. Hostrup, P. M. Harper, and R. Gani. Design of environmentally benign processes: Integration of design and separation process synthesis. Computers & Chemical Engineering, 23:1395–1414, 1999.

[76] J. W. Jalowka and T. E. Daubert. Group contribution method to predict critical temperature and pressure of hydrocarbons. Industrial
& Engineering Chemistry Process Design and Development, 25:139–142, 1986.

[77] K. G. Joback. Designing molecules possessing desired physical property values. PhD thesis, Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA, 1989.

[78] K. G. Joback and R. C. Reid. Estimation of pure-component properties from group contributions. Chemical Engineering Communications, 57:233–243, 1987.

[79] K. G. Joback and G. Stephanopoulos. Designing molecules possessing desired physical property values. Proceedings of the 1989 Foundations of Computer-Aided Process Design Conference, Snowmass, CO, Elsevier, Amsterdam, pages 195–230, 1990.

[80] K. G. Joback and G. Stephanopoulos. Searching spaces of discrete solutions: The design of molecules possessing desired physical properties. Advances in Chemical Engineering, 21:257–311, 1995.

[81] A. Karunanithi and A. Mehrkesh. Computer-aided design of tailor-made ionic liquids. AIChE Journal, 59:4627–4640, 2013.

[82] A. T. Karunanithi, L. E. K. Achenie, and R. Gani. A new decomposition-based computer-aided molecular/mixture design methodology for the design of optimal solvents and solvent mixtures. Industrial & Engineering Chemistry Research, 44:4785–4797, 2005.

[83] A. T. Karunanithi, L. E. K. Achenie, and R. Gani. A computer-aided molecular design framework for crystallization solvent design. Chemical Engineering Science, 61:1247–1260, 2006.

[84] A. R. Katritzky and E. V. Gordeeva. Traditional topological indexes vs electronic, geometrical, and combined molecular descriptors in QSAR/QSPR research. Journal of Chemical Information and Computer Sciences, 33:835–857, 1993.

[85] A. R. Katritzky, Y. Wang, S. Sild, T. Tamm, and M. Karelson. QSPR studies on vapor pressure, aqueous solubility, and the prediction of water-air partition coefficients. Journal of Chemical Information and Computer Sciences, 38:720–725, 1998.

[86] G. W. Kauffman and P. C. Jurs. Prediction of surface tension, viscosity, and thermal conductivity for common organic solvents using quantitative structure-property relationships. Journal of Chemical Information and Computer Sciences, 41:408–418, 2001.

[87] L. B. Kier and L. H. Hall. Molecular connectivity VII: specific treatment of heteroatoms. Journal of Pharmaceutical Sciences, 65:1806–1809, 1976.

[88] L. B. Kier and L. H. Hall. The generation of molecular structures from a graph-based QSAR Equation. Quantitative Structure-Activity Relationships, 12:383–388, 1993.

[89] L. B. Kier, L. H. Hall, W. J. Murray, and M. Randic. Molecular connectivity I: Relationship to nonspecific local anesthesia. Journal of Pharmaceutical Sciences, 64:1971–1974, 1975.

[90] L. B. Kier, W. J. Murray, M. Randic, and L. H. Hall. Molecular connectivity V: connectivity series concept applied to density. Journal of Pharmaceutical Sciences, 65:1226–1230, 1976.

[91] L. B. Kier, L. H. Hall, and R. S. Dailey. Design of molecules from quantitative structure-activity relationship models. 3. Role of higher order path counts: Path 3. Journal of Chemical Information and Computer Sciences, 33:598–603, 1993.

[92] L. B. Kier, L. H. Hall, and J. W. Frazer. Design of molecules from quantitative structure-activity relationship models 1. Information transfer between path and vertex degree counts. Journal of Chemical Information and Computer Sciences, 33:143–147, 1993.

[93] K. Kim and U. M. Diwekar. Efficient combinatorial optimization under uncertainty. 2. Application to stochastic solvent selection. Industrial & Engineering Chemistry Research, 41:1285–1296, 2002.

[94] K.-J. Kim and U. M. Diwekar. Integrated solvent selection and recycling for continuous processes. Industrial & Engineering Chemistry Research, 41:4479–4488, 2002.

[95] A. Klamt. Conductor-like screening model for & Engineering Chemistry Research, 53:8821–
real solvents: A new approach to the quan- 8830, 2014.
titative calculation of solvation phenomena.
The Journal of Physical Chemistry A, 99:2224– [105] M. Lampe, M. Stavrou, J. Schilling, E. Sauer,
2235, 1995. J. Gross, and A. Bardow. Computer-aided
molecular design in the continuous-molecular
[96] A. Klamt and G. Schüürmann. COSMO: A targeting framework using group-contribution
new approach to dielectric screening in solvents PC-SAFT. Computers & Chemical Engineer-
with explicit expressions for the screening ing, 81:278–287, 2015.
energy and its gradient. Journal of the
Chemical Society, Perkin Transactions, 2:799– [106] B. Lin, S. Chavali, K. Camarda, and D. C.
805, 1993. Miller. Computer-aided molecular design
using tabu search. Computers & Chemical
[97] A. Klamt, V. Jonas, T. Burger, and J. C. W. Engineering, 29:337–347, 2005.
Lohrenz. Refinement and parametrization
of COSMO-RS. The Journal of Physical [107] S.-T. Lin and S. I. Sandler. A priori
Chemistry A, 102:5074–5085, 1998. phase equilibrium prediction from a segment
contribution solvation model. Industrial &
[98] J. A. Klein, D. T. Wu, and R. Gani. Engineering Chemistry Research, 41:899–913,
Computer aided mixture design with specified 2002.
property constraints. Computers & Chemical
Engineering, 16:S229–S236, 1992. [108] A. Lymperiadis, C. S. Adjiman, A. Galindo,
and G. Jackson. A group contribution method
[99] K. M. Klincewicz and R. C. Reid. Estimation for associating chain molecules based on the
of critical properties with group contribution statistical associating fluid theory (SAFT-γ).
methods. AIChE Journal, 30:137–142, 1984. The Journal of Chemical Physics, 127:234903,
2007.
[100] G. Klopman and H. Zhu. Estimation of
the aqueous solubility of organic molecules by [109] A. Lymperiadis, C. S. Adjiman, G. Jackson,
the group contribution approach. Journal of and A. Galindo. A generalisation of the
Chemical Information and Computer Sciences, SAFT-γ group contribution method for groups
41:439–445, 2001. comprising multiple spherical segments. Fluid
Phase Equilibria, 274:85–104, 2008.
[101] G. Klopman, J.-Y. Li, S. Wang, and
M. Dimayuga. Computer automated log [110] S. Macchietto, O. Odele, and O. Omatsone.
P calculations based on an extended group Design on optimal solvents for liquid-liquid
contribution approach. Journal of Chemical extraction and gas absorption processes.
Information and Computer Sciences, 34:752 Chemical Engineering Research and Design, 68:
781, 1994. 429433, 1990.

[102] J. P. Knight and G. J. McRae. A combinatorial [111] C. D. Maranas. Optimal computer-aided


optimization approach to molecular design. molecular design: A polymer design case study.
Nanotechnology, 2:142148, 1991. Industrial & Engineering Chemistry Research,
35:34033414, 1996.
[103] Z. Kolska, J. Kukal, M. Zabransky, and
V. Ruzicka. Estimation of the heat capacity [112] E. C. Marcoulaki and A. C. Kokossis.
of organic liquids as a function of temperature Molecular design synthesis using stochastic
by a three-level group contribution method. optimisation as a tool for scoping and
Industrial & Engineering Chemistry Research, screening. Computers & Chemical Engineering,
47:20752085, 2008. 22:S11S18, 1998.

[104] M. Lampe, M. Stavrou, H. M. Bucker, J. Gross, [113] J. Marrero and R. Gani. Group-contribution
and A. Bardow. Simultaneous optimization based estimation of pure component properties.
of working fluid and process for organic Fluid Phase Equilibria, 183184:183208, 2001.
Rankine cycles using PC-SAFT. Industrial

[114] J. Marrero and R. Gani. Group-contribution based estimation of octanol/water partition coefficient and aqueous solubility. Industrial & Engineering Chemistry Research, 41:6623–6633, 2002.

[115] T. M. Martin and D. M. Young. Prediction of the acute toxicity (96-h LC50) of organic compounds to the fathead minnow (Pimephales promelas) using a group contribution method. Chemical Research in Toxicology, 14:1378–1385, 2001.

[116] H. Matsuda, H. Yamamoto, K. Kurihara, and K. Tochigi. Computer-aided reverse design for ionic liquids by QSPR using descriptors of group contribution type for ionic conductivities and viscosities. Fluid Phase Equilibria, 261:434–443, 2007.

[117] S. E. McLeese, J. C. Eslick, N. J. Hoffmann, A. M. Scurto, and K. V. Camarda. Design of ionic liquids via computational molecular design. Computers & Chemical Engineering, 34:1476–1480, 2010.

[118] A. Mercader, E. A. Castro, and A. A. Toropov. QSPR modeling of the enthalpy of formation from elements by means of correlation weighting of local invariants of atomic orbital molecular graphs. Chemical Physics Letters, 330:612–623, 2000.

[119] T. Mu, J. Rarey, and J. Gmehling. Group contribution prediction of surface charge density profiles for COSMO-RS(Ol). AIChE Journal, 53:3231–3240, 2007.

[120] T. Mu, J. Rarey, and J. Gmehling. Group contribution prediction of surface charge density distribution of molecules for COSMO-SAC. AIChE Journal, 55:3298–3300, 2009.

[121] W. J. Murray, L. H. Hall, and L. B. Kier. Molecular connectivity III: Relationship to partition coefficients. Journal of Pharmaceutical Sciences, 64:1978–1981, 1975.

[122] Y. Nannoolal, J. Rarey, D. Ramjugernath, and W. Cordes. Estimation of pure component properties: Part 1. Estimation of the normal boiling point of non-electrolyte organic compounds via group contributions and group interactions. Fluid Phase Equilibria, 226:45–63, 2004.

[123] Y. Nannoolal, J. Rarey, and D. Ramjugernath. Estimation of pure component properties: Part 2. Estimation of critical property data by group contribution. Fluid Phase Equilibria, 252:1–27, 2007.

[124] Y. Nannoolal, J. Rarey, and D. Ramjugernath. Estimation of pure component properties: Part 3. Estimation of the vapor pressure of non-electrolyte organic compounds via group contributions and group interactions. Fluid Phase Equilibria, 269:117–133, 2008.

[125] Y. Nannoolal, J. Rarey, and D. Ramjugernath. Estimation of pure component properties. Part 4: Estimation of the saturated liquid viscosity of non-electrolyte organic compounds via group contributions and group interactions. Fluid Phase Equilibria, 281:97–119, 2009.

[126] L. Y. Ng, F. K. Chong, and N. G. Chemmangattuvalappil. Challenges and opportunities in computer-aided molecular design. Computers & Chemical Engineering, 81:115–129, 2015.

[127] O. Odele and S. Macchietto. Computer aided molecular design: A novel method for optimal solvent selection. Fluid Phase Equilibria, 82:47–54, 1993.

[128] J. E. Ourique and A. S. Telles. Computer-aided molecular design with simulated annealing and molecular graphs. Computers & Chemical Engineering, 22:S615–S618, 1998.

[129] A. I. Papadopoulos and P. Linke. A unified framework for integrated process and molecular design. Chemical Engineering Research and Design, 83:674–678, 2005.

[130] A. I. Papadopoulos and P. Linke. Multiobjective molecular design for integrated process-solvent systems synthesis. AIChE Journal, 52:1057–1070, 2006.

[131] A. I. Papadopoulos and P. Linke. Efficient integration of optimal solvent and process design using molecular clustering. Chemical Engineering Science, 61:6316–6336, 2006.

[132] A. I. Papadopoulos, M. Stijepovic, and P. Linke. On the systematic design and selection of optimal working fluids for organic Rankine cycles. Applied Thermal Engineering, 30:760–769, 2010.

[133] V. Papaioannou, T. Lafitte, C. Avendano, C. S. Adjiman, G. Jackson, E. A. Muller, and A. Galindo. Group contribution methodology based on the statistical associating fluid theory for heteronuclear molecules formed from Mie segments. The Journal of Chemical Physics, 140:054107, 2014.

[134] J. R. Partington. A history of chemistry. 1970.

[135] S. J. Patel, D. Ng, and M. S. Mannan. QSPR flash point prediction of solvents using topological indices for application in computer aided molecular design. Industrial & Engineering Chemistry Research, 48:7378–7387, 2009.

[136] N. Pavurala and L. E. K. Achenie. A mechanistic approach for modeling oral drug delivery. Computers & Chemical Engineering, 57:196–206, 2013.

[137] Y. Peng, K. D. Goff, M. C. dos Ramos, and C. McCabe. Developing a predictive group-contribution-based SAFT-VR equation of state. Fluid Phase Equilibria, 277:131–144, 2009.

[138] F. E. Pereira, E. Keskes, A. Galindo, G. Jackson, and C. S. Adjiman. Integrated solvent and process design using a SAFT-VR thermodynamic description: High-pressure separation of carbon dioxide and methane. Computers & Chemical Engineering, 35:474–491, 2011.

[139] E. N. Pistikopoulos and S. K. Stefanis. Optimal solvent design for environmental impact minimization. Computers & Chemical Engineering, 22:717–733, 1998.

[140] J. A. Platts, M. H. Abraham, D. Butina, and A. Hersey. Estimation of molecular linear free energy relationship descriptors by a group contribution approach. 2. Prediction of partition coefficients. Journal of Chemical Information and Computer Sciences, 40:71–80, 2000.

[141] V. S. Raman and C. D. Maranas. Optimization in product design with properties correlated with topological indices. Computers & Chemical Engineering, 22:747–763, 1998.

[142] M. Randic. Characterization of molecular branching. Journal of the American Chemical Society, 97:6609–6615, 1975.

[143] M. Randic and J. Zupan. On interpretation of well-known topological indices. Journal of Chemical Information and Computer Sciences, 41:550–560, 2001.

[144] G. N. Roganov, P. N. Pisarev, V. N. Emelyanenko, and S. P. Verevkin. Measurement and prediction of thermochemical properties. Improved Benson-type increments for the estimation of enthalpies of vaporization and standard enthalpies of formation of aliphatic alcohols. Journal of Chemical & Engineering Data, 50:1114–1124, 2005.

[145] K. Rose, L. H. Hall, and L. B. Kier. Modeling blood-brain barrier partitioning using the electrotopological state. Journal of Chemical Information and Computer Sciences, 42:651–666, 2002.

[146] N. V. Sahinidis, M. Tawarmalani, and M. Yu. Design of alternative refrigerants via global optimization. AIChE Journal, 49:1761–1775, 2003.

[147] A. Samudra and N. V. Sahinidis. Optimization-based framework for computer-aided molecular design. AIChE Journal, 59:3686–3701, 2013.

[148] A. Samudra and N. V. Sahinidis. Design of heat transfer media components for retail food refrigeration. Industrial & Engineering Chemistry Research, 52:8518–8526, 2013.

[149] S. R. S. Sastri and K. K. Rao. A new group contribution method for predicting viscosity of organic liquids. The Chemical Engineering Journal, 50:9–25, 1992.

[150] J. Scheffczyk, L. Fleitmann, A. Schwarz, M. Lampe, A. Bardow, and K. Leonhard. COSMO-CAMD: A framework for optimization-based computer-aided molecular design using COSMO-RS. Chemical Engineering Science, 2016.

[151] J. A. Schramke, S. F. Murphy, W. J. Doucette, and W. D. Hintze. Prediction of aqueous diffusion coefficients for organic compounds at 25 °C. Chemosphere, 38:2381–2406, 1999.

[152] S. Siddhaye, K. V. Camarda, E. Topp, and M. Southard. Design of novel pharmaceutical products via combinatorial optimization. Computers & Chemical Engineering, 24:701–704, 2000.

[153] S. Siddhaye, K. Camarda, M. Southard, and E. Topp. Pharmaceutical product design using combinatorial optimization. Computers & Chemical Engineering, 28:425–434, 2004.

[154] M. Sinha, L. E. K. Achenie, and G. M. Ostrovsky. Environmentally benign solvent design by global optimization. Computers & Chemical Engineering, 23:1381–1394, 1999.

[155] M. Sinha, L. E. K. Achenie, and R. Gani. Blanket wash solvent blend design using interval analysis. Industrial & Engineering Chemistry Research, 42:516–527, 2003.

[156] M. Stavrou, M. Lampe, A. Bardow, and J. Gross. Continuous molecular targeting–computer-aided molecular design (CoMT–CAMD) for simultaneous process and solvent design for CO2 capture. Industrial & Engineering Chemistry Research, 53:18029–18041, 2014.

[157] S. E. Stein and R. L. Brown. Estimation of normal boiling points from group contributions. Journal of Chemical Information and Computer Sciences, 34:581–587, 1994.

[158] H. Struebing, Z. Ganase, P. G. Karamertzanis, E. Siougkrou, P. Haycock, P. M. Piccione, A. Armstrong, A. Galindo, and C. S. Adjiman. Computer-aided molecular design of solvents for accelerated reaction kinetics. Nature Chemistry, 5:952–957, 2013.

[159] A. Tihic, G. M. Kontogeorgis, N. von Solms, M. L. Michelsen, and L. Constantinou. A predictive group-contribution simplified PC-SAFT equation of state: Application to polymer systems. Industrial & Engineering Chemistry Research, 47:5092–5101, 2007.

[160] R. Vaidyanathan and M. El-Halwagi. Computer-aided synthesis of polymers and blends with target properties. Industrial & Engineering Chemistry Research, 35:627–634, 1996.

[161] V. Venkatasubramanian, K. Chan, and J. M. Caruthers. Computer-aided molecular design using genetic algorithms. Computers & Chemical Engineering, 18:833–844, 1994.

[162] V. Venkatasubramanian, K. Chan, and J. M. Caruthers. Evolutionary design of molecules with desired properties using the genetic algorithm. Journal of Chemical Information and Computer Sciences, 35:188–195, 1995.

[163] D. P. Visco, R. S. Pophale, M. D. Rintoul, and J.-L. Faulon. Developing a methodology for an inverse quantitative structure-activity relationship using the signature molecular descriptor. Journal of Molecular Graphics and Modelling, 20:429–438, 2002.

[164] Y. Wang and L. E. K. Achenie. Computer aided solvent design for extractive fermentation. Fluid Phase Equilibria, 201:1–18, 2002.

[165] D. C. Weis and D. P. Visco. Computer-aided molecular design using the Signature molecular descriptor: Application to solvent selection. Computers & Chemical Engineering, 34:1018–1029, 2010.

[166] H. Wiener. Structural determination of paraffin boiling points. Journal of the American Chemical Society, 69:17–20, 1947.

[167] X. Wu, C. Zhang, P. Goldberg, D. Cohen, Y. Pan, T. Arpin, and O. Bar-Yosef. Early pottery at 20,000 years ago in Xianrendong Cave, China. Science, 336:1696–1700, 2012.

[168] W. Xu and U. M. Diwekar. Improved genetic algorithms for deterministic optimization and optimization under uncertainty. Part II. Solvent selection under uncertainty. Industrial & Engineering Chemistry Research, 44:7138–7146, 2005.

[169] X. Yao, B. Fan, J. P. Doucet, A. Panaye, M. Liu, R. Zhang, X. Zhang, and Z. Hu. Quantitative structure property relationship models for the prediction of liquid heat capacity. QSAR & Combinatorial Science, 22:29–48, 2003.

[170] L. Zhang, S. Cignitti, and R. Gani. Generic mathematical programming formulation and solution for computer-aided molecular design. Computers & Chemical Engineering, 78:79–84, 2015.

[171] T. Zhou, Z. Lyu, Z. Qi, and K. Sundmacher. Robust design of optimal solvents for chemical reactions – A combined experimental and computational strategy. Chemical Engineering Science, 137:613–625, 2015.

[172] T. Zhou, J. Wang, K. McBride, and K. Sundmacher. Optimal design of solvents for extractive reaction processes. AIChE Journal, 2016.

[173] T. Zhou, Y. Zhou, and K. Sundmacher. A hybrid stochastic-deterministic optimization approach for integrated solvent and process design. Chemical Engineering Science, 2016.
