
Journal Name

A computational workflow to discover novel liquid organic hydrogen carriers and their dehydrogenation routes
Kristin Paragian,∗a Bowen Li,a‡ Morgan Massino, and Srinivas Rangarajan a

Two-way liquid organic hydrogen carriers (LOHCs), organic molecules that store hydrogen as reversible chemical bonds, are an emerging concept for on-demand storage and transportation of hydrogen (and thereby energy). Given the large chemical universe, a plethora of potential LOHCs exist; however, the optimal candidate depends on satisfying a variety of constraints on physicochemical and thermochemical properties. Computational high-throughput screening of a subspace of this universe can, in principle, reveal several LOHC candidates which can then be experimentally verified; however, to achieve this, the hydrogen-rich and hydrogen-lean forms of the LOHC pair have to be simultaneously identified based on a plausible connecting chemical pathway. Here, using a combination of data-driven molecular property models and a cheminformatics-based reaction network generation tool, viz. RING, we develop a novel computational workflow to identify promising LOHC pairs (i.e. hydrogen-rich and hydrogen-lean forms) and the dehydrogenation pathways connecting them. Starting from over 1 million small molecules (containing fewer than 14 heavy atoms) in the PubChem database as seeds, we applied this framework as a proof of concept to identify several LOHC pairs that have promising properties in terms of melting point, boiling point, dehydrogenation enthalpy, hydrogen storage capacity, and synthetic accessibility; we further analyze the thermochemistry of the dehydrogenation pathways of the top five candidates. We finally show that this screening provides a rich dataset that can be harnessed via supervised learning algorithms to infer descriptive features that determine whether a molecule is a good LOHC candidate. We posit that the proposed workflow can be used to scalably analyze a much larger molecular space and multiple classes of dehydrogenation chemistries to discover novel LOHC pairs.

† Electronic Supplementary Information (ESI) available: [details of any supplementary information available should be included here]. See DOI: 10.1039/cXCP00000x/
‡ ∗ Work performed at Lehigh, currently a student at the Univ. of Delaware; a‡ Contributed equally; a Department of Chemical & Biomolecular Engineering, Lehigh University, Bethlehem PA.

1 Introduction
New sources of energy such as methane from shale gas and renewable electrons are increasingly being utilized given their abundance and relatively benign or positive impact on the environment compared to conventional carbon-based fuels such as petroleum and coal. Hydrogen, with a gravimetric energy density of 33.3 kWh/kg, has been a promising candidate for carrying energy, with studies dating back to 1976 [1], and can potentially play an important role in the energy landscape of the future [2] as it can be obtained from both methane reforming and water splitting (using renewable energy). However, storing pure hydrogen is difficult, requiring high-pressure tanks (350–700 bar) or cryogenic temperatures due to its low boiling point (-252.8 °C) [3]; both options are energy intensive, the cost of which could be as high as 40% of the total value [4]. Consequently, other technologies to store hydrogen, especially within another material (for example, adsorption on metals or metal-organic frameworks [5–9]), have been of substantial interest. In this context, the US Department of Energy has specified desired targets for hydrogen storage for various applications; for instance, for on-board storage of hydrogen in light-duty vehicles, a hydrogen storage capacity of 6.5 wt% (of system weight) or higher is desired.

An emerging option for on-demand storage and transportation of hydrogen is liquid organic hydrogen carrier (LOHC) systems, in which hydrogen is stored as reversible chemical bonds. In particular, hydrogen, obtained from methane reforming or water electrolysis [10], can be added to a liquid organic (“hydrogen-lean”)

molecule to produce a “hydrogen-rich” molecule which is also liquid. These LOHC pairs are called “two-way” carriers because the hydrogen-rich molecule is stored and transported to the destination, where it is dehydrogenated to produce molecular hydrogen; the “hydrogen-lean” molecule is then returned to the hydrogen source for subsequent cycling, as shown in Figure 1. There are many advantages to LOHCs. They offer: (1) moderate-to-high hydrogen storage capacities, (2) easily reversible reactions, (3) the capacity for energy storage of variable term (days to months) and scale (MWh to GWh) at ambient conditions, and (4) no additional infrastructure needs, since existing resources can be used to complete the entire cycle (i.e. road, rail, or sea transportation and tank-based storage, because their properties are similar to those of crude oil) [2,11–25]. Various research and demonstration projects are underway in multiple countries, including the United States, China, Germany, and South Africa [20]. Finally, technoeconomic analyses have shown that LOHCs are relatively cost efficient compared to high-pressure hydrogen for long-term and large-scale energy storage [26,27] and have transportation costs comparable to that of crude oil [28].

Fig. 1 A schematic of a two-way LOHC cycle. The hydrogen-rich LOHC (cyclohexane here) is produced upon hydrogenation and transported to the destination where dehydrogenation takes place to produce molecular hydrogen on-demand and the hydrogen-poor molecule (benzene here) is recycled.

Identifying a good LOHC pair is non-trivial because a number of physicochemical properties have to be satisfied by both molecules. Specifically, (i) both molecules need to be liquid throughout the year under ambient conditions, (ii) the hydrogen-rich molecule should have a high hydrogen storage capacity, (iii) the hydrogenation/dehydrogenation enthalpy should be low to moderate (often desired at around 15 kcal/mol or lower so that the overall thermochemistry is favorable at moderate temperatures of 200 °C or lower), and (iv) the LOHCs need to be synthetically feasible (and therefore inexpensive) and non-toxic (and therefore easy to handle). Several tens of these systems (largely comprising C-, N-, O-, H-, and B-containing compounds) have already been identified in the literature through experiments and have been researched to determine if they are suitable options, including cyclohexane/benzene, methylcyclohexane/toluene, decalin/naphthalene, N-ethylcarbazole/perhydro-N-ethylcarbazole, boron-nitrogen heterocycles, etc. [16,19,20,29]. While experiments are a reliable way of identifying new LOHCs, they are too slow and resource-intensive to cover the vast space of organic molecules (and, therefore, the space of organic hydrogen carriers) within reasonable time and effort. Computational methods allow us to rapidly screen large subsets of this space and potentially identify undiscovered LOHC pairs. Computational screening of molecules has been used in several fields, including the exploration of battery electrolyte solvents and organic photovoltaics [30–33], and although some computational studies of overall thermochemistry [34–36] or surface chemistry [37–39] have been published in the context of LOHCs, large-scale molecule screening of potential candidates has not been carried out. This is, in part, because generating LOHC pairs requires identifying the hydrogen-rich molecule, the corresponding hydrogen-lean molecule, and ultimately the pathway connecting them, which makes it important to keep track of the chemistry. Existing molecule screening methods, however, do not keep track of such chemistry.

In this work, we report a new screening workflow to identify potential LOHCs that uses (i) a reaction network generator for molecule pair and reaction pathway generation and (ii) a collection of existing and new data-driven molecular property estimation tools. As an illustrative application, we then apply this workflow to identify synthetically feasible, small (containing fewer than 14 non-hydrogen atoms), C/O/N/H-containing LOHC pairs and further analyze the thermochemistry of the dehydrogenation pathways of the most promising ones.

2 Computational Methods
We begin with a description of the different tools used here and subsequently present our workflow.

2.1 Molecule generation using RING
LOHC pairs are obtained by pairing a fully hydrogenated molecule with its corresponding fully dehydrogenated molecule. We obtain these pairs by starting with seed molecules and converting them into fully hydrogenated (i.e. saturated) and dehydrogenated forms using RING (Rule Input Network Generator) [40–42].

RING (Figure 2) is a graph-theory based reaction network generation and analysis tool. It accepts as input a set of reactants and reaction rules written in an English-like language; the reaction rules describe what transformations can occur (e.g. hydrogenation or dehydrogenation). The reactants and the rules are used to exhaustively generate the reaction network (a set of reactions and the corresponding species); RING can subsequently query this network for specific types of molecules, pathways, etc. RING has primarily been used for reaction network analysis; however, we used it here as a molecule generator and pathway identifier. For instance, specifying cyclohexadiene and hydrogen as input reactants into RING along with a reaction rule corresponding to hydrogenation would generate cyclohexene and cyclohexane (and the associated hydrogenation reactions). This hydrogenation network can be queried for saturated molecules to selectively pick cyclohexane; this compound can then be input to RING as the reactant along with a dehydrogenation rule to generate a reaction network comprising intermediates such as cyclohexene, cyclohexadienes, and finally benzene. Subsequently, queries for

the fully dehydrogenated molecules in the network and pathways to them will identify benzene as the molecule and a dehydrogenation pathway connecting it with cyclohexane. This process would, thus, ultimately identify the hydrogen-rich and hydrogen-lean forms relative to a starting molecule (e.g. cyclohexadiene).

Fig. 2 Schematic of Rule Input Network Generator (RING).

2.2 Boiling and melting point prediction
To determine if the LOHC pairs are both liquid, their boiling and melting points were computed using OPERA (OPEn structure-activity-property Relationship App) [43,44]. OPERA utilizes QSAR/QSPR (Quantitative Structure Activity/Property Relationship) modeling, which forms a mathematical relationship between the structure of a molecule and its physical properties, given that molecules with similar structures should behave in similar ways [45]. OPERA models were built using 13 PHYSPROP datasets, containing more than 40,000 chemicals [46], many of which contain functionalities similar to our LOHC candidates. These datasets are publicly available through the EPI suite and are used to predict environmental fates and physicochemical properties. Many properties can be predicted using OPERA, including melting and boiling points, which we used to predict whether the fully hydrogenated molecules will be liquid. The $R^2$ values for these models range from 0.71 to 0.96; it can therefore be concluded that the melting and boiling point models are in reasonable agreement with experimental data.

2.3 Reaction enthalpy prediction
Another property to consider is the reaction enthalpy. In principle, the thermochemical properties of the LOHC can be obtained through quantum chemistry calculations or experiments. However, given the size of the potential LOHC space, those methods are impractical due to their costs. Machine learning methods provide a computationally tractable way to obtain the energies of the LOHCs and their reactions. By utilizing the available data acquired from quantum chemistry or experiments to train a model, the properties of the remaining space can then be predicted at a low computational cost. In principle, any method can be used to compute the properties. We developed and employed a sparse group-additive model for the heat of formation trained on the recently published ab initio calculations of the enthalpy of formation ($\Delta H_f$) based on the G4MP2 theory for the QM9 database [47,48], consisting of 134,000 small organic molecules with up to 9 heavy atoms (C, N, O, F).

In this work, the groups are defined from the pathway fingerprints, which enumerate the paths of the molecular graph of length 1 to 7, wherein length 1 represents each atom, length 2 represents various two-atom bonds, and length 7 mostly represents a chain of atoms. After all unique pathway fingerprints are gathered and concatenated into a uniform vector, the contribution of each group can be acquired through regression. Since pathway fingerprints are exhaustively generated and collected for each molecule in the dataset, the total number of groups is huge (several thousand). Simple linear regression may, therefore, lead to overfitting. To prevent this, we use the Least Absolute Shrinkage and Selection Operator (LASSO) algorithm to introduce a regularization term that shrinks the contribution of each group. The larger the value of the regularization penalty $\lambda$, the fewer groups are selected. By choosing an optimized $\lambda$ value, sparse and relevant groups can be selected and the contributions of the insignificant groups are pushed to zero. The contribution of each group is then collected to build the machine learning model for $\Delta H_f$ predictions of the LOHCs.

About 58,000 molecules from the QM9 database were selected as the training set by excluding the molecules with multiple rings and those containing atoms other than C, O, N, and H. The pathway fingerprints are generated by the OPENBABEL module and counted by the RDKit Python module [49,50]. The LASSO algorithm is carried out using the scikit-learn elastic-net Python module with the L2-norm penalty set to zero. Further, two correction terms are defined to describe the ring structure and multiple-ring information for the pathway fingerprints; the detailed steps to include these corrections are described in the SI of our previous work [51].

2.4 Computing Synthetic Accessibility Scores
To compute the synthetic accessibility of the LOHC candidates, a Python code made available by Ertl and Schuffenhauer [52,53] was used. This code also uses RDKit, and was modified slightly to allow the calculations to be run on an input file of SMILES strings. This method involves the addition of three separate scores, which account for fragment contributions and different attributes of complexity within the molecules. The PubChem database was used to calculate the fragment contributions. Complexity attributes include the number of spiro atoms (atoms connecting two rings that share one atom), bridgehead atoms (atoms connecting two rings that share two atoms), chiral centers, macrocycles, and number of atoms.

Scores were transformed from their raw values to a scale of 1 (easiest) to 10 (most difficult). Ertl and Schuffenhauer indicate that a score of 6 is a rough cutoff between easy and difficult synthesis. Therefore, any pairs that have a hydrogenated molecule with a synthetic accessibility score less than 6 are considered to be ideal. Compared with synthetic accessibility values which were manually determined by several chemists, the model has an $R^2$ of 0.89, making it reliable.

2.5 Workflow to screen LOHC candidates
The proposed workflow comprises multiple stages (generation of hydrogenated molecules, generation of dehydrogenated molecules, and final property-based screening to identify the top candidates), all shown in Figure 3.
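The three stages named above can be sketched as a small pipeline. This is only an illustrative skeleton under stated assumptions: `saturate`, `dehydrogenate`, and `predict_mp_bp` are hypothetical callables standing in for RING with hydrogenation rules, RING with a dehydrogenation rule, and OPERA, respectively, and are not the actual interfaces of those tools. The default liquid window (melting point below -10 °C, boiling point above 70 °C) mirrors the screening criteria reported in Section 3, and the toy property values are rounded experimental numbers used purely for the demonstration.

```python
from typing import Callable, Iterable, List, Tuple

Smiles = str


def is_likely_liquid(mp_c: float, bp_c: float,
                     mp_max_c: float = -10.0, bp_min_c: float = 70.0) -> bool:
    """Liquid-window check on predicted melting/boiling points (deg C)."""
    return mp_c < mp_max_c and bp_c > bp_min_c


def screen_pairs(
    seeds: Iterable[Smiles],
    saturate: Callable[[Smiles], Smiles],                  # hypothetical stand-in for RING hydrogenation
    dehydrogenate: Callable[[Smiles], Iterable[Smiles]],   # hypothetical stand-in for RING dehydrogenation
    predict_mp_bp: Callable[[Smiles], Tuple[float, float]],  # hypothetical stand-in for OPERA
) -> List[Tuple[Smiles, Smiles]]:
    # Stage 1: saturate every seed, deduplicate, keep likely liquids.
    rich = {saturate(s) for s in seeds}
    liquid_rich = [m for m in sorted(rich) if is_likely_liquid(*predict_mp_bp(m))]

    # Stages 2 and 3: generate hydrogen-lean forms, keep likely liquids,
    # and record each (hydrogen-rich, hydrogen-lean) pair once.
    pairs = set()
    for r in liquid_rich:
        for lean in dehydrogenate(r):
            if is_likely_liquid(*predict_mp_bp(lean)):
                pairs.add((r, lean))
    return sorted(pairs)


# Toy lookup tables (assumed values, for demonstration only).
SATURATED = {"Cc1ccccc1": "CC1CCCCC1", "CC1=CCCCC1": "CC1CCCCC1", "C=C": "CC"}
DEHYDROGENATED = {"CC1CCCCC1": ["Cc1ccccc1"], "CC": ["C=C", "C#C"]}
MP_BP = {
    "CC1CCCCC1": (-126.0, 101.0),  # methylcyclohexane
    "Cc1ccccc1": (-95.0, 110.0),   # toluene
    "CC": (-183.0, -89.0),         # ethane, a gas at ambient conditions
}

pairs = screen_pairs(
    seeds=SATURATED,  # iterating the dict yields the seed SMILES
    saturate=SATURATED.__getitem__,
    dehydrogenate=lambda m: DEHYDROGENATED.get(m, []),
    predict_mp_bp=MP_BP.__getitem__,
)
print(pairs)
```

In this toy run two seeds collapse onto the same saturated molecule, which reflects the deduplication that any real run would also need, and the ethane branch is discarded at stage 1 before any dehydrogenation is attempted.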

Fig. 3 A detailed schematic of the workflow used to obtain the LOHC candidates. The three stages are clearly identified.

2.5.1 Stage 1: Generation of saturated LOHC candidates
This stage begins with creating a set of seed molecules, which can, for instance, be done by querying molecules of desired size from a molecule database such as PubChem [54]. The process of extracting the molecules from PubChem to obtain the fully hydrogenated and liquid molecules is summarized in stage 1 of Figure 3. A series of additional screenings needs to be performed on these molecules to make their SMILES strings compatible with RING (see Section 3 for specific checks done for PubChem).

These molecules then need to be processed to identify their fully hydrogenated version. The reaction generator RING is used for this purpose and a set of hydrogenation rules is fed to RING along with these molecules. The specific rules used here are given in Section 3.

Following hydrogenation, the first half of each LOHC pair has been identified, but these molecules must be further checked to see if they will be liquid. This next step uses OPERA to eliminate all molecules with unfavorable boiling and melting points (the thresholds for which are set by the user).

2.5.2 Stage 2: Generation of the hydrogen-lean pair
The fully hydrogenated molecules obtained at the end of Stage 1 are now input into RING (along with one or more rules for dehydrogenation). RING generates a dehydrogenation network from which the fully dehydrogenated molecules (i.e. fully unsaturated, such that the molecule can undergo no further dehydrogenation) can be queried. The melting and boiling points for these molecules can then again be calculated using OPERA, and the same screening criteria as in Stage 1 can be used to identify hydrogen-lean molecules that are highly likely to be liquid. This stage is summarized in Figure 3.

2.5.3 Stage 3: Obtaining promising LOHC pairs
The molecules at the end of Stage 2 are matched with their hydrogenated counterparts by extracting reaction pathways connecting them and taking their endpoints. To do this, we use the pathway query tool in RING that can automatically traverse the network from the product to the input reactants. Identifying the pairs allows for calculating (i) the hydrogen storage capacity, (ii) the absolute value of the heat of dehydrogenation using our data-driven model for computing the enthalpy of formation, and (iii) the synthetic accessibility scores for each of the hydrogen-rich molecules, to characterize the space of feasible LOHC pairs. Further screening of these pairs based on targeted values will result in the identification of the most promising LOHCs. This workflow is shown in stage 3 of Figure 3.

2.5.4 Code availability
The primary set of codes used in this work, viz. RDKit [49], RING, and OPERA [44], are freely available. An earlier version of RING is freely available [55], while newer versions of it can be obtained from the corresponding author upon reasonable request.

3 Results and Discussion
Molecules with 1–10 carbon atoms, 0–2 oxygen atoms, 0–2 nitrogen atoms, and any number of hydrogen atoms (and with no other elements), totalling about 1.17 million molecules, were queried and downloaded from the PubChem database. While larger LOHCs have been proposed in the literature and this serves as an illustrative application, this space is sizeable enough to identify potential small-molecule LOHC pairs. This list was further cleaned up: several molecules in the database were in 3D and needed to be converted to their 2D forms, and entries containing ion pairs, isotopes, or multiple molecules were removed. Over one million molecules still remained after this. Hydrogenation rules for RING in stage 1 included two standard rules to hydrogenate any double or triple bond and multiple rules to hydrogenate aromatic compounds (with different rules designed for different sizes of rings and for rings with different components, including rings that contain oxygen and nitrogen in various locations). Multiple rules were needed for aromatic compounds to ensure that bonds were correctly redistributed upon breaking the aromaticity. In addition, “global” constraints were set to prevent the generation of consecutive double bonds or any aromatic compounds with fewer than 5 heavy atoms. For dehydrogenation in stage 2, only one rule was needed, which broke adjacent X-H single bonds (X is a heavy atom) to form molecular hydrogen and a more unsaturated bond (single becomes double, double becomes triple). RING automatically identifies if a molecule is aromatic and redefines the bond type accordingly. Our property-based screening criteria in stages 1 and 2 were that the predicted melting point be less than -10 °C and the predicted boiling point be greater than 70 °C for a molecule to be considered a likely liquid. We use these constraints to ensure that, within the errors of the OPERA model, we can conservatively preserve liquid candidates, i.e., we expect that the result of this stage of the workflow is a set of fully hydrogenated molecules with a high likelihood of being liquid.

Following the hydrogenation of all molecules from the PubChem database, about 305,000 unique fully hydrogenated molecules were included in the output; it should be noted that several molecules from PubChem lead to the same fully hydrogenated molecule. These molecules were then given as an input to OPERA and screened based on their computed melting and boiling points. This screening resulted in about 53,000 fully hydrogenated molecules which were expected to be liquid.

The screened fully hydrogenated molecules were given as an input to the dehydrogenation rules, and this generated over 60,000 fully dehydrogenated molecules. Some fully hydrogenated molecules can be dehydrogenated in more than one way. This yields a larger number of fully dehydrogenated molecules, where several are very similar in structure, but not identical. These fully dehydrogenated molecules were given as input to OPERA. After screening through their melting and boiling points and removing any repeat LOHC pairs, about 14,000 pairs remained which contain both fully hydrogenated and fully dehydrogenated molecules with predicted melting point less than -10 °C and boiling point greater than 70 °C. These 14,000 pairs are therefore all potential LOHC pairs; it should be noted here that the unique fully hydrogenated molecules in these pairs numbered ∼7000. Using PubChemPy [56], a Python wrapper that allows users to search the PubChem database using Python, it was found that all fully hydrogenated molecules are in the PubChem database. Figure 4 shows the distribution of this final space across multiple properties. We term this collection of 14k pairs feasible LOHCs in that they satisfy the requirement that the corresponding molecule pairs are both liquid under ambient conditions.

To obtain the most promising pairs, we subsequently screened this feasible list for different target values of hydrogen storage capacity (by weight), mean absolute heat of hydrogenation per hydrogen, and synthetic accessibility. We can see from Figures 4a, 4b, and 4c that most molecules have a low percent hydrogen by weight and a high synthetic accessibility score, indicating that most of the feasible LOHC pairs are not ideal; heats of hydrogenation, however, tend to be relatively low, with thousands of LOHC pairs within 25 kcal/mol. We also see in Figure 4d that the molecular weights of these pairs are mostly around 140–180 g/mol.

The melting points of the hydrogenated and dehydrogenated forms show an almost identical distribution, with the hydrogenated molecules having slightly more molecules with a higher melting point, as seen in Figures 4e and 4g. They also fall roughly within the same range. Comparing Figures 4f and 4h, however, shows that the boiling points of the dehydrogenated molecules span a range that is higher by 30 °C. Both distributions show that the majority of molecules have a boiling point between ∼170 °C and ∼220 °C.

Figure 5 is a scatter plot of hydrogen capacity and mean absolute heat of hydrogenation of feasible LOHC pairs. Desirable LOHCs in general would lie in the top-left portion of this plot. Several known LOHCs were identified in this list; four of these (cyclohexane, piperidine, methylcyclohexane, and pyrrolidine) are pointed out in Figure 5. Based on this plot, Table 1 shows the number of LOHC candidates with boiling point ≥ 70 °C and various property ranges in terms of heat of hydrogenation per hydrogen (≤ 10, 15, 20, 25 kcal/mol-H2) and hydrogen capacity (≥ 5.0, 5.5, 6.0, 6.5, 7.0 wt%).

Table 1 Number of LOHC pairs with boiling point > 70 °C satisfying different ranges of hydrogen capacity (rows, wt%) and mean absolute heat of hydrogenation (columns, kcal/mol-H2).

Capacity       ≤ 25    ≤ 20    ≤ 15    ≤ 10
≥ 5.0 wt%      1548     247      49      14
≥ 5.5 wt%       788     129      29      11
≥ 6.0 wt%       225      60      12       2
≥ 6.5 wt%        40      15       3       0
≥ 7.0 wt%        37      15       3       0

There are no candidates in our list with hydrogen capacity greater than 7 wt% and heat of hydrogenation less than 10 kcal/mol-H2. Relaxing the constraint on heat of hydrogenation results in 37 pairs for a heat less than 25 kcal/mol-H2, while relaxing the constraint on hydrogen capacity results in 14 pairs

[Figure 4, panels (a)–(f): histograms of number of molecules versus (a) percent hydrogen by weight, (b) synthetic accessibility score, (c) heat of hydrogenation (kcal/mol H2), (d) molecular weight (g/mol), (e) melting point of hydrogenated molecules (°C), and (f) boiling point of hydrogenated molecules (°C).]

[Figure 4, panels (g)–(h): histograms of number of molecules versus (g) melting point and (h) boiling point of dehydrogenated molecules (°C).]

Fig. 4 Histograms of all Potential LOHCs. Ranges are exclusive of lower limits and inclusive of upper limits. Figures a and c were created from 14k
unique LOHC pairs, 6 and e-f from 7k unique hydrogenated molecules, and 6g and h from 14k unique dehydrogenated molecules.

for a capacity greater than 5.0 wt%. As reference, methylcyclohexane has a hydrogen capacity of 6.1 wt% and mean heat of hydrogenation of 16.34 kcal/mol-H2 (although it has a low boiling point of 101 °C and a melting point of -126 °C); toluene has a boiling point of 110 °C and a melting point of -95 °C. The desired property range is dependent on the application; for instance, a high hydrogen capacity (≥ 6.5 wt%) is required for vehicular on-board storage, while for other applications (such as long-term large-scale energy storage), lower hydrogen capacity and higher dehydrogenation/hydrogenation enthalpy can be tolerated in view of synthetic feasibility (and, thereby, cost). For further analysis here, we consider those 37 pairs that satisfy (i) hydrogen storage capacity greater than 7% by weight and (ii) absolute value of the heat of hydrogenation per hydrogen of less than 25 kcal/mol, and further rank them in terms of synthetic accessibility score to identify molecules with scores of less than or equal to 6 (which indicates easy to moderate difficulty in synthesis). We have, however, provided the property details of each of the approximately 14,000 LOHC pairs in the supporting information.

The 37 pairs were sorted by their synthetic accessibility scores from smallest to largest. Only five LOHC pairs had a synthetic accessibility score less than 6, indicating that they will be easier to synthesize. We will discuss these five LOHC pairs in depth.

Property information and suggested dehydrogenation pathways (extracted from RING) for each of the five LOHC pairs are included in Figures 6 and 7. The top five LOHCs, in terms of the hydrogen-rich form, are N-(cyclohexylmethyl)ethanamine, (2-cyclohexylethyl)(methyl)amine, methyl[(4-methylcyclohexyl)methyl]amine, 4-ethylpiperidine, and 4-ethyl-N-methylcyclohexan-1-amine. These are all known chemicals that can be found in the PubChem database. Interestingly, they all contain N atoms, mostly as exocyclic substituents. Experimental data for these molecules is scarcely reported; there is, however, some information available for 4-ethylpiperidine (boiling point of 157 °C) [57] and its pair, 4-vinylpyridine (melting point < 25 °C and boiling point 188–192 °C) [57]. The pathways shown in Figure 7 were identified by querying all shortest pathways between the respective LOHC molecule pairs ($P_i$), calculating the reaction enthalpy ($\Delta H_{\mathrm{rxn}-i,j}$) for each step $j$ in each pathway $P_i$, tracking the step with the highest reaction enthalpy value ($\Delta H_{\max,P_i}$) in each pathway, and ultimately picking the pathway with the smallest $\Delta H_{\max}$. The dehydrogenation enthalpies (at gas-phase standard state conditions) of each step of the pathways are also shown in Figure 7. It can be noted that even though the mean absolute heat of hydrogenation per hydrogen for these LOHCs is less than or equal to 20.5 kcal/mol-H2, the individual reaction steps of the dehydrogenation pathway can have larger absolute enthalpies. Most dehydrogenation steps are endothermic and within 25 kcal/mol, although a few steps, in particular exocyclic dehydrogenation, have higher reaction enthalpies. It can further be noted that C-N bond dehydrogenation enthalpies (exocyclic or endocyclic) are generally lower than C-C dehydrogenation enthalpies, indicating the favorable effect of N incorporation on the overall thermochemistry and the reason why all top five candidates contain N atoms.

None of these five LOHCs has been reported in the literature; they are thus new LOHC pairs. As was shown in Figure 5, our workflow was able to identify known LOHCs. Given that two of these, cyclohexane and piperidine, were within the final set of LOHCs to be screened, the five new LOHC pairs should be given attention as promising hydrogen carriers. In particular, more experimental analyses are needed to identify catalysts and operating conditions that allow for maximizing the hydrogen storage capacity. Identifying active and selective catalysts can be particularly important to ensure selectivity when alternative dehydrogenation products are also likely. For instance, for LOHC 3, an alternative hydrogen-lean pair includes N-Methyl-1-

[Figure 5: scatter plot of percent hydrogen by weight (y-axis) versus mean absolute heat of hydrogenation in kcal/mol H2 evolved (x-axis). Legend: N and O containing, O containing, N containing, hydrocarbon. Labeled points: cyclohexane, piperidine, methylcyclohexane, pyrrolidine.]
Fig. 5 Scatter Plot of Percent Hydrogen by Weight (hydrogen capacity) versus Heat of Hydrogenation for all LOHC Pairs meeting melting and boiling
point constraints. Data is color-coded to illustrate the distribution of several types of compounds. Four known LOHCs are also shown: cyclohexane,
piperidine, methylcyclohexane, and pyrrolidine. The boxed area represents the data that will be further screened.

(4-methylphenyl)methanimine which would result in the release Table 2 Number of LOHC pairs with boiling point > 190 ◦ C satisfying
different ranges of hydrogen capacity (rows, wt %) and mean absolute
of fewer hydrogens per molecule but with a much more favor-
heat of hydrogenation (columns, kcal/mol-H2 ) values.
able thermochemistry (∼ 16 kcal/mol-H2 ); the dehydrogenated
product shown in Figure 7(c) can also rearrange to form 4- ≤ 25 ≤ 20 ≤ 15 ≤ 10
methylbenzyl isocyanide. ≥ 5.0 wt% 540 47 4 2
≥ 5.5 wt% 261 4 0 0
While evaluating the activity and selectivity of catalysts for de- ≥ 6.0 wt% 46 2 0 0
hydrogenation of a given LOHC is beyond the scope of this work, ≥ 6.5 wt% 1 0 0 0
we note that heterogeneous catalytic dehydrogenation (and hy- ≥ 7.0 wt% 0 0 0 0
drogenation) of such compounds typically occurs at temperatures
of 300 ◦ C 58 to achieve a sufficient activity which leads to two
other considerations that one needs to keep in mind. First, this
temperature may be insufficient for dehydrogenating some of the
aforementioned exocyclic bonds, thus reducing the overall hydro- than 15 kcal/mol-H2 . Interestingly, molecules such as perhydroin-
gen capacity. Second, even under these typical conditions, the dole and its N-ethylated analog, which are closer to some of the
LOHC candidates in Figure 6 can become vapor which is also the well known examples of LOHCs (viz. carbazole derivatives) were
issue with cyclohexane. Both these issues can be addressed using identified in the fully hydrogenated list of molecules generated
lower temperature processes such as homogeneous catalysis 59 , by RING in stage 1 of our workflow; however, they were screened
electrocatalysis 60,61 , or transfer hydrogenation 62 . Indeed, the out because of a high melting point. Larger LOHC molecules such
last approach has shown that dehydrogenation can be brought as dibenzyl toluene were not identified because they are larger
down to 190 ◦ C for large organic molecules. To check if there than our initial seed molecules although our workflow can easily
are promising molecules in our list that are liquid at this temper- be employed to search a much bigger space of potentially larger
ature, we queried for molecules in Table 1 with a boiling point molecules in a future endeavor. We finally note that some of the
constraint of ≥ 190◦ C; the results of this are given in Table 2. molecules in these five pairs, e.g. 4-vinylpyridine formed from
There is only one molecule satisfying any of our heat of hydro- the dehydrogentaion of 4-ethylpiperidine is unstable and requires
genation constraints and having more than 6.5 wt% hydrogen addition of ppm quantities of stabilizers; such additions may not
capacity; relaxing the constraint on hydrogen capacity results in drastically affect the physical properties or hydrogen storage ca-
4 molecules with more than 5.0 wt% hydrogen capacity and less pacity but may influence dehydrogenation activity.
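As a worked illustration of the hydrogen capacity (percent hydrogen by weight) used throughout the screening, the sketch below computes it for one of the pairs named above, 4-ethylpiperidine (C7H15N) going to 4-vinylpyridine (C7H7N), and for the known cyclohexane/benzene pair. It assumes that capacity is defined as the mass of H2 released divided by the molar mass of the hydrogen-rich form; the function names and the rounded atomic masses are illustrative choices, not taken from the workflow itself.

```python
import re

# Rounded standard atomic masses in g/mol (illustrative precision).
ATOMIC_MASS = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999}


def parse_formula(formula):
    """Turn a simple molecular formula such as 'C7H15N' into element counts."""
    counts = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + int(number or 1)
    return counts


def molar_mass(formula):
    """Molar mass in g/mol from the element counts."""
    return sum(ATOMIC_MASS[el] * n for el, n in parse_formula(formula).items())


def hydrogen_capacity_wt_percent(rich_formula, lean_formula):
    """ASSUMED definition: mass of released hydrogen / mass of hydrogen-rich form."""
    rich = parse_formula(rich_formula)
    lean = parse_formula(lean_formula)
    # Dehydrogenation must leave the heavy-atom skeleton unchanged.
    heavy_rich = {el: n for el, n in rich.items() if el != "H"}
    heavy_lean = {el: n for el, n in lean.items() if el != "H"}
    if heavy_rich != heavy_lean:
        raise ValueError("rich and lean forms differ in heavy atoms")
    released_h_atoms = rich.get("H", 0) - lean.get("H", 0)
    return 100.0 * released_h_atoms * ATOMIC_MASS["H"] / molar_mass(rich_formula)


# 4-ethylpiperidine -> 4-vinylpyridine releases 8 H atoms (4 H2).
capacity_4ep = hydrogen_capacity_wt_percent("C7H15N", "C7H7N")
# cyclohexane -> benzene releases 6 H atoms (3 H2).
capacity_chx = hydrogen_capacity_wt_percent("C6H12", "C6H6")
print(f"4-ethylpiperidine: {capacity_4ep:.2f} wt%")
print(f"cyclohexane:       {capacity_chx:.2f} wt%")
```

Under this assumed definition both pairs come out slightly above 7 wt%, which matches the capacity threshold quoted for the final candidates.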

Fig. 6 Top 5 LOHC pairs and their data, in order of most synthetically feasible to least (panels (a) to (e): data for LOHC 1 to LOHC 5). Included are predictions of melting points, boiling points, and heats of hydrogenation, as well as calculated molecular weights, percent hydrogen by weight, and synthetic accessibility scores. Also included are CID numbers of the hydrogenated molecules, as well as their names in PubChem’s database.

3.1 Qualitative descriptors of good LOHC candidates
The results from the molecule screening analysis allow us to address the following question: what makes a molecule a good LOHC candidate? We can consider this question by seeking interpretable data-driven models, trained on our screening results, that can correctly classify a molecule as a promising LOHC candidate or not and thereby identify molecular features that are essential in determining this characteristic. However, since only 37 molecules pass all our constraints (while we considered over 300,000 fully hydrogenated molecules), a classification model cannot be built on this small set. Instead, we built a linear discriminant analysis (LDA) model to classify feasible and infeasible LOHCs, where feasibility here implies that both molecules of an LOHC pair are liquid under ambient conditions. There are over 7000 feasible hydrogen-rich LOHCs; therefore, this set can be used to provide more reliable data-driven insights. This classification is still valuable as it is the first step in determining whether an LOHC has potential or not and can be used to quickly screen larger molecule sets.

LDA works by projecting the high-dimensional data onto a low-dimensional space where the data are optimally separated. The projection is conducted by identifying a transformation matrix $G$ that minimizes the within-class distance while maximizing the between-class distance [63]. Since we consider only two classes here, the transformation matrix is per the original Fisher Linear Discriminant analysis (FLDA):

$$G^{F} = S_t^{+}\left(c^{(1)} - c^{(2)}\right) \tag{1}$$
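As an illustrative aside, Eq. (1) can be evaluated numerically on a small toy problem. The sketch below is not the authors' implementation: it assumes that $S_t$ is the total scatter matrix of the mean-centered features and that the superscript + denotes the Moore-Penrose pseudo-inverse, and it places the decision threshold at the midpoint of the two class centroids. The data points are hypothetical.

```python
import numpy as np


def fisher_direction(X, y):
    """Return (G_F, c1, c2) with G_F = pinv(S_t) @ (c1 - c2) for labels in {0, 1}."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    c1 = X[y == 1].mean(axis=0)  # centroid of the first (positive) class
    c2 = X[y == 0].mean(axis=0)  # centroid of the second (negative) class
    centered = X - X.mean(axis=0)
    # ASSUMPTION: S_t is the total scatter matrix of the centered data.
    total_scatter = centered.T @ centered
    # ASSUMPTION: the '+' superscript is the Moore-Penrose pseudo-inverse.
    g = np.linalg.pinv(total_scatter) @ (c1 - c2)
    return g, c1, c2


# Hypothetical two-feature toy data, not taken from the screening set.
X = np.array([[0, 0], [1, 0], [0, 1], [3, 3], [4, 3], [3, 4]], dtype=float)
y = np.array([0, 0, 0, 1, 1, 1])

g, c1, c2 = fisher_direction(X, y)
# ASSUMPTION: intercept chosen so the boundary passes through the centroid midpoint.
intercept = -g @ (c1 + c2) / 2.0
scores = X @ g + intercept
predicted = (scores > 0).astype(int)  # the sign of the score gives the class
print("direction:", g)
print("predicted:", predicted)
```

Each component of the returned direction plays the role of a per-feature contribution, which is what makes the fitted model interpretable once the number of features is small.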

Fig. 7 Possible dehydrogenation pathways for the top five LOHC pairs (panels (a) to (e): dehydrogenation pathways for LOHC 1 to LOHC 5), including the heat of reaction for each step of dehydrogenation. Each step evolves a hydrogen molecule, which is not shown for simplicity.

Here, $c^{(1)}$ and $c^{(2)}$ correspond to the centroids of the two classes. Note that the transformation matrix $G^{F}$ is of rank one and has the same dimension as the features, and the prediction from the LDA model for each sample is the sign (positive or negative) of the dot product of the first vector of $G^{F}$ and the feature vector $x$ of the molecule (plus the intercept). Therefore, we can interpret the value of the corresponding element in $G^{F}$ as the contribution of each feature towards the prediction; the larger the absolute value, the more important that feature is.

In this work, the LDA method is carried out through the Scikit-learn Python module [64] with the least-squares solver and the automatic Ledoit-Wolf shrinkage method [65] applied. Following the section on enthalpy prediction, we use the pathway fingerprints as the features of each molecule but further extend these by distinguishing whether a fragment is part of an acyclic portion (A) or partially (P)/completely (R) inside a ring structure. For example, for the CC group in ethylcyclohexane, there are 1 CC (A), 1 CC (P) and 6 CC (R) groups. These modifications approximately tripled the original number of features.

One of the benefits of the LDA method is the dimensionality reduction it offers through the transformation. With the exhaustively generated pathway fingerprints, however, even though the feature dimension is reduced to one, the number of features is too large to interpret the model. In this work, we followed a heuristic process of feature selection by iteratively eliminating unwanted features while preserving those that are essential for predictive performance. Specifically, we set a threshold to decide whether to keep a feature in the model, based either on its contribution or on the frequency with which the feature occurs in the dataset. After each round of feature elimination, the model is retrained with the fewer remaining features to obtain their updated contributions in the transformation vector. The process is conducted iteratively until a model with sparse enough features for interpretability is obtained. The threshold for the contribution or the frequency was set heuristically such that the number of features shrinks while the performance drop in the test set is minimal.

In this work, we took as the LDA classification case study the problem of whether or not a fully hydrogenated LOHC satisfies the boiling/melting point metric.

Out of the 300k candidates, ∼ 7k hydrogen-rich LOHCs identified by the screening process were taken as the positive samples while an equal number of molecules were sampled from the

remaining 293k candidates and taken as the negative set. The dataset was further split in the ratio of 0.8/0.2 into the training/test set, with both the training set and the test set having the same proportion of positive and negative samples, obtained through stratified sampling. After feeding the dataset into OPENBABEL, 638 unique pathway fingerprints were generated and then extended to 1769 through the ring correction and ring splitting. Directly feeding the entire feature matrix into the LDA model with shrinkage gave a test-set accuracy of 0.94 (indicating that the classifier has a 94% accuracy in correctly classifying a molecule as feasible or not).

While the performance of the model is remarkable, the number of features was too large to interpret which molecular fragments were essential. We therefore carried out an iterative process of eliminating unimportant fragments that either occurred in less than 5% of the molecules in the test set or had too small a contribution (≤ 0.5). The final model we obtained has only 9 features, with a test-set accuracy of 0.82. The selected features and their contributions are shown in Table 3.

Table 3 The selected features of the final sparse LDA model. The contribution represents the solution from the final LDA model, and the frequency represents the fraction of the test molecules having the atom/group. The letters A, R, P in brackets in the atom/group column indicate whether the group occurs in an acyclic portion, within a ring, or partly in a ring, respectively.

Atom/group     Contribution    Frequency
N (A)          -1.859          0.526
N (R)          -2.748          0.378
O (A)          -1.910          0.726
NC (P)          0.134          0.259
COC (A)         2.069          0.301
OCN (A)        -0.669          0.093
OCN (P)        -0.060          0.134
OCCOC (A)       0.384          0.061
NCCCOC (P)     -0.628          0.059
intercept       3.707

The prediction for an LOHC based on this LDA model comes from summing the element-wise products of the feature counts and their contributions (plus the intercept) and checking whether the result is greater or less than zero. As shown in Table 3, the salient negative features are the acyclic N atom, ring N atom, and linear O atom, and the salient positive features include the acyclic COC group; the linear intercept itself is positive, indicating that the model is generally optimistic. Numerically speaking, if one of the salient negative features occurs twice or more in a molecule, the positive value of the intercept can be neutralized. However, the absence of an acyclic COC group does not render a molecule infeasible. These trends are seen in some of the best LOHCs shown in Figure 6. These molecules (i) all contain an N atom in the ring or at an exocyclic position, (ii) do not contain an oxygen atom, and therefore (iii) do not contain a COC group; however, they contain only one N atom, and our LDA model will correctly classify them as feasible.

Figure 8 shows the correlation between the final set of features in the two sets with true positive or negative samples. We can observe that in the positive samples, the salient negative features barely occurred simultaneously, as the correlation between each pair of linear O, N and ring N atoms is low. The fragment COC, however, occurred frequently (and is naturally highly correlated with O). On the other hand, in the negative samples, the correlations of the salient negative groups are high while the correlation of the linear O atom and the linear COC group is low. In summary, while the LDA model does not specifically identify salient positive features (or fragments), it does identify the negative ones; that is, the model offers insights on which fragments are detrimental to a molecule being a feasible LOHC candidate.

4 Conclusion
A novel computational molecule screening workflow, which is aware of the underlying hydrogenation/dehydrogenation chemistry and molecule synthesizability, was proposed and demonstrated by identifying promising small-to-medium sized LOHC pairs. The algorithm specifically used RING, a network generator, to identify fully hydrogenated and (the corresponding) dehydrogenated molecules from seed candidates taken from the PubChem database and then to extract reaction pathways between these pairs. Various existing and new data-driven molecular property models were used to compute the boiling point, melting point, enthalpy of formation, and synthetic accessibility score of molecules identified in the process. The workflow started with over 1 million molecules from the PubChem database, identified ∼ 14k LOHC pairs that were feasible, and ultimately generated 37 LOHC candidate pairs whose fully hydrogenated compounds (a) have a hydrogen capacity ≥ 7 wt%, (b) have an enthalpy of hydrogenation of less than 25 kcal/mol of H2 released, (c) are deemed to be moderately hard (to easy) to synthesize, and (d) are already found in the PubChem database. A sparse linear discriminant analysis model was developed to identify salient LOHC features. The results showed that the presence of multiple heteroatoms was detrimental (in particular both N and O), although the presence of N atoms was found to reduce the dehydrogenation enthalpy in general. Our results show that a number of potential LOHCs exist that are yet to be explored and our proposed computational screening workflow offers a scalable way of screening large molecular spaces in this context.

Acknowledgements: This work was partially supported by startup funds and CORE research grants from Lehigh University.

Supporting Information: The list of 14k feasible LOHCs and the final list of 37 pairs are provided along with their molecular properties (boiling and melting point, number of hydrogens released, synthetic accessibility scores, heat of hydrogenation per hydrogen, and compound ID numbers of the fully hydrogenated molecule of the pair).

Fig. 8 Correlation of the final set of features from Table 3 within positive (feasible LOHC candidates) and negative samples. The designation ‘A’ has
been dropped for brevity.
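The decision rule of the sparse LDA model (feature counts weighted by the Table 3 contributions, plus the intercept, with the sign giving the class) can be sketched in a few lines. The coefficients below are copied from Table 3; the function names and the example feature counts are hypothetical illustrations and were not produced by the actual pathway fingerprint generator.

```python
# Contributions and intercept copied from Table 3.
TABLE3_CONTRIBUTIONS = {
    "N (A)": -1.859,
    "N (R)": -2.748,
    "O (A)": -1.910,
    "NC (P)": 0.134,
    "COC (A)": 2.069,
    "OCN (A)": -0.669,
    "OCN (P)": -0.060,
    "OCCOC (A)": 0.384,
    "NCCCOC (P)": -0.628,
}
INTERCEPT = 3.707


def lda_score(feature_counts):
    """Sum of count * contribution over the Table 3 features, plus the intercept."""
    unknown = set(feature_counts) - set(TABLE3_CONTRIBUTIONS)
    if unknown:
        raise KeyError(f"features not in Table 3: {sorted(unknown)}")
    weighted = sum(
        TABLE3_CONTRIBUTIONS[name] * count for name, count in feature_counts.items()
    )
    return INTERCEPT + weighted


def is_feasible(feature_counts):
    """A positive score is read as 'feasible' (both forms liquid)."""
    return lda_score(feature_counts) > 0


# HYPOTHETICAL feature counts, chosen only to exercise the rule.
one_ring_n = {"N (R)": 1}      # a single ring nitrogen
two_acyclic_n = {"N (A)": 2}   # two acyclic nitrogens
ether_only = {"O (A)": 1, "COC (A)": 1}

for label, counts in [
    ("one ring N", one_ring_n),
    ("two acyclic N", two_acyclic_n),
    ("one acyclic ether", ether_only),
]:
    print(f"{label}: score = {lda_score(counts):+.3f}, feasible = {is_feasible(counts)}")
```

With these coefficients a single ring nitrogen leaves the score positive, whereas two acyclic nitrogens are already enough to offset the intercept, which illustrates why molecules carrying several heteroatoms tend to be classified as infeasible.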

Notes and references

1 L. Jones, Miami Univ. First World Hydrogen Energy Conf. Proc., 1976.
2 P. Preuster, C. Papp and P. Wasserscheid, Accounts of Chemical Research, 2017, 50, 74–85.
3 Office of Energy Efficiency and Renewable Energy, Hydrogen Storage, https://www.energy.gov/eere/fuelcells/hydrogen-storage.
4 M. L. Trudeau, MRS bulletin, 1999, 24, 23–26.
5 R. Zacharia et al., Journal of Nanomaterials, 2015, 2015.
6 N. L. Rosi, J. Eckert, M. Eddaoudi, D. T. Vodak, J. Kim, M. O’Keeffe and O. M. Yaghi, Science, 2003, 300, 1127–1129.
7 A. Zaluska, L. Zaluski and J. Ström-Olsen, Journal of Alloys and Compounds, 1999, 288, 217–225.
8 B. Sakintuna, F. Lamari-Darkrim and M. Hirscher, International journal of hydrogen energy, 2007, 32, 1121–1140.
9 G. Principi, F. Agresti, A. Maddalena and S. L. Russo, Energy, 2009, 34, 2087–2091.
10 J. A. Turner, Science, 2004, 305, 972–974.
11 P. Preuster, A. Alekseev and P. Wasserscheid, Annual review of chemical and biomolecular engineering, 2017, 8, 445–471.
12 J. Kothandaraman, S. Kar, R. Sen, A. Goeppert, G. A. Olah and G. S. Prakash, Journal of the American Chemical Society, 2017, 139, 2549–2552.
13 E. Gianotti, M. Taillades-Jacquin, J. Roziere and D. J. Jones, ACS Catalysis, 2018, 8, 4660–4680.
14 A. Kumar, T. Janes, N. A. Espinosa-Jalapa and D. Milstein, Journal of the American Chemical Society, 2018, 140, 7453–7457.
15 Y.-Q. Zou, N. von Wolff, A. Anaby, Y. Xie and D. Milstein, Nature catalysis, 2019, 2, 415–422.
16 D. Teichmann, W. Arlt and P. Wasserscheid, International Journal of Hydrogen Energy, 2012, 37, 18118–18132.
17 T. He, Q. Pei and P. Chen, Journal of energy chemistry, 2015, 24, 587–594.
18 R. H. Crabtree, ACS Sustainable Chemistry and Engineering, 2017, 5, 4491–4498.
19 K. Müller, K. Stark, V. N. Emel’yanenko, M. A. Varfolomeev, D. H. Zaitsau, E. Shoifet, C. Schick, S. P. Verevkin and W. Arlt, Industrial & Engineering Chemistry Research, 2015, 54, 7967–7976.
20 P. M. Modisha, C. N. Ouma, R. Garidzirai, P. Wasserscheid and D. Bessarabov, Energy & fuels, 2019, 33, 2778–2796.
21 H. Jorschick, A. Bulgarin, L. Alletsee, P. Preuster, A. Bösmann and P. Wasserscheid, ACS sustainable chemistry & engineering, 2019, 7, 4186–4194.
22 M. Jang, Y. S. Jo, W. J. Lee, B. S. Shin, H. Sohn, H. Jeong, S. C. Jang, S. K. Kwak, J. W. Kang and C. W. Yoon, ACS Sustainable Chemistry & Engineering, 2018, 7, 1185–1194.
23 H. Zhong, M. Iguchi, M. Chatterjee, Y. Himeda, Q. Xu and H. Kawanami, Advanced Sustainable Systems, 2018, 2, 1700161.
24 J. Eppinger and K.-W. Huang, ACS Energy Letters, 2017, 2, 188–195.
25 Z. Shao, Y. Li, C. Liu, W. Ai, S.-P. Luo and Q. Liu, Nature communications, 2020, 11, 1–7.
26 M. Niermann, S. Drünert, M. Kaltschmitt and K. Bonhoff, Energy & Environmental Science, 2019, 12, 290–307.
27 P. T. Aakko-Saksa, C. Cook, J. Kiviaho and T. Repo, Journal of Power Sources, 2018, 396, 803–823.
28 F. H. Saadi, N. S. Lewis and E. W. McFarland, Energy & Environmental Science, 2018, 11, 469–475.
29 P. G. Campbell, L. N. Zakharov, D. J. Grant, D. A. Dixon and S.-Y. Liu, Journal of the American Chemical Society, 2010, 132, 3289–3291.
30 J. Hachmann, R. Olivares-Amaya, S. Atahan-Evrenk, C. Amador-Bedolla, R. S. Sánchez-Carrera, A. Gold-Parker, L. Vogt, A. M. Brockway and A. Aspuru-Guzik, The Journal of Physical Chemistry Letters, 2011, 2, 2241–2251.
31 J. Hachmann, R. Olivares-Amaya, A. Jinich, A. L. Appleton, M. A. Blood-Forsythe, L. R. Seress, C. Roman-Salgado, K. Trepte, S. Atahan-Evrenk, S. Er et al., Energy & Environmental Science, 2014, 7, 698–704.

32 T. Husch, N. D. Yilmazer, A. Balducci and M. Korth, Physical Chemistry Chemical Physics, 2015, 17, 3394–3401.
33 T. Husch and M. Korth, Physical Chemistry Chemical Physics, 2015, 17, 22596–22603.
34 C. M. Araujo, D. L. Simone, S. J. Konezny, A. Shim, R. H. Crabtree, G. L. Soloveichik and V. S. Batista, Energy & Environmental Science, 2012, 5, 9534–9542.
35 E. Clot, O. Eisenstein and R. H. Crabtree, Chemical communications, 2007, 2231–2233.
36 V. N. Emel’yanenko, D. H. Zaitsau, A. A. Pimerzin and S. P. Verevkin, The Journal of Chemical Thermodynamics, 2019, 132, 122–128.
37 C. N. Ouma, P. M. Modisha and D. Bessarabov, Computational Materials Science, 2020, 172, 109332.
38 C. N. Ouma, P. M. Modisha and D. Bessarabov, Applied Surface Science, 2019, 471, 1034–1040.
39 C. N. M. Ouma, P. Modisha and D. Bessarabov, RSC advances, 2018, 8, 31895–31904.
40 S. Rangarajan, A. Bhan and P. Daoutidis, Computers & chemical engineering, 2012, 45, 114–123.
41 S. Rangarajan, A. Bhan and P. Daoutidis, Computers & chemical engineering, 2012, 46, 141–152.
42 S. Rangarajan, T. Kaminski, E. Van Wyk, A. Bhan and P. Daoutidis, Computers & Chemical Engineering, 2014, 64, 124–137.
43 K. Mansouri, C. M. Grulke, R. S. Judson and A. J. Williams, Journal of cheminformatics, 2018, 10, 10.
44 K. Mansouri, OPERA, https://github.com/kmansouri/OPERA, 2018.
45 K. Roy, S. Kar and R. N. Das, A primer on QSAR/QSPR modeling: fundamental concepts, Springer, 2015, pp. 1–36.
46 EPI-Suite, Estimation Program Interface, https://www.epa.gov/tsca-screening-tools/epi-suitetm-estimation-program-interface.
47 R. Ramakrishnan, P. O. Dral, M. Rupp and O. A. Von Lilienfeld, Scientific data, 2014, 1, 140022.
48 B. Narayanan, P. C. Redfern, R. S. Assary and L. A. Curtiss, Chemical science, 2019, 10, 7449–7455.
49 An overview of the RDKit - The RDKit 2019.09.1 documentation, https://rdkit.readthedocs.io/en/latest/Overview.html#what-is-it.
50 G. Landrum, RDKit: Open-Source Cheminformatics Software, https://www.rdkit.org/.
51 B. Li and S. Rangarajan, Molecular Systems Design & Engineering, 2019, 4, 1048–1057.
52 P. Ertl and A. Schuffenhauer, Journal of Cheminformatics, 2009, 1, 8.
53 P. Ertl and A. Schuffenhauer, SA_Score, https://github.com/rdkit/rdkit/tree/master/Contrib/SA_Score, 2019.
54 PubChem, National Center for Biotechnology Information. PubChem Compound Database, https://pubchem.ncbi.nlm.nih.gov/.
55 P. D. S. Rangarajan, A. Bhan, RING, https://bhan.cems.umn.edu/software, 2012.
56 PubChemPy documentation, https://pubchempy.readthedocs.io/en/latest/.
57 ChemSpider, http://www.chemspider.com.
58 K. Müller, J. Völkl and W. Arlt, Energy Technology, 2013, 1, 20–24.
59 A. Søgaard, M. Scheuermeyer, A. Bösmann, P. Wasserscheid and A. Riisager, Chemical communications, 2019, 55, 2046–2049.
60 P. F. Driscoll, E. Deunf, L. Rubin, J. Arnold and J. B. Kerr, Journal of The Electrochemical Society, 2013, 160, G3152.
61 P. Hauenstein, D. Seeberger, P. Wasserscheid and S. Thiele, Electrochemistry Communications, 2020, 106786.
62 G. Sievi, D. Geburtig, T. Skeledzic, A. Bösmann, P. Preuster, O. Brummel, F. Waidhas, M. A. Montero, P. Khanipour, I. Katsounaros et al., Energy & Environmental Science, 2019, 12, 2305–2314.
63 R. O. Duda, P. E. Hart and D. G. Stork, Pattern classification, John Wiley & Sons, 2012.
64 F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot and E. Duchesnay, Journal of Machine Learning Research, 2011, 12, 2825–2830.
65 O. Ledoit and M. Wolf, Journal of empirical finance, 2003, 10, 603–621.
