Predicting solvation free energies for neutral molecules in any solvent with openCOSMO-RS
Abstract
The accurate prediction of solvation free energies is critical for understanding various phenomena in the liquid phase, including reaction rates, equilibrium constants, activity coefficients, and partition coefficients. Despite extensive research, precise prediction of solvation free energies remains challenging. In this study, we introduce openCOSMO-RS 24a, an improved version of the open-source COSMO-RS model, capable of predicting solvation free energies alongside other liquid-phase properties. We parameterize openCOSMO-RS 24a using quantum chemical calculations from ORCA 6.0, leveraging a comprehensive dataset that includes solvation free energies, partition coefficients, and infinite dilution activity coefficients for various solutes and solvents at . Additionally, we develop a Quantitative Structure-Property Relationships model to predict molar volumes of the solvents, an essential requirement for predicting solvation free energies from structure alone. Our results show that openCOSMO-RS 24a achieves an average absolute deviation of for solvation free energies, for partition coefficients, and for infinite dilution activity coefficients, demonstrating improvements over the previous openCOSMO-RS 22 parameterization and comparable results to COSMOtherm 24 TZVP. A new command line interface for openCOSMO-RS 24a was developed which allows easy acces to the solvation energy model directly from within ORCA 6.0. This represents a significant advancement in the predictive modeling of solvation free energies and other solution-phase properties, providing researchers with a robust tool for applications in chemical and materials science.
AADshort=AAD, long=Average Absolute Deviation \DeclareAcronymRMSDshort=RMSD, long=Root Mean Square Deviation \DeclareAcronymMDshort=MD, long=Molecular Dynamics \DeclareAcronymGCshort=GC, long=Group Contribution \DeclareAcronymHBshort=HB, long=Hydrogen bond \DeclareAcronymQMshort=QM, long=Quantum Mechanics \DeclareAcronymQCshort=QC, long=Quantum chemical \DeclareAcronymCPCMshort=CPCM, long=Conductor-like polarizable continuum model \DeclareAcronymIDACshort=IDAC, long=infinite dilution activity coefficient \DeclareAcronymMLshort=ML, long=Machine Learning \DeclareAcronymD-MPNNshort=D-MPNN, long=Directed-Message Passing Neural Network \DeclareAcronymMPNNshort=D-MPNN, long=Message Passing Neural Network \DeclareAcronymGNNshort=GNN, long=Graphical Neural Network \DeclareAcronymRMSEshort=RMSE, long=Root-Mean-Square Error \DeclareAcronymQSPRshort=QSPR, long=Quantitative Structure-Property Relationships \DeclareAcronymvdWshort=vdW, long=van der Waals \DeclareAcronymAPIshort=API, long=Active Pharmaceutical Ingredient \DeclareAcronymCOSMO-RSshort=COSMO-RS, long=Conductor-like screening model for realistic solvation \DeclareAcronymOFshort=OF, long=Objective Function
[inst1]organization=Institute of Thermal Separation Processes, Hamburg University of Technology, city=Hamburg, postcode=21073, country=Germany
[inst2]organization=Institute of Technical Thermodynamics, RWTH Aachen University, city=Aachen, postcode=52062, country=Germany
[inst3]organization=FAccTs GmbH, city=Cologne, postcode=50677, country=Germany
1 Introduction
The accurate prediction of solvation free energies of solutes is crucial to understand phenomena occurring in the liquid phase. From this quantity, one can determine varius thermodynamic and kinetic properties such as reaction rates, equilibrium constants, activity coefficients, dissociation acidity constants, partition coefficients. Consequently, solvation free energy plays a pivotal role in chemical reactions[1, 2, 3, 4, 5] and design of materials with novel properties[6, 7, 8, 9, 10, 11]. Despite significant efforts over recent decades, precise prediction of remains a challenge.
In the last decade, various explicit[12, 13], implicit[14, 15, 16, 17, 18, 19], and data-driven[20, 21, 22, 23, 24, 25, 26] approaches have been used for solution phase property prediction. Explicit approaches such as \acMD are less common methods as they are quite computational time expensive as one needs to dissolve a solute in thousands of solvent molecules. Implicit approaches are more common in solution phase property prediction since they are less computational demanding. These approaches accurately predict the solvation free energies of neutral solutes with an uncertainty ranging from [27, 15, 28, 29, 30]. Among others, the \acCOSMO-RS is a frequently used fully predictive implicit model with an uncertainty of for predicting the solvation free energy of neutral solutes[27, 31, 32]. The basic principle of \acCOSMO-RS is based on the approximation of molecular interactions by the interactions of surface segments from the molecular cavities. This makes the calculations less demanding than \acMD calculations as the required input information only needs to be calculated once for each molecule from \acQM. Data-driven models have shown great potential in liquid phase property prediction mainly because many well established experimental databases are available. For example, \acML methods have been quite promising in predicting the solvation free energies of neutral solutes[33, 34, 35]. Vermeire et al. [32], trained and fine tuned a \acGNN model for predicting solvation free energies of neutral molecules reporting an uncertainty of . While these \acGNN perform well on training datasets, their ability to generalize to new structures with different atoms and functional groups remains challenging. Empirical evidence suggests that their embeddings can generalize across different molecular spaces, but achieving robust out-of-distribution performance is still difficult[36, 37].
Recently, we published an open source version of the \acCOSMO-RS model, which we will call openCOSMO-RS 22[38] in the following. This implementation of \acCOSMO-RS is the first open source version introducing additional descriptors besides the screening charge density. Having additional descriptors allowed the model to be modified for electrolytes with great success in the past[39, 40, 41, 42, 43, 44]. For neutral molecules, openCOSMO-RS 22 performs quite well for predicting the \acIDAC with a \acRMSD of based on TURBOMOLE 6.6 parameterization and based on ORCA 5.0.3[45, 46, 47, 48]. Although openCOSMO-RS 22 was able to predict equilibrium properties between two or more solvents, it was not capable of predicting properties between gas and liquid phase such as the solvation free energy.
In this study, we perform a new parameterization of the model based on quantum chemical calculations from the software ORCA 6.0. This will be called openCOSMO-RS 24a. To do so, initially, we compile experimental data on solvation free energies, partition coefficients, and activity coefficients for a representative range of solutes and solvents. Since the molar volume of the solvent is required to predict the solvation free energy, we develop a \acQSPR model to predict the molar volume of the solvent at based on experimental data available in the literature[49]. We modified the openCOSMO-RS conformer workflow compared to that of our previous work[38] by adding quantum chemical calculations of gas phase energies as these are needed to calculate the solvation free energies. Leveraging the experimental data together with the \acQSPR model, we parametrize openCOSMO-RS 24a for predicting the solvation free energies for a wide range of solutes and solvents. During this work, we found that the Gaussian charge scheme, used within CPCM[50, 51] in ORCA[52] produced very small segments leading to unusually large screening charge densities. This was addressed by rejecting the addition of segments smaller than a specified threshold () with minimum effect on the calculated energies. Furthermore, in ORCA, a Lagrangian-based algorithm is used to calculate the outlying charge correction[53]. Although this is not a treatment as advanced as the method proposed by Klamt [54] as it neglects the spacial distribution of the outlying charge, it should be enough for the neutral molecules tested. In the future a more thorough analysis of this is planned as it becomes especially important for anions. Finally, we report the performance of openCOSMO-RS 24a, which as of now can directly be used within the ORCA 6.0 software as additional solvation model enabling the user to access a variety of liquid phase properties which previously was not possible.
2 Methods
2.1 openCOSMO-RS
The theory of openCOSMO-RS has been discussed in previous studies[55, 38] and only the equation related to solvation free energy is briefly summarized here. The solvation free energy can then be calculated similarly to Klamt et al. [56, 17] from
(1) |
The term represents the dielectric energy, which is the energy involved in transferring the solute from the gas phase to an ideal conductor. The second term refers to the chemical potential at infinite dilution in the liquid phase, using the ideal conductor as the reference state and it is directly predicted by openCOSMO-RS. The third term encompasses the energy required for cavity formation and includes a van der Waals-like contribution to the solvation free energy, calculated by summing the product of each atom’s area on the solute molecule and a factor that depends on the atomic number. The fourth term provides a correction for molecules containing rings, determined by multiplying a general parameter by the number of rings in the solute structure. The fifth term accounts for the change in units of the reference states from mole fraction (units of the calculation) to molar concentration (units of the experimental data) with and representing the molar volumes of the ideal gas and the liquid phase, respectively. The final term is an adjustable parameter.
2.2 Computational details
In the following, we describe the openCOSMO-RS 24a conformer workflow for searching and calculating all necessary input data for the gas and \acCPCM phase. This is the updated overview of the pipeline also available on github[57]:
-
1.
Gas phase calculations
-
Filter conformers by an energy window of .
-
Cluster conformers by an \acRMSD window of nd save these for \acCPCM[52] calculations.
-
Single point energy calculation using DFT/BP86/def2-TZVPD level in ORCA for the conformer with the lowest energy.
-
2.
\ac
CPCM calculations
-
Filter confomers by an energy window of .
-
Cluster conformers by an \acRMSD window of nd select the conformers with the three lowest energies.
-
\ac
CPCM geometry optimizations at the DFT/BP86/def2-TZVP(-f) level in ORCA.
-
Filter confomers by an energy window of .
-
Cluster conformers by an \acRMSD window of nd select the conformer with the lowest energy.
-
\ac
CPCM geometry optimizations of DFT/BP86/def2-TZVP level in ORCA.
-
\ac
CPCM single point energy calculation using DFT/BP86/def2-TZVPD level in ORCA.
To search a large parameter space, the global solver differential evolution as implemented in SciPy[65] is used. Similar to our previous studies[55, 38], the following objective function is minimized for all optimizations:
(2) |
The average absolute deviation is calculated from:
(3) |
whereby is either , or .
2.3 Dataset overview
The dataset used in this work is comprised of three data types at : (i) infinite dilution activity coefficients, (ii) partition coefficients and (iii) solvation free energies. The 800+ infinite dilution activity coefficients are taken from Parcher et al. [66], Voutsas & Tassios[67], Kontogeorgis et al. [68], Kato et al. [69] and He & Zhong[70]. The partition coefficients for the following solvent combinations: octanol + water, benzene + water, hexane + water, and diethyl ether + water are collected by Klamt et al. [17]. The 2000+ solvation free energies are taken from Marenich et al. [71]. Xylene is excluded from the calculations as it is a mixture of constitutional isomers. Additionally, values for water as a solute in all three data types were excluded due to their known prediction issues when solvated in non-polar solvents within the \acCOSMO-RS framework without further model improvements[17, 72]. Even for molecular simulations treating mixtures of water and alkanes over the complete concentration range is challenging for most models. [73, 74, 75]
To calculate the solvation free energy, the molar volume of the pure solvent is required (see Equation 1). Thus, we develop a \acQSPR model in this study to predict the molar volume of the solvent at based on experimental data available from Mathieu & Bouteloup[49]. The complete dataset is cleaned and normalized: isotopes and explicit hydrogens are deleted, duplicates are merged, and only the first value in the original data for each component is retained.
3 Results and Discussion
3.1 Predictive QSPR model for molar volumes of the solvent at
To enable the fully predictive calculation of solvation energies, a model is developed to predict the only quantity not calculated within openCOSMO-RS; the molar volume of the pure solvent. The model is based on a linear combination of descriptors, represented by the following equation:
(4) | |||
where is the area of the surface segments on the cavity of the solute, are the respective sigma moments, is the number of atoms and is the number of silicon atoms in the molecule. All descriptor combinations were systematically evaluated. Notably, offers a more effective representation of the molar volume of the solvent compared to the volume of the cavity while utilizing fewer descriptors. The significant effect of the number of silicon atoms on the model’s accuracy might suggest that the silicon radius might not be optimal. Figure 1 shows the predicted molar volumes of the solvents using the \acQSPR molar volume of the solvent model against the experimental molar volumes of the solvents at based on experimental data described more in detail in the previous section. Overall, the predicted molar volumes of the solvents agree well with the experimental ones with \acAAD of and R2 of 0.995. Mathieu and Bouteloup report a model for predicting the standard density with an average relative error 1.7%. For density prediction, our \acQSPR model achieves a relative error of 2.2%, which is an accuracy similar to that of other group contribution methods[76, 77, 78].
![Refer to caption](https://cdn.statically.io/img/arxiv.org/extracted/5707564/figures/molar_volume.png)
3.2 Parametrization
All non-fixed parameters in Table 1 are simultaneously adjusted using the differential evolution algorithm implemented in SciPy[65]. Following the approach used in openCOSMO-RS 22[38], we incorporate the improved misfit term, which includes the additional descriptor to recover some of the lost 3D information. All data are included in the regression of the parameters.
Parameter | Value | Parameter | Value |
0.5 | 0.123 | ||
5.925 | 0.096 | ||
7281 | 0.003 | ||
2.4 | 0.015 | ||
43327 | 0.023 | ||
0.00961 | 0.143 | ||
41.624 | 0.891 | ||
-18.61 | 0.018 | ||
1.100 | 0.015 | ||
0.146 |
3.3 Model performance
In this work, infinite dilution activity coefficients, partition coefficients, and solvation free energies, all at are used to parameterize openCOSMO-RS 24a. Table 2 provides an overview of all calculations for openCOSMO-RS 24a, COSMOtherm 24 TZVP and COSMOtherm 24 FINE, calculated with the lowest energy conformer. Figures 2, 3, and 4 show the predicted values obtained from openCOSMO-RS 24a against the experimental values compiled from the literature for the activity coefficients, solvation free energies, and partition coefficients, respectively. In all Figures, black represents solutes without hydrogen bonds, red represents solutes that are hydrogen bond donors, and blue represents solutes that are hydrogen bond acceptors. Whether or not a solute is considered hydrogen bonding depends on the existence of area having a screening charge density larger than the threshold hydrogen bonding parameter .
For the infinite dilution activity coefficients, a total of 882 data points are used (see Figure 2). openCOSMO-RS 24a achieves an \acAAD of and R2 of , showing an improvement compared to our previous work openCOSMO-RS 22[38], which had an \acAAD of Overall, it can be observed, that overall, for the less polar solutes, the infinite dilution activity coefficients are slightly underestimated while for the more polar ones are somewhat overestimated. The infinite dilution activity coefficient represents the energy required for transferring one molecule from pure component to being infinitely dilute in a second solvent. Hence, this systematic shift in model performance based on solute polarity might be either due to the overestimation of attractive hydrogen bonding for more polar molecules in the reference state (i.e. pure solute) or due to an overestimation of the repulsive misfit energy at infinite dilution in the solvent. This will be investigated further in future work.
![Refer to caption](https://cdn.statically.io/img/arxiv.org/extracted/5707564/figures/IDAC.png)
For the partition coefficients, the dataset includes 296 data points (see Figure 3). The openCOSMO-RS 24a models achieves an \acAAD of and R2 of , indicitating good agreement between the calculated and the experimental data. Similar to \acIDAC, the model tends to overestimate partition coefficients for more polar solutes and slightly underestimate them for less polar ones. The partition coefficient measures the energy required to transfer a solute at infinite dilution from water to another solvent, representing the relative interaction energy of the solute with the other solvent compared to water. The data suggests that the greater the polarity difference is between the solute and the other solvent, the larger the deviation in calculated values by openCOSMO-RS 24a. As mentioned earlier, systems with water as a solute are excluded from the dataset because the usual COSMO-RS theory struggles to handle water at infinite dilution in very non-polar solvents [17, 72]. This issue appears to extend to other polar molecules in non-polar solvents, though less pronounced than with water. This suggests a general systematic issue that could potentially be addressed to improve the model.
![Refer to caption](https://cdn.statically.io/img/arxiv.org/extracted/5707564/figures/logK.png)
For the solvation free energies, the dataset contains 2129 data points (see Figure 4). The openCOSMO-RS 24a model achieves an \acAAD of and R2 of , showing a strong agreement between the calculated and experimental values, which is impressive considering that the molecules in this study are represented by only a single conformer. In comparison, the commercial software COSMOtherm reports a similar uncertainty of [27, 31, 32] for predicting the solvation free energy of neutral solutes, though it uses an ensemble of conformers. However, the comparison may not be entirely fair, as the parameters of openCOSMO-RS 24a are directly adjusted to the data used in this study, whereas the accuracy reported by was likely based on a larger dataset.
![Refer to caption](https://cdn.statically.io/img/arxiv.org/extracted/5707564/figures/dgsolvation.png)
Table 2 presents a performance comparison of openCOSMO-RS 24a, COSMOtherm 24 TZVP, and COSMOtherm 24 FINE. All calculations are conducted using only the lowest energy conformer, providing a fair comparison since openCOSMO-RS 24a currently lacks the capability to integrate multiple conformers. Additional calculations using multiple conformers for TZVP and FINE parameterizations are included in the appendix. Nevertheless, for the dataset examined in this study, incorporating multiple conformers does not significantly impact the results. The results are categorized by solute type regarding its hydrogen binding capabilities. There is no universally accepted definition of a hydrogen-bonding molecule in the context of COSMO-RS modeling. However, we classify molecules based on the presence of areas in the -profile with an absolute screening charge density larger than the threshold. This method allows for a consistent categorization of molecule types, which directly relates to the terms in the interaction equations.
openCOSMO-RS 24a | COSMOtherm 24 TZVP | COSMOtherm 24 FINE | ||
IDAC [-] | AAD | AAD | AAD | |
non-HB | 568 | 0.41 | 0.43 | 0.40 |
HB acceptor | 172 | 0.63 | 0.53 | 0.35 |
HB donor | 35 | 0.64 | 0.93 | 0.40 |
HB acceptor/donor | 107 | 0.78 | 0.66 | 0.33 |
Total | 882 | 0.51 | 0.50 | 0.38 |
Partition coefficients [-] | ||||
non-HB | 68 | 0.36 | 0.54 | 0.46 |
HB acceptor | 104 | 0.84 | 0.63 | 0.50 |
HB donor | 12 | 0.25 | 0.28 | 0.23 |
HB acceptor/donor | 112 | 0.99 | 0.76 | 0.59 |
Total | 296 | 0.76 | 0.64 | 0.51 |
Solvation free energies [] | ||||
non-HB | 434 | 0.36 | 0.34 | 0.32 |
HB acceptor | 775 | 0.40 | 0.47 | 0.45 |
HB donor | 69 | 0.52 | 0.39 | 0.32 |
HB acceptor/donor | 851 | 0.54 | 0.53 | 0.40 |
Total | 2129 | 0.45 | 0.46 | 0.40 |
Overall | 3307 | 0.49 | 0.50 | 0.41 |
Overall, for \acIDAC and solvation free energies, in the observed dataset, openCOSMO-RS 24a performs comparable to COSMOtherm 24 TZVP. Only for partition coefficients COSMOtherm 24 TZVP is more accurate. The COSMOtherm 24 FINE model delivers more accurate results for the majority of the systems in the dataset. This is a first step to improving openCOSMO-RS in a more general fashion for neutral molecules with current work focusing on improvements like the combinatorial term, temperature dependency, dispersion interactions, multiple conformers and polarizability effects.
4 Conclusions
In this work, we extended openCOSMO-RS to be able to calculate solvation free energies. To do so, we developed a \acQSPR model to predict the molar volumes of the solvents required for calculating solvation free energies. By modifying the openCOSMO-RS conformer workflow and combining in total 3307 data points of activity coefficients, solvation free energies, and partition coefficients, we parameterized openCOSMO-RS 24a based on ORCA 6.0.
The openCOSMO-RS 24a parameterization based on ORCA 6.0 achieved an \acAAD of for predicting solvation free energies, which is comparable to the uncertainty of that is reported by the commercial software COSMOtherm. For predicting the partition coefficients, openCOSMO-RS 24a achieves an \acAAD of . Furthermore, openCOSMO-RS 24a showed an improvement in predicting activity coefficients with an \acAAD of compared to an \acAAD of with the previous openCOSMO-RS 22 parameterization.
While the openCOSMO-RS 24a parameterization performs well when representing each molecule with a single conformer, future work will focus on extending the model to handle conformer ensembles. Additionally, we plan to extend openCOSMO-RS to ionic solutes, as current models often struggle to accurately predict the solvation free energies of ionic compounds[55, 79, 80].
The performance of openCOSMO-RS in predicting liquid phase properties, even with a single conformer, demonstrates that the inclusion of additional chemical descriptors to surface charge improves the model’s accuracy. Moreover, substantial efforts have been made to integrate openCOSMO-RS into ORCA 6.0, enabling users to directly access a variety of liquid phase properties; a capability that was previously unavailable. This integration marks a significant advancement, providing users with a powerful tool for comprehensive property prediction within a single software environment.
References
- [1] S. Mahalakshmi, V. Sathyanarayanamoorthi, V. Kannappan, Studies on free energy and its components of 2-Phenylindole and its derivatives, Journal of Molecular Liquids 139 (1–3) (2008) 43–47. doi:10.1016/j.molliq.2007.10.012.
- [2] L. C. Kröger, W. A. Kopp, K. Leonhard, Prediction of Chain Propagation Rate Constants of Polymerization Reactions in Aqueous NIPAM/BIS and VCL/BIS Systems, Journal of Physical Chemistry B 121 (13) (2017) 2887–2895. arXiv:http://dx.doi.org/10.1021/acs.jpcb.6b09147, doi:10.1021/acs.jpcb.6b09147.
- [3] M. Heyden, Disassembling solvation free energies into local contributions—Toward a microscopic understanding of solvation processes, WIREs Computational Molecular Science 9 (2) (Aug. 2018). doi:10.1002/wcms.1390.
- [4] X. Zhang, R. S. DeFever, S. Sarupria, R. B. Getman, Free Energies of Catalytic Species Adsorbed to Pt(111) Surfaces under Liquid Solvent Calculated Using Classical and Quantum Approaches, J. Chem. Inf. Model. 59 (5) (2019) 2190–2198. doi:10.1021/ACS.JCIM.9B00089.
- [5] T. Nevolianis, N. Wolter, L. F. Kaven, L. Krep, C. Huang, A. Mhamdi, A. Mitsos, A. Pich, K. Leonhard, Kinetic Modeling of a Poly(N-vinylcaprolactam-co-glycidyl methacrylate) Microgel Synthesis: A Hybrid In Silico and Experimental Approach, Industrial & Engineering Chemistry Research 62 (2) (2023) 893–902. arXiv:https://doi.org/10.1021/acs.iecr.2c03291, doi:10.1021/acs.iecr.2c03291.
- [6] N. D. Austin, N. V. Sahinidis, D. W. Trahan, A COSMO-based approach to computer-aided mixture design, Chemical Engineering Science 159 (2017) 93–105.
- [7] N. D. Austin, N. V. Sahinidis, I. A. Konstantinov, D. W. Trahan, COSMO-based computer-aided molecular/mixture design: A focus on reaction solvents, AIChE Journal 64 (1) (2018) 104–122.
- [8] C. Gertig, L. Kröger, L. Fleitmann, J. Scheffczyk, A. Bardow, K. Leonhard, Rx-COSMO-CAMD: Computer-aided Molecular Design of Reaction Solvents Based on Predictive Kinetics from Quantum Chemistry, Industrial & Engineering Chemistry Research 58 (51) (2019) 22835–22846. arXiv:https://doi.org/10.1021/acs.iecr.9b03232, doi:10.1021/acs.iecr.9b03232.
- [9] T. Zhou, K. McBride, S. Linke, Z. Song, K. Sundmacher, Computer-aided solvent selection and design for efficient chemical processes, Current Opinion in Chemical Engineering 27 (mar 2020). doi:10.1016/j.coche.2019.10.007.
- [10] L. Raßpe-Lange, A. Hoffmann, C. U. Gertig, J. Heck, K. Leonhard, S. Herres-Pawlis, Geometrical benchmarking and analysis of redox potentials of copper(I/II) guanidine-quinoline complexes: Comparison of semi-empirical tight-binding and DFT methods and the challenge of describing the entatic state (part III), J. Comput. Chem. 44 (3) (2023) 319–328. doi:10.1002/JCC.26927.
- [11] T. Nevolianis, R. A. Ahmed, A. Hellweg, M. Diedenhofen, K. Leonhard, Blind prediction of toluene/water partition coefficients using COSMO-RS: results from the SAMPL9 challenge, Phys. Chem. Chem. Phys. 25 (2023) 31683–31691. doi:10.1039/D3CP04077A.
- [12] E. J. Smith, T. Bryk, A. D. J. Haymet, Free energy of solvation of simple ions: Molecular-dynamics study of solvation of Cl- and Na+ in the ice/water interface, The Journal of Chemical Physics 123 (3) (2005) 034706. arXiv:https://doi.org/10.1063/1.1953578, doi:10.1063/1.1953578.
- [13] C. Xi, F. Zheng, G. Gao, Z. Song, B. Zhang, C. Dong, X.-W. Du, L.-W. Wang, Ion Solvation Free Energy Calculation Based on Ab Initio Molecular Dynamics Using a Hybrid Solvent Model, Journal of Chemical Theory and Computation 0 (0) (0) null, pMID: 36253911. arXiv:https://doi.org/10.1021/acs.jctc.1c01298, doi:10.1021/acs.jctc.1c01298.
- [14] J. Tomasi, B. Mennucci, R. Cammi, Quantum Mechanical Continuum Solvation Models, Chemical Reviews 105 (8) (2005) 2999–3094, pMID: 16092826. arXiv:https://doi.org/10.1021/cr9904009, doi:10.1021/cr9904009.
- [15] A. V. Marenich, C. J. Cramer, D. G. Truhlar, Universal Solvation Model Based on Solute Electron Density and on a Continuum Model of the Solvent Defined by the Bulk Dielectric Constant and Atomic Surface Tensions, The Journal of Physical Chemistry B 113 (18) (2009) 6378–6396, pMID: 19366259. arXiv:https://doi.org/10.1021/jp810292n, doi:10.1021/jp810292n.
- [16] A. Klamt, Conductor-like Screening Model for Real Solvents: A New Approach to the Quantitative Calculation of Solvation Phenomena, J. Phys. Chem. 99 (1995) 2224–2235. doi:10.1021/j100007a062.
- [17] A. Klamt, V. Jonas, T. Bürger, J. C. W. Lohrenz, Refinement and Parametrization of COSMO-RS, J. Phys. Chem. A 102 (26) (1998) 5074–5085. doi:10.1021/jp980017s.
- [18] A. Klamt, F. Eckert, W. Arlt, COSMO-RS: an alternative to simulation for calculating thermodynamic properties of liquid mixtures, Annu Rev Chem Biomol Eng 1 (2010) 101–122. doi:10.1146/annurev-chembioeng-073009-100903.
- [19] M. Stahn, S. Ehlert, S. Grimme, Extended Conductor-like Polarizable Continuum Solvation Model (CPCM-x) for Semiempirical Methods, The Journal of Physical Chemistry A 127 (33) (2023) 7036–7043, publisher: American Chemical Society. doi:10.1021/acs.jpca.3c04382.
- [20] H. Kang, H. Choi, H. Park, Prediction of Molecular Solvation Free Energy Based on the Optimization of Atomic Solvation Parameters with Genetic Algorithm, J. Chem. Inf. Model. 47 (2) (2007) 509–514. doi:10.1021/CI600453B.
- [21] J. Gilmer, S. S. Schoenholz, P. F. Riley, O. Vinyals, G. E. Dahl, Neural Message Passing for Quantum Chemistry, CoRR abs/1704.01212 (2017). arXiv:1704.01212.
- [22] A. M. Schweidtmann, J. G. Rittig, A. König, M. Grohe, A. Mitsos, M. Dahmen, Graph Neural Networks for Prediction of Fuel Ignition Quality, Energy Fuels 34 (9) (2020) 11395–11407. doi:10.1021/acs.energyfuels.0c01533.
- [23] B. Winter, C. Winter, J. Schilling, A. Bardow, A smile is all you need: Predicting limiting activity coefficients from SMILES with natural language processing, CoRR abs/2206.07048 (2022). arXiv:2206.07048, doi:10.48550/ARXIV.2206.07048.
- [24] E. I. Sanchez Medina, S. Linke, M. Stoll, K. Sundmacher, Graph neural networks for the prediction of infinite dilution activity coefficients, Digital Discovery 1 (3) (2022) 216–225. doi:10.1039/d1dd00037c.
-
[25]
K. C. Felton, H. Ben-Safar, A. A. Lapkin, DeepGamma : A deep learning model for activity coeffcient prediction, 2021.
URL https://api.semanticscholar.org/CorpusID:252086004 - [26] J. G. Rittig, K. B. Hicham, A. M. Schweidtmann, M. Dahmen, A. Mitsos, Graph Neural Networks for Temperature-dependent Activity Coefficient Prediction of Solutes in Ionic Liquids, CoRR abs/2206.11776 (2022). arXiv:2206.11776, doi:10.48550/ARXIV.2206.11776.
- [27] T. M. Letcher, Development and Applications in Solubility, Royal Society of Chemistry, 2007.
- [28] A. V. Marenich, C. J. Cramer, D. G. Truhlar, Generalized Born Solvation Model SM12, Journal of Chemical Theory and Computation 9 (1) (2013) 609–620.
- [29] D. Qiu, P. S. Shenkin, F. P. Hollinger, W. C. Still, The GB/SA Continuum Model for Solvation. A Fast Analytical Method for the Calculation of Approximate Born Radii, The Journal of Physical Chemistry A 101 (16) (1997) 3005–3014. arXiv:https://doi.org/10.1021/jp961992r, doi:10.1021/jp961992r.
- [30] B. A. C. Horta, P. T. Merz, P. F. J. Fuchs, J. Dolenc, S. Riniker, P. H. Hünenberger, A GROMOS-compatible Force Field for Small Organic Molecules in the Condensed Phase: The 2016H66 Parameter Set, Journal of Chemical Theory and Computation 12 (8) (2016) 3825–3850, pMID: 27248705. arXiv:https://doi.org/10.1021/acs.jctc.6b00187, doi:10.1021/acs.jctc.6b00187.
- [31] A. Klamt, M. Diedenhofen, Calculation of Solvation Free Energies with DCOSMO-RS, The Journal of Physical Chemistry A 119 (21) (2015) 5439–5445. doi:10.1021/jp511158y.
- [32] F. H. Vermeire, W. H. Green, Transfer learning for solvation free energies: From quantum chemistry to experiments, Chemical Engineering Journal 418 (2021) 129307. doi:https://doi.org/10.1016/j.cej.2021.129307.
- [33] H. Lim, Y. Jung, MLSolvA: solvation free energy prediction from pairwise atomistic interactions by machine learning, J. Cheminformatics 13 (1) (2021) 56. doi:10.1186/S13321-021-00533-Z.
- [34] Y. Chung, F. H. Vermeire, H. Wu, P. J. Walker, M. H. Abraham, W. H. G. Jr., Group Contribution and Machine Learning Approaches to Predict Abraham Solute Parameters, Solvation Free Energy, and Solvation Enthalpy, J. Chem. Inf. Model. 62 (3) (2022) 433–446. doi:10.1021/ACS.JCIM.1C01103.
- [35] J. Ferraz-Caetano, F. Teixeira, D. C. Santana, Explainable Supervised Machine Learning Model To Predict Solvation Gibbs Energy, J. Chem. Inf. Model. 64 (7) (2024) 2250–2262. doi:10.1021/ACS.JCIM.3C00544.
- [36] K. Atz, F. Grisoni, G. Schneider, Geometric Deep Learning on Molecular Representations, CoRR abs/2107.12375 (2021). arXiv:2107.12375.
- [37] H. Stärk, D. Beaini, G. Corso, P. Tossou, C. Dallago, S. Günnemann, P. Liò, 3D Infomax improves GNNs for Molecular Property Prediction, CoRR abs/2110.04126 (2021). arXiv:2110.04126.
- [38] T. Gerlach, S. Müller, A. G. de Castilla, I. Smirnova, An open source COSMO-RS implementation and parameterization supporting the efficient implementation of multiple segment descriptors, Fluid Phase Equilibria 560 (2022) 113472. doi:https://doi.org/10.1016/j.fluid.2022.113472.
- [39] T. Gerlach, S. Müller, I. Smirnova, Development of a COSMO-RS based model for the calculation of phase equilibria in electrolyte systems, AIChE Journal 64 (1) (2018) 272–285. doi:10.1002/aic.15875.
- [40] S. Müller, A. González de Castilla, C. Taeschler, A. Klein, I. Smirnova, Evaluation and refinement of the novel predictive electrolyte model COSMO-RS-ES based on solid-liquid equilibria of salts and Gibbs free energies of transfer of ions, Fluid Phase Equilibria 483 (2019) 165–174. doi:10.1016/j.fluid.2018.10.023.
- [41] S. Müller, A. González de Castilla, C. Taeschler, A. Klein, I. Smirnova, Calculation of thermodynamic equilibria with the predictive electrolyte model COSMO-RS-ES: Improvements for low permittivity systems, Fluid Phase Equilibria 506 (2020) 112368. doi:10/gf98vf.
- [42] A. González de Castilla, S. Müller, I. Smirnova, On the analogy between the restricted primitive model and capacitor circuits: Semi-empirical alternatives for over- and underscreening in the calculation of mean ionic activity coefficients, Journal of Molecular Liquids 326 (2021) 115204. doi:10.1016/j.molliq.2020.115204.
- [43] A. González de Castilla, S. Müller, I. Smirnova, On the analogy between the restricted primitive model and capacitor circuits. Part II: A generalized Gibbs-duhem consistent extension of the Pitzer-Debye-hückel term with corrections for low and variable relative permittivity, Journal of Molecular Liquids 360 (2022) 119398. doi:10.1016/j.molliq.2022.119398.
- [44] M. Arrad, K. Thomsen, S. Müller, I. Smirnova, Thermodynamic modeling using extended UNIQUAC and COSMO-RS-ES models: Case study of the cesium nitrate-water system over a large range of temperatures, Fluid Phase Equilibria 580 (2024) 114037. doi:10.1016/j.fluid.2024.114037.
- [45] F. Neese, The ORCA program system, WIREs Comput Mol Sci 2 (1) (2012) 73–78. doi:10.1002/wcms.81.
- [46] F. Neese, Software update: the ORCA program system, version 4.0, WIREs Comput Mol Sci 8 (1) (Jan. 2018). doi:10.1002/wcms.1327.
- [47] F. Neese, F. Wennmohs, U. Becker, C. Riplinger, The ORCA quantum chemistry program package, The Journal of chemical physics 152 (22) (2020) 224108. doi:10.1063/5.0004608.
- [48] F. Neese, Software update: The ORCA program system—Version 5.0, WIREs Computational Molecular Science 12 (5) (2022) e1606, _eprint: https://onlinelibrary.wiley.com/doi/pdf/10.1002/wcms.1606. doi:10.1002/wcms.1606.
- [49] D. Mathieu, R. Bouteloup, Reliable and Versatile Model for the Density of Liquids Based on Additive Volume Increments, Industrial & Engineering Chemistry Research 55 (50) (2016) 12970–12980. doi:10.1021/acs.iecr.6b03809.
- [50] V. Barone, M. Cossi, Quantum Calculation of Molecular Energies and Energy Gradients in Solution by a Conductor Solvent Model, The Journal of Physical Chemistry A 102 (11) (1998) 1995–2001. doi:10.1021/jp9716997.
- [51] D. M. York, M. Karplus, A Smooth Solvation Potential Based on the Conductor-like Screening Model, The Journal of Physical Chemistry A 103 (50) (1999) 11060–11079. doi:10.1021/jp992097l.
- [52] M. Garcia-Ratés, F. Neese, Effect of the Solute Cavity on the Solvation Energy and its Derivatives within the Framework of the Gaussian Charge Scheme, J. Comput. Chem. 41 (9) (2020) 922–939. doi:10.1002/JCC.26139.
- [53] C. C. Pye, T. Ziegler, An implementation of the conductor-like screening model of solvation within the Amsterdam density functional package, Theoretical Chemistry Accounts: Theory, Computation, and Modeling (Theoretica Chimica Acta) 101 (6) (1999) 396–408. doi:10.1007/s002140050457.
- [54] A. Klamt, V. Jonas, Treatment of the outlying charge in continuum solvation models, The Journal of Chemical Physics 105 (22) (1996) 9972–9981. doi:10/dbc983.
- [55] L. C. Kröger, S. Müller, I. Smirnova, K. Leonhard, Prediction of Solvation Free Energies of Ionic Solutes in Neutral Solvents, The Journal of Physical Chemistry A 124 (20) (2020) 4171–4181, pMID: 32336096. arXiv:https://doi.org/10.1021/acs.jpca.0c01606, doi:10.1021/acs.jpca.0c01606.
- [56] A. Klamt, Conductor-like Screening Model for Real Solvents: A New Approach to the Quantitative Calculation of Solvation Phenomena, Journal of Physical Chemistry 99 (7) (1995) 2224–2235. arXiv:https://doi.org/10.1021/j100007a062, doi:10.1021/j100007a062.
-
[57]
openCOSMO-RS github repository.
URL https://github.com/TUHH-TVT - [58] J. Ebejer, G. M. Morris, C. M. Deane, Freely Available Conformer Generation Methods: How Good Are They?, J. Chem. Inf. Model. 52 (5) (2012) 1146–1158. doi:10.1021/CI2004658.
-
[59]
RDKit, version 2023.09.5.
URL https://www.rdkit.org/ - [60] A. D. Becke, Density-functional exchange-energy approximation with correct asymptotic behavior, Phys. Rev. A 38 (1988) 3098–3100. doi:10.1103/PhysRevA.38.3098.
- [61] J. P. Perdew, Density-functional approximation for the correlation energy of the inhomogeneous electron gas, Phys. Rev. B 33 (1986) 8822–8824. doi:10.1103/PhysRevB.33.8822.
- [62] F. Weigend, R. Ahlrichs, Balanced basis sets of split valence, triple zeta valence and quadruple zeta valence quality for H to Rn: Design and assessment of accuracy, Physical Chemistry Chemical Physics 7 (18) (7) 3297–3305, publisher: Royal Society of Chemistry. doi:10.1039/B508541A.
- [63] S. Ehlert, M. Stahn, S. Spicher, S. Grimme, Robust and Efficient Implicit Solvation Model for Fast Semiempirical Methods, Journal of Chemical Theory and Computation 17 (7) (2021) 4250–4261, publisher: American Chemical Society. doi:10.1021/acs.jctc.1c00471.
- [64] C. Bannwarth, S. Ehlert, S. Grimme, GFN2-xTB-an Accurate and Broadly Parametrized Self-consistent Tight-binding Quantum Chemical Method with Multipole Electrostatics and Density-dependent Dispersion Contributions, Journal of chemical theory and computation 15 (3) (2019) 1652–1671. doi:10.1021/acs.jctc.8b01176.
- [65] P. Virtanen, R. Gommers, T. E. Oliphant, M. Haberland, T. Reddy, D. Cournapeau, E. Burovski, P. Peterson, W. Weckesser, J. Bright, S. van der Walt, M. Brett, J. Wilson, K. J. Millman, N. Mayorov, A. R. J. Nelson, E. Jones, R. Kern, E. Larson, C. Carey, I. Polat, Y. Feng, E. W. Moore, J. VanderPlas, D. Laxalde, J. Perktold, R. Cimrman, I. Henriksen, E. A. Quintero, C. R. Harris, A. M. Archibald, A. H. Ribeiro, F. Pedregosa, P. van Mulbregt, SciPy, SciPy 1.0-fundamental Algorithms for Scientific Computing in Python, CoRR abs/1907.10121 (2019). arXiv:1907.10121.
- [66] J. F. Parcher, P. H. Weiner, C. L. Hussey, T. N. Westlake, Specific retention volumes and limiting activity coefficients of C4-C8 alkane solutes in C22-C36 n-alkane solvents, Journal of Chemical and Engineering Data 20 (2) (1975) 145–151.
- [67] E. C. Voutsas, D. P. Tassios, Prediction of Infinite-dilution Activity Coefficients in Binary Mixtures with UNIFAC. A Critical Evaluation, Industrial & Engineering Chemistry Research 35 (4) (1996) 1438–1445, number: 4. doi:10.1021/ie9503555.
- [68] G. M. Kontogeorgis, G. I. Nikolopoulos, A. Fredenslund, D. P. Tassios, Improved models for the prediction of activity coefficients in nearly athermal mixtures Part II. A theoretically-based GE-model based on the van der Waals partition function, Fluid Phase Equilibria 127 (1) (1997) 103–121, number: 1. doi:10.1016/S0378-3812(96)03145-7.
- [69] S. Kato, D. Hoshino, H. Noritomi, K. Nagahama, Infinite dilution activity coefficients of n-alkane solutes, butane to decane, in n-alkane solvents, heptane to hexatriacontane, Fluid Phase Equilibria 194–197 (2002) 641–652. doi:10.1016/S0378-3812(01)00686-0.
- [70] J. He, C. Zhong, A QSPR study of infinite dilution activity coefficients of organic compounds in aqueous solutions, Fluid Phase Equilibria 205 (2) (205) 303–316, number: 2. doi:10.1016/S0378-3812(02)00296-0.
- [71] A. V. Marenich, R. M. Olson, C. P. Kelly, C. J. Cramer, D. G. Truhlar, Self-consistent Reaction Field Model for Aqueous and Nonaqueous Solutions Based on Accurate Polarized Partial Charges, Journal of Chemical Theory and Computation 3 (6) (2007) 2011–2033. arXiv:https://doi.org/10.1021/ct7001418, doi:10.1021/ct7001418.
- [72] A. Klamt, Prediction of the mutual solubilities of hydrocarbons and water with COSMO-RS, Fluid Phase Equilibria 206 (1) (2003) 223–235. doi:10.1016/S0378-3812(02)00322-9.
- [73] D. Ballal, P. Venkataraman, W. A. Fouad, K. R. Cox, W. G. Chapman, Isolating the non-polar contributions to the intermolecular potential for water-alkane interactions, The Journal of Chemical Physics 141 (6) (2014) 064905. doi:10.1063/1.4892341.
- [74] D. Ballal, D. Asthagiri, A. V. Parambathu, P. Venkataraman, W. A. Fouad, K. R. Cox, W. G. Chapman, Erratum: ”Isolating the non-polar contributions to the intermolecular potential for water-alkane interactions” [J. Chem. Phys. 141, 064905 (2014)], The Journal of Chemical Physics 145 (11) (2016) 119901. doi:10.1063/1.4962733.
- [75] D. Asthagiri, A. Valiya Parambathu, D. Ballal, W. G. Chapman, Electrostatic and induction effects in the solubility of water in alkanes, The Journal of Chemical Physics 147 (7) (2017) 074506. doi:10.1063/1.4997916.
- [76] H. S. Elbro, A. Fredenslund, P. Rasmussen, Group contribution method for the prediction of liquid densities as a function of temperature for solvents, oligomers, and polymers, Industrial & Engineering Chemistry Research 30 (12) (1991) 2576–2582. doi:10.1021/ie00060a011.
- [77] E. C. Ihmels, J. Gmehling, Extension and Revision of the Group Contribution Method GCVOL for the Prediction of Pure Compound Liquid Densities, Industrial & Engineering Chemistry Research 42 (2) (2003) 408–412. arXiv:https://doi.org/10.1021/ie020492j, doi:10.1021/ie020492j.
- [78] A. S. Hukkerikar, B. Sarup, A. Ten Kate, J. Abildskov, G. Sin, R. Gani, Group-contribution+ (GC+) based estimation of properties of pure components: Improved property estimation and uncertainty analysis, Fluid Phase Equilibria 321 (2012) 25–43. doi:https://doi.org/10.1016/j.fluid.2012.02.010.
- [79] T. Nevolianis, M. Baumann, N. Viswanathan, W. A. Kopp, K. Leonhard, DISSOLVE: Database of ionic solutes’ solvation free energies, Fluid Phase Equilibr. 571 (2023) 113801. doi:10.1016/j.fluid.2023.113801.
- [80] J. W. Zheng, W. H. Green, Experimental Compilation and Computation of Hydration Free Energies for Ionic Solutes, The Journal of Physical Chemistry A 127 (48) (2023) 10268–10281. doi:10.1021/acs.jpca.3c05514.
Appendix A Performance of COSMOtherm 24 TZVP and COSMOtherm 24 FINE using multiple conformers
COSMOtherm 24 TZVP MC | COSMOtherm 24 FINE MC | ||
IDAC [-] | Count | AAD | AAD |
non-HB | 568 | 0.43 | 0.40 |
HB acceptor | 172 | 0.54 | 0.36 |
HB donor | 35 | 0.94 | 0.41 |
HB acceptor/donor | 107 | 0.65 | 0.27 |
Total | 882 | 0.50 | 0.38 |
Partition coefficients [-] | |||
non-HB | 68 | 0.56 | 0.46 |
HB acceptor | 104 | 0.63 | 0.52 |
HB donor | 12 | 0.28 | 0.22 |
HB acceptor/donor | 112 | 0.78 | 0.59 |
Total | 296 | 0.66 | 0.52 |
Solvation free energies [] | |||
non-HB | 434 | 0.34 | 0.32 |
HB acceptor | 775 | 0.46 | 0.45 |
HB donor | 69 | 0.37 | 0.31 |
HB acceptor/donor | 851 | 0.52 | 0.39 |
Total | 2129 | 0.46 | 0.40 |
Overall | 3307 | 0.51 | 0.42 |