Predicting solvation free energies for neutral molecules in any solvent with openCOSMO-RS

Simon Müller Thomas Nevolianis Miquel Garcia-Ratés Christoph Riplinger Kai Leonhard Irina Smirnova
Abstract

The accurate prediction of solvation free energies is critical for understanding various phenomena in the liquid phase, including reaction rates, equilibrium constants, activity coefficients, and partition coefficients. Despite extensive research, precise prediction of solvation free energies remains challenging. In this study, we introduce openCOSMO-RS 24a, an improved version of the open-source COSMO-RS model, capable of predicting solvation free energies alongside other liquid-phase properties. We parameterize openCOSMO-RS 24a using quantum chemical calculations from ORCA 6.0, leveraging a comprehensive dataset that includes solvation free energies, partition coefficients, and infinite dilution activity coefficients for various solutes and solvents at 25 °Ctimes25degreeCelsius25\text{\,}\mathrm{\SIUnitSymbolCelsius}start_ARG 25 end_ARG start_ARG times end_ARG start_ARG °C end_ARG. Additionally, we develop a Quantitative Structure-Property Relationships model to predict molar volumes of the solvents, an essential requirement for predicting solvation free energies from structure alone. Our results show that openCOSMO-RS 24a achieves an average absolute deviation of 0.45 kcal mol1times0.45timeskilocalmole10.45\text{\,}\mathrm{kcal}\text{\,}{\mathrm{mol}}^{-1}start_ARG 0.45 end_ARG start_ARG times end_ARG start_ARG start_ARG roman_kcal end_ARG start_ARG times end_ARG start_ARG power start_ARG roman_mol end_ARG start_ARG - 1 end_ARG end_ARG end_ARG for solvation free energies, 0.76 times0.76absent0.76\text{\,}start_ARG 0.76 end_ARG start_ARG times end_ARG start_ARG end_ARG for partition coefficients, and 0.51 times0.51absent0.51\text{\,}start_ARG 0.51 end_ARG start_ARG times end_ARG start_ARG end_ARG for infinite dilution activity coefficients, demonstrating improvements over the previous openCOSMO-RS 22 parameterization and comparable results to COSMOtherm 24 TZVP. A new command line interface for openCOSMO-RS 24a was developed which allows easy acces to the solvation energy model directly from within ORCA 6.0. This represents a significant advancement in the predictive modeling of solvation free energies and other solution-phase properties, providing researchers with a robust tool for applications in chemical and materials science.

\DeclareAcronym

AADshort=AAD, long=Average Absolute Deviation \DeclareAcronymRMSDshort=RMSD, long=Root Mean Square Deviation \DeclareAcronymMDshort=MD, long=Molecular Dynamics \DeclareAcronymGCshort=GC, long=Group Contribution \DeclareAcronymHBshort=HB, long=Hydrogen bond \DeclareAcronymQMshort=QM, long=Quantum Mechanics \DeclareAcronymQCshort=QC, long=Quantum chemical \DeclareAcronymCPCMshort=CPCM, long=Conductor-like polarizable continuum model \DeclareAcronymIDACshort=IDAC, long=infinite dilution activity coefficient \DeclareAcronymMLshort=ML, long=Machine Learning \DeclareAcronymD-MPNNshort=D-MPNN, long=Directed-Message Passing Neural Network \DeclareAcronymMPNNshort=D-MPNN, long=Message Passing Neural Network \DeclareAcronymGNNshort=GNN, long=Graphical Neural Network \DeclareAcronymRMSEshort=RMSE, long=Root-Mean-Square Error \DeclareAcronymQSPRshort=QSPR, long=Quantitative Structure-Property Relationships \DeclareAcronymvdWshort=vdW, long=van der Waals \DeclareAcronymAPIshort=API, long=Active Pharmaceutical Ingredient \DeclareAcronymCOSMO-RSshort=COSMO-RS, long=Conductor-like screening model for realistic solvation \DeclareAcronymOFshort=OF, long=Objective Function

\affiliation

[inst1]organization=Institute of Thermal Separation Processes, Hamburg University of Technology, city=Hamburg, postcode=21073, country=Germany

\affiliation

[inst2]organization=Institute of Technical Thermodynamics, RWTH Aachen University, city=Aachen, postcode=52062, country=Germany

\affiliation

[inst3]organization=FAccTs GmbH, city=Cologne, postcode=50677, country=Germany

1 Introduction

The accurate prediction of solvation free energies of solutes ΔGsolvΔsubscript𝐺solv\Delta G_{\mathrm{solv}}roman_Δ italic_G start_POSTSUBSCRIPT roman_solv end_POSTSUBSCRIPT is crucial to understand phenomena occurring in the liquid phase. From this quantity, one can determine varius thermodynamic and kinetic properties such as reaction rates, equilibrium constants, activity coefficients, dissociation acidity constants, partition coefficients. Consequently, solvation free energy plays a pivotal role in chemical reactions[1, 2, 3, 4, 5] and design of materials with novel properties[6, 7, 8, 9, 10, 11]. Despite significant efforts over recent decades, precise prediction of ΔGsolvΔsubscript𝐺solv\Delta G_{\mathrm{solv}}roman_Δ italic_G start_POSTSUBSCRIPT roman_solv end_POSTSUBSCRIPT remains a challenge.

In the last decade, various explicit[12, 13], implicit[14, 15, 16, 17, 18, 19], and data-driven[20, 21, 22, 23, 24, 25, 26] approaches have been used for solution phase property prediction. Explicit approaches such as \acMD are less common methods as they are quite computational time expensive as one needs to dissolve a solute in thousands of solvent molecules. Implicit approaches are more common in solution phase property prediction since they are less computational demanding. These approaches accurately predict the solvation free energies of neutral solutes with an uncertainty ranging from 0.4 kcal mol1 to 1.1 kcal mol1rangetimes0.4timeskilocalmole1times1.1timeskilocalmole10.4\text{\,}\mathrm{kcal}\text{\,}{\mathrm{mol}}^{-1}1.1\text{\,}\mathrm{kcal}% \text{\,}{\mathrm{mol}}^{-1}start_ARG start_ARG 0.4 end_ARG start_ARG times end_ARG start_ARG start_ARG roman_kcal end_ARG start_ARG times end_ARG start_ARG power start_ARG roman_mol end_ARG start_ARG - 1 end_ARG end_ARG end_ARG end_ARG to start_ARG start_ARG 1.1 end_ARG start_ARG times end_ARG start_ARG start_ARG roman_kcal end_ARG start_ARG times end_ARG start_ARG power start_ARG roman_mol end_ARG start_ARG - 1 end_ARG end_ARG end_ARG end_ARG[27, 15, 28, 29, 30]. Among others, the \acCOSMO-RS is a frequently used fully predictive implicit model with an uncertainty of 0.40 kcal mol1 to 0.45 kcal mol1rangetimes0.40timeskilocalmole1times0.45timeskilocalmole10.40\text{\,}\mathrm{kcal}\text{\,}{\mathrm{mol}}^{-1}0.45\text{\,}\mathrm{% kcal}\text{\,}{\mathrm{mol}}^{-1}start_ARG start_ARG 0.40 end_ARG start_ARG times end_ARG start_ARG start_ARG roman_kcal end_ARG start_ARG times end_ARG start_ARG power start_ARG roman_mol end_ARG start_ARG - 1 end_ARG end_ARG end_ARG end_ARG to start_ARG start_ARG 0.45 end_ARG start_ARG times end_ARG start_ARG start_ARG roman_kcal end_ARG start_ARG times end_ARG start_ARG power start_ARG roman_mol end_ARG start_ARG - 1 end_ARG end_ARG end_ARG end_ARG for predicting the solvation free energy of neutral solutes[27, 31, 32]. The basic principle of \acCOSMO-RS is based on the approximation of molecular interactions by the interactions of surface segments from the molecular cavities. This makes the calculations less demanding than \acMD calculations as the required input information only needs to be calculated once for each molecule from \acQM. Data-driven models have shown great potential in liquid phase property prediction mainly because many well established experimental databases are available. For example, \acML methods have been quite promising in predicting the solvation free energies of neutral solutes[33, 34, 35]. Vermeire et al. [32], trained and fine tuned a \acGNN model for predicting solvation free energies of neutral molecules reporting an uncertainty of 0.24 kcal mol1times0.24timeskilocalmole10.24\text{\,}\mathrm{kcal}\text{\,}{\mathrm{mol}}^{-1}start_ARG 0.24 end_ARG start_ARG times end_ARG start_ARG start_ARG roman_kcal end_ARG start_ARG times end_ARG start_ARG power start_ARG roman_mol end_ARG start_ARG - 1 end_ARG end_ARG end_ARG. While these \acGNN perform well on training datasets, their ability to generalize to new structures with different atoms and functional groups remains challenging. Empirical evidence suggests that their embeddings can generalize across different molecular spaces, but achieving robust out-of-distribution performance is still difficult[36, 37].

Recently, we published an open source version of the \acCOSMO-RS model, which we will call openCOSMO-RS 22[38] in the following. This implementation of \acCOSMO-RS is the first open source version introducing additional descriptors besides the screening charge density. Having additional descriptors allowed the model to be modified for electrolytes with great success in the past[39, 40, 41, 42, 43, 44]. For neutral molecules, openCOSMO-RS 22 performs quite well for predicting the \acIDAC with a \acRMSD of 0.76 times0.76absent0.76\text{\,}start_ARG 0.76 end_ARG start_ARG times end_ARG start_ARG end_ARG based on TURBOMOLE 6.6 parameterization and 0.65 times0.65absent0.65\text{\,}start_ARG 0.65 end_ARG start_ARG times end_ARG start_ARG end_ARG based on ORCA 5.0.3[45, 46, 47, 48]. Although openCOSMO-RS 22 was able to predict equilibrium properties between two or more solvents, it was not capable of predicting properties between gas and liquid phase such as the solvation free energy.

In this study, we perform a new parameterization of the model based on quantum chemical calculations from the software ORCA 6.0. This will be called openCOSMO-RS 24a. To do so, initially, we compile experimental data on solvation free energies, partition coefficients, and activity coefficients for a representative range of solutes and solvents. Since the molar volume of the solvent is required to predict the solvation free energy, we develop a \acQSPR model to predict the molar volume of the solvent at 25 °Ctimes25degreeCelsius25\text{\,}\mathrm{\SIUnitSymbolCelsius}start_ARG 25 end_ARG start_ARG times end_ARG start_ARG °C end_ARG based on experimental data available in the literature[49]. We modified the openCOSMO-RS conformer workflow compared to that of our previous work[38] by adding quantum chemical calculations of gas phase energies as these are needed to calculate the solvation free energies. Leveraging the experimental data together with the \acQSPR model, we parametrize openCOSMO-RS 24a for predicting the solvation free energies for a wide range of solutes and solvents. During this work, we found that the Gaussian charge scheme, used within CPCM[50, 51] in ORCA[52] produced very small segments leading to unusually large screening charge densities. This was addressed by rejecting the addition of segments smaller than a specified threshold (0.01 Å2times0.01angstrom20.01\text{\,}{\mathrm{\text{Å}}}^{2}start_ARG 0.01 end_ARG start_ARG times end_ARG start_ARG power start_ARG angstrom end_ARG start_ARG 2 end_ARG end_ARG) with minimum effect on the calculated energies. Furthermore, in ORCA, a Lagrangian-based algorithm is used to calculate the outlying charge correction[53]. Although this is not a treatment as advanced as the method proposed by Klamt [54] as it neglects the spacial distribution of the outlying charge, it should be enough for the neutral molecules tested. In the future a more thorough analysis of this is planned as it becomes especially important for anions. Finally, we report the performance of openCOSMO-RS 24a, which as of now can directly be used within the ORCA 6.0 software as additional solvation model enabling the user to access a variety of liquid phase properties which previously was not possible.

2 Methods

2.1 openCOSMO-RS

The theory of openCOSMO-RS has been discussed in previous studies[55, 38] and only the equation related to solvation free energy is briefly summarized here. The solvation free energy can then be calculated similarly to Klamt et al. [56, 17] from

ΔGsolv=Ediel+RTlnγαταAαωringnringRTlnνIGνliquidηΔsubscript𝐺solvsubscript𝐸diel𝑅𝑇superscript𝛾subscript𝛼subscript𝜏𝛼subscript𝐴𝛼subscript𝜔ringsubscript𝑛ring𝑅𝑇subscript𝜈IGsubscript𝜈liquid𝜂\displaystyle\Delta G_{\mathrm{solv}}=E_{\mathrm{diel}}+RT\ln\gamma^{\infty}-% \sum\limits_{\mathrm{\alpha}}{\tau_{\mathrm{\alpha}}A_{\mathrm{\alpha}}}-% \omega_{\mathrm{ring}}n_{\mathrm{ring}}-RT\ln\frac{\nu_{\mathrm{IG}}}{\nu_{% \mathrm{liquid}}}-\etaroman_Δ italic_G start_POSTSUBSCRIPT roman_solv end_POSTSUBSCRIPT = italic_E start_POSTSUBSCRIPT roman_diel end_POSTSUBSCRIPT + italic_R italic_T roman_ln italic_γ start_POSTSUPERSCRIPT ∞ end_POSTSUPERSCRIPT - ∑ start_POSTSUBSCRIPT italic_α end_POSTSUBSCRIPT italic_τ start_POSTSUBSCRIPT italic_α end_POSTSUBSCRIPT italic_A start_POSTSUBSCRIPT italic_α end_POSTSUBSCRIPT - italic_ω start_POSTSUBSCRIPT roman_ring end_POSTSUBSCRIPT italic_n start_POSTSUBSCRIPT roman_ring end_POSTSUBSCRIPT - italic_R italic_T roman_ln divide start_ARG italic_ν start_POSTSUBSCRIPT roman_IG end_POSTSUBSCRIPT end_ARG start_ARG italic_ν start_POSTSUBSCRIPT roman_liquid end_POSTSUBSCRIPT end_ARG - italic_η (1)

The term Edielsubscript𝐸dielE_{\mathrm{diel}}italic_E start_POSTSUBSCRIPT roman_diel end_POSTSUBSCRIPT represents the dielectric energy, which is the energy involved in transferring the solute from the gas phase to an ideal conductor. The second term refers to the chemical potential at infinite dilution in the liquid phase, using the ideal conductor as the reference state and it is directly predicted by openCOSMO-RS. The third term encompasses the energy required for cavity formation and includes a van der Waals-like contribution to the solvation free energy, calculated by summing the product of each atom’s area Aαsubscript𝐴𝛼A_{\mathrm{\alpha}}italic_A start_POSTSUBSCRIPT italic_α end_POSTSUBSCRIPT on the solute molecule and a factor ταsubscript𝜏𝛼\tau_{\mathrm{\alpha}}italic_τ start_POSTSUBSCRIPT italic_α end_POSTSUBSCRIPT that depends on the atomic number. The fourth term provides a correction for molecules containing rings, determined by multiplying a general parameter ωringsubscript𝜔ring\omega_{\mathrm{ring}}italic_ω start_POSTSUBSCRIPT roman_ring end_POSTSUBSCRIPT by the number of rings nringsubscript𝑛ringn_{\mathrm{ring}}italic_n start_POSTSUBSCRIPT roman_ring end_POSTSUBSCRIPT in the solute structure. The fifth term accounts for the change in units of the reference states from mole fraction (units of the calculation) to molar concentration (units of the experimental data) with νIGsubscript𝜈IG\nu_{\mathrm{IG}}italic_ν start_POSTSUBSCRIPT roman_IG end_POSTSUBSCRIPT and νliquidsubscript𝜈liquid\nu_{\mathrm{liquid}}italic_ν start_POSTSUBSCRIPT roman_liquid end_POSTSUBSCRIPT representing the molar volumes of the ideal gas and the liquid phase, respectively. The final term ηtypesubscript𝜂type\eta_{\mathrm{type}}italic_η start_POSTSUBSCRIPT roman_type end_POSTSUBSCRIPT is an adjustable parameter.

2.2 Computational details

In the following, we describe the openCOSMO-RS 24a conformer workflow for searching and calculating all necessary input data for the gas and \acCPCM phase. This is the updated overview of the pipeline also available on github[57]:

  • 1.

    Gas phase calculations

    • --

      Molecular mechanics-based conformer generation using RDKit[58, 59].

    • --

      Filter conformers by an energy window of 6 kcal mol1times6timeskilocalmole16\text{\,}\mathrm{kcal}\text{\,}{\mathrm{mol}}^{-1}start_ARG 6 end_ARG start_ARG times end_ARG start_ARG start_ARG roman_kcal end_ARG start_ARG times end_ARG start_ARG power start_ARG roman_mol end_ARG start_ARG - 1 end_ARG end_ARG end_ARG.

    • --

      Cluster conformers by an \acRMSD window of 1 atimes1a1\text{\,}\mathrm{a}start_ARG 1 end_ARG start_ARG times end_ARG start_ARG roman_a end_ARGnd save these for \acCPCM[52] calculations.

    • --

      Geometry optimizations at DFT/BP86/def2-TZVP(-f)[60, 61, 62] level using ORCA.

    • --

      Single point energy calculation using DFT/BP86/def2-TZVPD level in ORCA for the conformer with the lowest energy.

  • 2.
    \ac

    CPCM calculations

    • --

      Optimize geometries in water using ALPB [63] with GFN2-xTB [64] calculations from within ORCA, starting from saved conformers.

    • --

      Filter confomers by an energy window of 6 kcal mol1times6timeskilocalmole16\text{\,}\mathrm{kcal}\text{\,}{\mathrm{mol}}^{-1}start_ARG 6 end_ARG start_ARG times end_ARG start_ARG start_ARG roman_kcal end_ARG start_ARG times end_ARG start_ARG power start_ARG roman_mol end_ARG start_ARG - 1 end_ARG end_ARG end_ARG.

    • --

      Cluster conformers by an \acRMSD window of 1 atimes1a1\text{\,}\mathrm{a}start_ARG 1 end_ARG start_ARG times end_ARG start_ARG roman_a end_ARGnd select the conformers with the three lowest energies.

    • --
      \ac

      CPCM geometry optimizations at the DFT/BP86/def2-TZVP(-f) level in ORCA.

    • --

      Filter confomers by an energy window of 6 kcal mol1times6timeskilocalmole16\text{\,}\mathrm{kcal}\text{\,}{\mathrm{mol}}^{-1}start_ARG 6 end_ARG start_ARG times end_ARG start_ARG start_ARG roman_kcal end_ARG start_ARG times end_ARG start_ARG power start_ARG roman_mol end_ARG start_ARG - 1 end_ARG end_ARG end_ARG.

    • --

      Cluster conformers by an \acRMSD window of 1 atimes1a1\text{\,}\mathrm{a}start_ARG 1 end_ARG start_ARG times end_ARG start_ARG roman_a end_ARGnd select the conformer with the lowest energy.

    • --
      \ac

      CPCM geometry optimizations of DFT/BP86/def2-TZVP level in ORCA.

    • --
      \ac

      CPCM single point energy calculation using DFT/BP86/def2-TZVPD level in ORCA.

To search a large parameter space, the global solver differential evolution as implemented in SciPy[65] is used. Similar to our previous studies[55, 38], the following objective function is minimized for all optimizations:

OF=1Npi(YcalcYexp)2OF1subscript𝑁𝑝subscript𝑖superscriptsuperscript𝑌calcsuperscript𝑌exp2\mathrm{OF}=\frac{1}{N_{p}}\sum_{i}{\left(Y^{\mathrm{calc}}-Y^{\mathrm{exp}}% \right)^{2}}roman_OF = divide start_ARG 1 end_ARG start_ARG italic_N start_POSTSUBSCRIPT italic_p end_POSTSUBSCRIPT end_ARG ∑ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ( italic_Y start_POSTSUPERSCRIPT roman_calc end_POSTSUPERSCRIPT - italic_Y start_POSTSUPERSCRIPT roman_exp end_POSTSUPERSCRIPT ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT (2)

The average absolute deviation is calculated from:

AADY=1Npi|YcalcYexp|subscriptAADY1subscript𝑁psubscript𝑖superscript𝑌calcsuperscript𝑌exp\mathrm{AAD}_{\mathrm{Y}}=\frac{1}{N_{\mathrm{p}}}\sum_{i}{\left|Y^{\mathrm{% calc}}-Y^{\mathrm{exp}}\right|}roman_AAD start_POSTSUBSCRIPT roman_Y end_POSTSUBSCRIPT = divide start_ARG 1 end_ARG start_ARG italic_N start_POSTSUBSCRIPT roman_p end_POSTSUBSCRIPT end_ARG ∑ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT | italic_Y start_POSTSUPERSCRIPT roman_calc end_POSTSUPERSCRIPT - italic_Y start_POSTSUPERSCRIPT roman_exp end_POSTSUPERSCRIPT | (3)

whereby YY\mathrm{Y}roman_Y is either lnγisuperscriptsubscript𝛾𝑖\ln{\gamma_{i}^{\infty}}roman_ln italic_γ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ∞ end_POSTSUPERSCRIPT, lnK𝐾\ln{K}roman_ln italic_K or ΔGsolvΔsubscript𝐺solv\Delta G_{\mathrm{solv}}roman_Δ italic_G start_POSTSUBSCRIPT roman_solv end_POSTSUBSCRIPT.

2.3 Dataset overview

The dataset used in this work is comprised of three data types at 25 °Ctimes25degreeCelsius25\text{\,}\mathrm{\SIUnitSymbolCelsius}start_ARG 25 end_ARG start_ARG times end_ARG start_ARG °C end_ARG: (i) infinite dilution activity coefficients, (ii) partition coefficients and (iii) solvation free energies. The 800+ infinite dilution activity coefficients are taken from Parcher et al. [66], Voutsas & Tassios[67], Kontogeorgis et al. [68], Kato et al. [69] and He & Zhong[70]. The partition coefficients for the following solvent combinations: octanol + water, benzene + water, hexane + water, and diethyl ether + water are collected by Klamt et al. [17]. The 2000+ solvation free energies are taken from Marenich et al. [71]. Xylene is excluded from the calculations as it is a mixture of constitutional isomers. Additionally, values for water as a solute in all three data types were excluded due to their known prediction issues when solvated in non-polar solvents within the \acCOSMO-RS framework without further model improvements[17, 72]. Even for molecular simulations treating mixtures of water and alkanes over the complete concentration range is challenging for most models. [73, 74, 75]

To calculate the solvation free energy, the molar volume of the pure solvent is required (see Equation 1). Thus, we develop a \acQSPR model in this study to predict the molar volume of the solvent at 25 °Ctimes25degreeCelsius25\text{\,}\mathrm{\SIUnitSymbolCelsius}start_ARG 25 end_ARG start_ARG times end_ARG start_ARG °C end_ARG based on experimental data available from Mathieu & Bouteloup[49]. The complete dataset is cleaned and normalized: isotopes and explicit hydrogens are deleted, duplicates are merged, and only the first value in the original data for each component is retained.

3 Results and Discussion

3.1 Predictive QSPR model for molar volumes of the solvent at 25 °Ctimes25degreeCelsius25\text{\,}\mathrm{\SIUnitSymbolCelsius}start_ARG 25 end_ARG start_ARG times end_ARG start_ARG °C end_ARG

To enable the fully predictive calculation of solvation energies, a model is developed to predict the only quantity not calculated within openCOSMO-RS; the molar volume of the pure solvent. The model is based on a linear combination of descriptors, represented by the following equation:

vpure=0.6977ACPCM0.3161M2+0.03244M4subscript𝑣𝑝𝑢𝑟𝑒0.6977subscript𝐴CPCM0.3161subscript𝑀20.03244subscript𝑀4\displaystyle v_{pure}=0.6977A_{\mathrm{CPCM}}-0.3161M_{2}+0.03244M_{4}italic_v start_POSTSUBSCRIPT italic_p italic_u italic_r italic_e end_POSTSUBSCRIPT = 0.6977 italic_A start_POSTSUBSCRIPT roman_CPCM end_POSTSUBSCRIPT - 0.3161 italic_M start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT + 0.03244 italic_M start_POSTSUBSCRIPT 4 end_POSTSUBSCRIPT (4)
+0.9431natoms+8.113nSi,atoms0.070670.9431subscript𝑛atoms8.113subscript𝑛Siatoms0.07067\displaystyle+0.9431n_{\mathrm{atoms}}+8.113n_{\mathrm{Si,atoms}}-0.07067+ 0.9431 italic_n start_POSTSUBSCRIPT roman_atoms end_POSTSUBSCRIPT + 8.113 italic_n start_POSTSUBSCRIPT roman_Si , roman_atoms end_POSTSUBSCRIPT - 0.07067

where A\acCPCMsubscript𝐴\ac𝐶𝑃𝐶𝑀A_{\ac{CPCM}}italic_A start_POSTSUBSCRIPT italic_C italic_P italic_C italic_M end_POSTSUBSCRIPT is the area of the surface segments on the cavity of the solute, Misubscript𝑀iM_{\mathrm{i}}italic_M start_POSTSUBSCRIPT roman_i end_POSTSUBSCRIPT are the respective sigma moments, natomssubscript𝑛atomsn_{\mathrm{atoms}}italic_n start_POSTSUBSCRIPT roman_atoms end_POSTSUBSCRIPT is the number of atoms and nSi,atomssubscript𝑛Siatomsn_{\mathrm{Si,atoms}}italic_n start_POSTSUBSCRIPT roman_Si , roman_atoms end_POSTSUBSCRIPT is the number of silicon atoms in the molecule. All descriptor combinations were systematically evaluated. Notably, A\acCPCMsubscript𝐴\ac𝐶𝑃𝐶𝑀A_{\ac{CPCM}}italic_A start_POSTSUBSCRIPT italic_C italic_P italic_C italic_M end_POSTSUBSCRIPT offers a more effective representation of the molar volume of the solvent compared to the volume of the cavity while utilizing fewer descriptors. The significant effect of the number of silicon atoms on the model’s accuracy might suggest that the silicon radius might not be optimal. Figure 1 shows the predicted molar volumes of the solvents using the \acQSPR molar volume of the solvent model against the experimental molar volumes of the solvents at 25 °Ctimes25degreeCelsius25\text{\,}\mathrm{\SIUnitSymbolCelsius}start_ARG 25 end_ARG start_ARG times end_ARG start_ARG °C end_ARG based on experimental data described more in detail in the previous section. Overall, the predicted molar volumes of the solvents agree well with the experimental ones with \acAAD of 3.48 cm3 mol1times3.48timescentimeter3mole13.48\text{\,}{\mathrm{cm}}^{3}\text{\,}{\mathrm{mol}}^{-1}start_ARG 3.48 end_ARG start_ARG times end_ARG start_ARG start_ARG power start_ARG roman_cm end_ARG start_ARG 3 end_ARG end_ARG start_ARG times end_ARG start_ARG power start_ARG roman_mol end_ARG start_ARG - 1 end_ARG end_ARG end_ARG and R2 of 0.995. Mathieu and Bouteloup report a model for predicting the standard density with an average relative error <<<1.7%. For density prediction, our \acQSPR model achieves a relative error of 2.2%, which is an accuracy similar to that of other group contribution methods[76, 77, 78].

Refer to caption
Figure 1: Parity plot for the prediction of molar volume of the solvent model.

3.2 Parametrization

All non-fixed parameters in Table 1 are simultaneously adjusted using the differential evolution algorithm implemented in SciPy[65]. Following the approach used in openCOSMO-RS 22[38], we incorporate the improved misfit term, which includes the additional descriptor σsuperscript𝜎perpendicular-to\sigma^{\perp}italic_σ start_POSTSUPERSCRIPT ⟂ end_POSTSUPERSCRIPT to recover some of the lost 3D information. All data are included in the regression of the parameters.

Table 1: Parameterization of openCOSMO-RS 24a based with gas and \acCPCM geometry optimizations at DFT/BP86/def2-TZVP level and gas and \acCPCM single point calculations at DFT/BP86/def2-TZVPD level in ORCA 6.0. [*] denotes the parameter was fixed.
Parameter Value Parameter Value
rav[Å]superscriptsubscript𝑟𝑎𝑣delimited-[]italic-År_{av}^{*}\;[\AA]italic_r start_POSTSUBSCRIPT italic_a italic_v end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT [ italic_Å ] 0.5 τ1[kJmolÅ2]subscript𝜏1delimited-[]𝑘𝐽𝑚𝑜𝑙superscriptitalic-Å2\tau_{1}\left[\frac{kJ}{mol\cdot\AA^{2}}\right]italic_τ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT [ divide start_ARG italic_k italic_J end_ARG start_ARG italic_m italic_o italic_l ⋅ italic_Å start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG ] 0.123
aeff[Å2]subscript𝑎𝑒𝑓𝑓delimited-[]superscriptitalic-Å2a_{eff}\;[\AA^{2}]italic_a start_POSTSUBSCRIPT italic_e italic_f italic_f end_POSTSUBSCRIPT [ italic_Å start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ] 5.925 τ6[kJmolÅ2]subscript𝜏6delimited-[]𝑘𝐽𝑚𝑜𝑙superscriptitalic-Å2\tau_{6}\left[\frac{kJ}{mol\cdot\AA^{2}}\right]italic_τ start_POSTSUBSCRIPT 6 end_POSTSUBSCRIPT [ divide start_ARG italic_k italic_J end_ARG start_ARG italic_m italic_o italic_l ⋅ italic_Å start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG ] 0.096
αmf[kJÅ2mole2]subscript𝛼𝑚𝑓delimited-[]𝑘𝐽superscriptitalic-Å2𝑚𝑜𝑙superscript𝑒2\alpha_{mf}\left[\frac{kJ\cdot\AA^{2}}{mol\cdot e^{2}}\right]italic_α start_POSTSUBSCRIPT italic_m italic_f end_POSTSUBSCRIPT [ divide start_ARG italic_k italic_J ⋅ italic_Å start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG start_ARG italic_m italic_o italic_l ⋅ italic_e start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG ] 7281 τ7[kJmolÅ2]subscript𝜏7delimited-[]𝑘𝐽𝑚𝑜𝑙superscriptitalic-Å2\tau_{7}\left[\frac{kJ}{mol\cdot\AA^{2}}\right]italic_τ start_POSTSUBSCRIPT 7 end_POSTSUBSCRIPT [ divide start_ARG italic_k italic_J end_ARG start_ARG italic_m italic_o italic_l ⋅ italic_Å start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG ] 0.003
fcorr[]superscriptsubscript𝑓𝑐𝑜𝑟𝑟delimited-[]f_{corr}^{*}[-]italic_f start_POSTSUBSCRIPT italic_c italic_o italic_r italic_r end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT [ - ] 2.4 τ8[kJmolÅ2]subscript𝜏8delimited-[]𝑘𝐽𝑚𝑜𝑙superscriptitalic-Å2\tau_{8}\left[\frac{kJ}{mol\cdot\AA^{2}}\right]italic_τ start_POSTSUBSCRIPT 8 end_POSTSUBSCRIPT [ divide start_ARG italic_k italic_J end_ARG start_ARG italic_m italic_o italic_l ⋅ italic_Å start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG ] 0.015
chb[kJÅ2mole2]subscript𝑐𝑏delimited-[]𝑘𝐽superscriptitalic-Å2𝑚𝑜𝑙superscript𝑒2c_{hb}\left[\frac{kJ\cdot\AA^{2}}{mol\cdot e^{2}}\right]italic_c start_POSTSUBSCRIPT italic_h italic_b end_POSTSUBSCRIPT [ divide start_ARG italic_k italic_J ⋅ italic_Å start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG start_ARG italic_m italic_o italic_l ⋅ italic_e start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG ] 43327 τ9[kJmolÅ2]subscript𝜏9delimited-[]𝑘𝐽𝑚𝑜𝑙superscriptitalic-Å2\tau_{9}\left[\frac{kJ}{mol\cdot\AA^{2}}\right]italic_τ start_POSTSUBSCRIPT 9 end_POSTSUBSCRIPT [ divide start_ARG italic_k italic_J end_ARG start_ARG italic_m italic_o italic_l ⋅ italic_Å start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG ] 0.023
σhb[e/Å2]subscript𝜎𝑏delimited-[]𝑒superscriptitalic-Å2\sigma_{hb}\;[e/\AA^{2}]italic_σ start_POSTSUBSCRIPT italic_h italic_b end_POSTSUBSCRIPT [ italic_e / italic_Å start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ] 0.00961 τ17[kJmolÅ2]subscript𝜏17delimited-[]𝑘𝐽𝑚𝑜𝑙superscriptitalic-Å2\tau_{17}\left[\frac{kJ}{mol\cdot\AA^{2}}\right]italic_τ start_POSTSUBSCRIPT 17 end_POSTSUBSCRIPT [ divide start_ARG italic_k italic_J end_ARG start_ARG italic_m italic_o italic_l ⋅ italic_Å start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG ] 0.143
Astd[Å2]subscript𝐴𝑠𝑡𝑑delimited-[]superscriptitalic-Å2A_{std}\;[\AA^{2}]italic_A start_POSTSUBSCRIPT italic_s italic_t italic_d end_POSTSUBSCRIPT [ italic_Å start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ] 41.624 τ53[kJmolÅ2]subscript𝜏53delimited-[]𝑘𝐽𝑚𝑜𝑙superscriptitalic-Å2\tau_{53}\left[\frac{kJ}{mol\cdot\AA^{2}}\right]italic_τ start_POSTSUBSCRIPT 53 end_POSTSUBSCRIPT [ divide start_ARG italic_k italic_J end_ARG start_ARG italic_m italic_o italic_l ⋅ italic_Å start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG ] 0.891
η[kJmol]𝜂delimited-[]𝑘𝐽𝑚𝑜𝑙\eta\left[\frac{kJ}{mol}\right]italic_η [ divide start_ARG italic_k italic_J end_ARG start_ARG italic_m italic_o italic_l end_ARG ] -18.61 τ14[kJmolÅ2]subscript𝜏14delimited-[]𝑘𝐽𝑚𝑜𝑙superscriptitalic-Å2\tau_{14}\left[\frac{kJ}{mol\cdot\AA^{2}}\right]italic_τ start_POSTSUBSCRIPT 14 end_POSTSUBSCRIPT [ divide start_ARG italic_k italic_J end_ARG start_ARG italic_m italic_o italic_l ⋅ italic_Å start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG ] 0.018
ωring[kJmol]subscript𝜔𝑟𝑖𝑛𝑔delimited-[]𝑘𝐽𝑚𝑜𝑙\omega_{ring}\left[\frac{kJ}{mol}\right]italic_ω start_POSTSUBSCRIPT italic_r italic_i italic_n italic_g end_POSTSUBSCRIPT [ divide start_ARG italic_k italic_J end_ARG start_ARG italic_m italic_o italic_l end_ARG ] 1.100 τ15[kJmolÅ2]subscript𝜏15delimited-[]𝑘𝐽𝑚𝑜𝑙superscriptitalic-Å2\tau_{15}\left[\frac{kJ}{mol\cdot\AA^{2}}\right]italic_τ start_POSTSUBSCRIPT 15 end_POSTSUBSCRIPT [ divide start_ARG italic_k italic_J end_ARG start_ARG italic_m italic_o italic_l ⋅ italic_Å start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG ] 0.015
τ16[kJmolÅ2]subscript𝜏16delimited-[]𝑘𝐽𝑚𝑜𝑙superscriptitalic-Å2\tau_{16}\left[\frac{kJ}{mol\cdot\AA^{2}}\right]italic_τ start_POSTSUBSCRIPT 16 end_POSTSUBSCRIPT [ divide start_ARG italic_k italic_J end_ARG start_ARG italic_m italic_o italic_l ⋅ italic_Å start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG ] 0.146

3.3 Model performance

In this work, infinite dilution activity coefficients, partition coefficients, and solvation free energies, all at 25 °Ctimes25degreeCelsius25\text{\,}\mathrm{\SIUnitSymbolCelsius}start_ARG 25 end_ARG start_ARG times end_ARG start_ARG °C end_ARG are used to parameterize openCOSMO-RS 24a. Table 2 provides an overview of all calculations for openCOSMO-RS 24a, COSMOtherm 24 TZVP and COSMOtherm 24 FINE, calculated with the lowest energy conformer. Figures 2, 3, and 4 show the predicted values obtained from openCOSMO-RS 24a against the experimental values compiled from the literature for the activity coefficients, solvation free energies, and partition coefficients, respectively. In all Figures, black represents solutes without hydrogen bonds, red represents solutes that are hydrogen bond donors, and blue represents solutes that are hydrogen bond acceptors. Whether or not a solute is considered hydrogen bonding depends on the existence of area having a screening charge density larger than the threshold hydrogen bonding parameter σHBsubscript𝜎HB\sigma_{\mathrm{HB}}italic_σ start_POSTSUBSCRIPT roman_HB end_POSTSUBSCRIPT.

For the infinite dilution activity coefficients, a total of 882 data points are used (see Figure 2). openCOSMO-RS 24a achieves an \acAAD of 0.51 times0.51absent0.51\text{\,}start_ARG 0.51 end_ARG start_ARG times end_ARG start_ARG end_ARG and R2 of 0.98 times0.98absent0.98\text{\,}start_ARG 0.98 end_ARG start_ARG times end_ARG start_ARG end_ARG, showing an improvement compared to our previous work openCOSMO-RS 22[38], which had an \acAAD of 0.65 times0.65absent0.65\text{\,}start_ARG 0.65 end_ARG start_ARG times end_ARG start_ARG end_ARG Overall, it can be observed, that overall, for the less polar solutes, the infinite dilution activity coefficients are slightly underestimated while for the more polar ones are somewhat overestimated. The infinite dilution activity coefficient represents the energy required for transferring one molecule from pure component to being infinitely dilute in a second solvent. Hence, this systematic shift in model performance based on solute polarity might be either due to the overestimation of attractive hydrogen bonding for more polar molecules in the reference state (i.e. pure solute) or due to an overestimation of the repulsive misfit energy at infinite dilution in the solvent. This will be investigated further in future work.

Refer to caption
Figure 2: Parity plot for infinite dilution activity coefficients calculated with openCOSMO-RS 24a. Colors represent different solute types: (\bullet) non-HB, (\bullet) HB acceptors, (\bullet) HB donors and (\bullet) HB donors/acceptors.

For the partition coefficients, the dataset includes 296 data points (see Figure 3). The openCOSMO-RS 24a models achieves an \acAAD of 0.76 times0.76absent0.76\text{\,}start_ARG 0.76 end_ARG start_ARG times end_ARG start_ARG end_ARG and R2 of 0.92 times0.92absent0.92\text{\,}start_ARG 0.92 end_ARG start_ARG times end_ARG start_ARG end_ARG, indicitating good agreement between the calculated and the experimental data. Similar to \acIDAC, the model tends to overestimate partition coefficients for more polar solutes and slightly underestimate them for less polar ones. The partition coefficient measures the energy required to transfer a solute at infinite dilution from water to another solvent, representing the relative interaction energy of the solute with the other solvent compared to water. The data suggests that the greater the polarity difference is between the solute and the other solvent, the larger the deviation in calculated values by openCOSMO-RS 24a. As mentioned earlier, systems with water as a solute are excluded from the dataset because the usual COSMO-RS theory struggles to handle water at infinite dilution in very non-polar solvents [17, 72]. This issue appears to extend to other polar molecules in non-polar solvents, though less pronounced than with water. This suggests a general systematic issue that could potentially be addressed to improve the model.

Refer to caption
Figure 3: Parity plot for partition coefficients at 25 °Ctimes25degreeCelsius25\text{\,}\mathrm{\SIUnitSymbolCelsius}start_ARG 25 end_ARG start_ARG times end_ARG start_ARG °C end_ARG calculated with openCOSMO-RS 24a whereby AB = [octanol/water, benzene/water, hexane/water, diethyl ether/water]. Colors represent different solute types: (\bullet) non-HB, (\bullet) HB acceptors, (\bullet) HB donors and (\bullet) HB donors/acceptors.

For the solvation free energies, the dataset contains 2129 data points (see Figure 4). The openCOSMO-RS 24a model achieves an \acAAD of 0.45 kcal mol1times0.45timeskilocalmole10.45\text{\,}\mathrm{kcal}\text{\,}{\mathrm{mol}}^{-1}start_ARG 0.45 end_ARG start_ARG times end_ARG start_ARG start_ARG roman_kcal end_ARG start_ARG times end_ARG start_ARG power start_ARG roman_mol end_ARG start_ARG - 1 end_ARG end_ARG end_ARG and R2 of 0.91 times0.91absent0.91\text{\,}start_ARG 0.91 end_ARG start_ARG times end_ARG start_ARG end_ARG, showing a strong agreement between the calculated and experimental values, which is impressive considering that the molecules in this study are represented by only a single conformer. In comparison, the commercial software COSMOtherm reports a similar uncertainty of 0.40 kcal mol1 to 0.45 kcal mol1rangetimes0.40timeskilocalmole1times0.45timeskilocalmole10.40\text{\,}\mathrm{kcal}\text{\,}{\mathrm{mol}}^{-1}0.45\text{\,}\mathrm{% kcal}\text{\,}{\mathrm{mol}}^{-1}start_ARG start_ARG 0.40 end_ARG start_ARG times end_ARG start_ARG start_ARG roman_kcal end_ARG start_ARG times end_ARG start_ARG power start_ARG roman_mol end_ARG start_ARG - 1 end_ARG end_ARG end_ARG end_ARG to start_ARG start_ARG 0.45 end_ARG start_ARG times end_ARG start_ARG start_ARG roman_kcal end_ARG start_ARG times end_ARG start_ARG power start_ARG roman_mol end_ARG start_ARG - 1 end_ARG end_ARG end_ARG end_ARG [27, 31, 32] for predicting the solvation free energy of neutral solutes, though it uses an ensemble of conformers. However, the comparison may not be entirely fair, as the parameters of openCOSMO-RS 24a are directly adjusted to the data used in this study, whereas the accuracy reported by was likely based on a larger dataset.

Refer to caption
Figure 4: Parity plot for solvation free energies at 25 °Ctimes25degreeCelsius25\text{\,}\mathrm{\SIUnitSymbolCelsius}start_ARG 25 end_ARG start_ARG times end_ARG start_ARG °C end_ARG calculated with openCOSMO-RS 24a. Colors represent different solute types: (\bullet) non-HB, (\bullet) HB acceptors, (\bullet) HB donors and (\bullet) HB donors/acceptors.

Table 2 presents a performance comparison of openCOSMO-RS 24a, COSMOtherm 24 TZVP, and COSMOtherm 24 FINE. All calculations are conducted using only the lowest energy conformer, providing a fair comparison since openCOSMO-RS 24a currently lacks the capability to integrate multiple conformers. Additional calculations using multiple conformers for TZVP and FINE parameterizations are included in the appendix. Nevertheless, for the dataset examined in this study, incorporating multiple conformers does not significantly impact the results. The results are categorized by solute type regarding its hydrogen binding capabilities. There is no universally accepted definition of a hydrogen-bonding molecule in the context of COSMO-RS modeling. However, we classify molecules based on the presence of areas in the σ𝜎\sigmaitalic_σ-profile with an absolute screening charge density larger than the σHBsubscript𝜎HB\sigma_{\mathrm{HB}}italic_σ start_POSTSUBSCRIPT roman_HB end_POSTSUBSCRIPT threshold. This method allows for a consistent categorization of molecule types, which directly relates to the terms in the interaction equations.

Table 2: Comparison of openCOSMO-RS 24a, COSMOtherm 24 TZVP, and COSMOtherm FINE for infinite dilution activity coefficients, partition coefficients, and solvation free energies. The calculations for all three models were performed only with the lowest energy conformer.
openCOSMO-RS 24a COSMOtherm 24 TZVP COSMOtherm 24 FINE
IDAC [-] Ndatapointssubscript𝑁datapointsN_{\mathrm{datapoints}}italic_N start_POSTSUBSCRIPT roman_datapoints end_POSTSUBSCRIPT AAD AAD AAD
non-HB 568 0.41 0.43 0.40
HB acceptor 172 0.63 0.53 0.35
HB donor 35 0.64 0.93 0.40
HB acceptor/donor 107 0.78 0.66 0.33
Total 882 0.51 0.50 0.38
Partition coefficients [-]
non-HB 68 0.36 0.54 0.46
HB acceptor 104 0.84 0.63 0.50
HB donor 12 0.25 0.28 0.23
HB acceptor/donor 112 0.99 0.76 0.59
Total 296 0.76 0.64 0.51
Solvation free energies [ kcal mol1timesabsenttimeskilocalmole1\text{\,}\mathrm{kcal}\text{\,}{\mathrm{mol}}^{-1}start_ARG end_ARG start_ARG times end_ARG start_ARG start_ARG roman_kcal end_ARG start_ARG times end_ARG start_ARG power start_ARG roman_mol end_ARG start_ARG - 1 end_ARG end_ARG end_ARG]
non-HB 434 0.36 0.34 0.32
HB acceptor 775 0.40 0.47 0.45
HB donor 69 0.52 0.39 0.32
HB acceptor/donor 851 0.54 0.53 0.40
Total 2129 0.45 0.46 0.40
Overall 3307 0.49 0.50 0.41

Overall, for \acIDAC and solvation free energies, in the observed dataset, openCOSMO-RS 24a performs comparable to COSMOtherm 24 TZVP. Only for partition coefficients COSMOtherm 24 TZVP is more accurate. The COSMOtherm 24 FINE model delivers more accurate results for the majority of the systems in the dataset. This is a first step to improving openCOSMO-RS in a more general fashion for neutral molecules with current work focusing on improvements like the combinatorial term, temperature dependency, dispersion interactions, multiple conformers and polarizability effects.

4 Conclusions

In this work, we extended openCOSMO-RS to be able to calculate solvation free energies. To do so, we developed a \acQSPR model to predict the molar volumes of the solvents required for calculating solvation free energies. By modifying the openCOSMO-RS conformer workflow and combining in total 3307 data points of activity coefficients, solvation free energies, and partition coefficients, we parameterized openCOSMO-RS 24a based on ORCA 6.0.

The openCOSMO-RS 24a parameterization based on ORCA 6.0 achieved an \acAAD of 0.45 kcal mol���1times0.45timeskilocalmole10.45\text{\,}\mathrm{kcal}\text{\,}{\mathrm{mol}}^{-1}start_ARG 0.45 end_ARG start_ARG times end_ARG start_ARG start_ARG roman_kcal end_ARG start_ARG times end_ARG start_ARG power start_ARG roman_mol end_ARG start_ARG - 1 end_ARG end_ARG end_ARG for predicting solvation free energies, which is comparable to the uncertainty of 0.45 kcal mol1times0.45timeskilocalmole10.45\text{\,}\mathrm{kcal}\text{\,}{\mathrm{mol}}^{-1}start_ARG 0.45 end_ARG start_ARG times end_ARG start_ARG start_ARG roman_kcal end_ARG start_ARG times end_ARG start_ARG power start_ARG roman_mol end_ARG start_ARG - 1 end_ARG end_ARG end_ARG that is reported by the commercial software COSMOtherm. For predicting the partition coefficients, openCOSMO-RS 24a achieves an \acAAD of 0.76 times0.76absent0.76\text{\,}start_ARG 0.76 end_ARG start_ARG times end_ARG start_ARG end_ARG. Furthermore, openCOSMO-RS 24a showed an improvement in predicting activity coefficients with an \acAAD of 0.51 times0.51absent0.51\text{\,}start_ARG 0.51 end_ARG start_ARG times end_ARG start_ARG end_ARG compared to an \acAAD of 0.65 times0.65absent0.65\text{\,}start_ARG 0.65 end_ARG start_ARG times end_ARG start_ARG end_ARG with the previous openCOSMO-RS 22 parameterization.

While the openCOSMO-RS 24a parameterization performs well when representing each molecule with a single conformer, future work will focus on extending the model to handle conformer ensembles. Additionally, we plan to extend openCOSMO-RS to ionic solutes, as current models often struggle to accurately predict the solvation free energies of ionic compounds[55, 79, 80].

The performance of openCOSMO-RS in predicting liquid phase properties, even with a single conformer, demonstrates that the inclusion of additional chemical descriptors to surface charge improves the model’s accuracy. Moreover, substantial efforts have been made to integrate openCOSMO-RS into ORCA 6.0, enabling users to directly access a variety of liquid phase properties; a capability that was previously unavailable. This integration marks a significant advancement, providing users with a powerful tool for comprehensive property prediction within a single software environment.

References

Appendix A Performance of COSMOtherm 24 TZVP and COSMOtherm 24 FINE using multiple conformers

Table 3: Performance of COSMOtherm 24 TZVP and COSMOtherm 24 FINE using multiple conformers for predicting infinite dilution activity coefficients, partition coefficients and solvation free energies.
COSMOtherm 24 TZVP MC COSMOtherm 24 FINE MC
IDAC [-] Count AAD AAD
non-HB 568 0.43 0.40
HB acceptor 172 0.54 0.36
HB donor 35 0.94 0.41
HB acceptor/donor 107 0.65 0.27
Total 882 0.50 0.38
Partition coefficients [-]
non-HB 68 0.56 0.46
HB acceptor 104 0.63 0.52
HB donor 12 0.28 0.22
HB acceptor/donor 112 0.78 0.59
Total 296 0.66 0.52
Solvation free energies [kcalmol1kcalsuperscriptmol1\mathrm{kcal\ mol^{-1}}roman_kcal roman_mol start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT]
non-HB 434 0.34 0.32
HB acceptor 775 0.46 0.45
HB donor 69 0.37 0.31
HB acceptor/donor 851 0.52 0.39
Total 2129 0.46 0.40
Overall 3307 0.51 0.42