-
Influence of Pseudo-Jahn-Teller Activity on the Singlet-Triplet Gap of Azaphenalenes
Authors:
Atreyee Majumdar,
Komal Jindal,
Surajit Das,
Raghunathan Ramakrishnan
Abstract:
We probe the sensitivity of the singlet-triplet energy gap of selected azaphenalenes to symmetry lowering induced by Jahn-Teller interactions. While cyclazine in its characteristic $D_{\rm 3h}$ structure defies Hund's rule, CCSD(T)-level modeling suggests its structure corresponds to two equivalent minima of $C_{\rm 3h}$ symmetry undergoing rapid automerization. The combined effect of symmetry reduction and high-level corrections indicates a negligible singlet-triplet gap in cyclazine. Notably, pentazine and heptazine prefer symmetric structures exhibiting negative gaps in accord with experiments. Azaphenalenes containing nitrogen atoms at electron-deficient sites exhibit stronger in-plane structural distortion; in their low-symmetry energy minima, they adhere to Hund's rule.
Submitted 5 July, 2024;
originally announced July 2024.
-
Chemical Space-Informed Machine Learning Models for Rapid Predictions of X-ray Photoelectron Spectra of Organic Molecules
Authors:
Susmita Tripathy,
Surajit Das,
Shweta Jindal,
Raghunathan Ramakrishnan
Abstract:
We present machine learning models based on kernel-ridge regression for predicting X-ray photoelectron spectra of organic molecules originating from the $K$-shell ionization energies of carbon (C), nitrogen (N), oxygen (O), and fluorine (F) atoms. We constructed the training dataset through high-throughput calculations of $K$-shell core-electron binding energies (CEBEs) for 12,880 small organic molecules in the bigQM7$ω$ dataset, employing the $Δ$-SCF formalism coupled with meta-GGA-DFT and a variationally converged basis set. The models are cost-effective, as they require the atomic coordinates of a molecule generated using universal force fields while estimating the target-level CEBEs corresponding to DFT-level equilibrium geometry. We explore transfer learning by utilizing the atomic environment feature vectors learned using a graph neural network framework in kernel-ridge regression. Additionally, we enhance accuracy within the $Δ$-machine learning framework by leveraging inexpensive baseline spectra derived from Kohn--Sham eigenvalues. When applied to 208 combinatorially substituted uracil molecules larger than those in the training set, our analyses suggest that the models may not provide quantitatively accurate predictions of CEBEs but offer a strong linear correlation relevant for virtual high-throughput screening. We present the dataset and models as the Python module, ${\tt cebeconf}$, to facilitate further explorations.
Submitted 30 May, 2024;
originally announced May 2024.
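The $Δ$-machine learning idea mentioned in this abstract can be sketched as follows: kernel-ridge regression learns only the difference between an inexpensive baseline and the target value, and the prediction adds the learned correction back to the baseline. This is a minimal illustration, not the `cebeconf` implementation; the Gaussian kernel, the hyperparameters `sigma` and `lam`, the function names, and the use of generic feature vectors in place of atomic-environment descriptors are all assumptions.

```python
import numpy as np


def gaussian_kernel(A, B, sigma):
    """Gaussian kernel matrix between the rows of A and the rows of B."""
    # Pairwise squared Euclidean distances, clipped at zero against round-off.
    d2 = (np.sum(A**2, axis=1)[:, None]
          + np.sum(B**2, axis=1)[None, :]
          - 2.0 * A @ B.T)
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * sigma**2))


def krr_fit(X, y, sigma=1.0, lam=1e-8):
    """Solve (K + lam * I) alpha = y for the regression coefficients."""
    K = gaussian_kernel(X, X, sigma)
    return np.linalg.solve(K + lam * np.eye(len(X)), y)


def krr_predict(X_train, alpha, X_query, sigma=1.0):
    """Predict at the query points from the fitted coefficients."""
    return gaussian_kernel(X_query, X_train, sigma) @ alpha


def delta_ml_fit(X, baseline, target, sigma=1.0, lam=1e-8):
    """Fit KRR on the correction (target - baseline), not on the target."""
    return krr_fit(X, target - baseline, sigma, lam)


def delta_ml_predict(X_train, alpha, X_query, baseline_query, sigma=1.0):
    """Baseline value plus the learned correction."""
    return baseline_query + krr_predict(X_train, alpha, X_query, sigma)
```

Because the model only has to capture the baseline-to-target difference, which is usually smoother than the target itself, fewer training examples tend to be needed for a given accuracy.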
-
Resilience of Hund's rule in the Chemical Space of Small Organic Molecules
Authors:
Atreyee Majumdar,
Raghunathan Ramakrishnan
Abstract:
We embark on a quest to identify small molecules in the chemical space that can potentially violate Hund's rule. Utilizing twelve TDDFT approximations and the ADC(2) many-body method, we report the energies of S$_1$ and T$_1$ excited states of 12,880 closed-shell organic molecules within the bigQM7$ω$ dataset with up to 7 CONF atoms. In this comprehensive dataset, none of the molecules, in their minimum energy geometry, exhibit a negative S$_1$-T$_1$ energy gap at the ADC(2) level, while several molecules display values $<0.1$ eV. The spin-component-scaled double-hybrid method, SCS-PBE-QIDH, demonstrates the best agreement with ADC(2). Yet, at this level, a few molecules with a strained $sp^3$-N center turn out as false positives with the S$_1$ state lower in energy than T$_1$. We investigate a prototypical cage molecule with an energy gap $<-0.2$ eV, which a closer examination revealed as another false positive. We conclude that in the chemical space of small closed-shell organic molecules, it is possible to identify geometric and electronic structural features giving rise to S$_1$-T$_1$ degeneracy; still, there is no evidence of a negative gap. We share the dataset generated for this study as a module to facilitate seamless molecular discovery through data mining.
Submitted 3 May, 2024; v1 submitted 21 February, 2024;
originally announced February 2024.
-
Variational augmentation of Gaussian continuum basis sets for calculating atomic higher harmonic generation spectra
Authors:
Sai Vijay Bhaskar Mocherla,
Raghunathan Ramakrishnan
Abstract:
We present a variational augmentation procedure to optimize the exponents of Gaussian continuum basis sets for simulating strong-field laser ionization phenomena such as higher harmonic generation (HHG) in atoms and ions using the time-dependent configuration interaction (TDCI) method. We report the distribution of the optimized exponents and discuss how efficiently the resulting basis functions span the variational space to describe the near-continuum states involved in HHG. Further, we calculated the higher harmonic spectra of three two-electron systems -- H$^{-}$, He and Li$^{+}$ -- generated by 800 nm driving laser pulses with a pulse width of 54 fs and peak intensities in the tunnel ionization regime of each system. We analyze the performance of these basis sets with an increasing number of higher angular momentum functions and show that up to $g$-type functions are required to obtain qualitatively accurate harmonic spectra. We also comment on the impact of electron correlation on the HHG spectra. Finally, we show that by systematically augmenting the basis sets with additional shells, we can model the strong-field dynamics at higher laser peak intensities.
Submitted 2 July, 2023;
originally announced July 2023.
-
Band gaps of long-period polytypes of IV, IV-IV, and III-V semiconductors estimated with an Ising-type additivity model
Authors:
Raghunathan Ramakrishnan,
Shruti Jain
Abstract:
We apply an Ising-type model to estimate the band gaps of the polytypes of group IV elements (C, Si, and Ge) and binary compounds of groups IV-IV (SiC, GeC, and GeSi) and III-V (nitride, phosphide, and arsenide of B, Al, and Ga). The models use reference band gaps of the simplest polytypes comprising 2--6 bilayers calculated with the hybrid density functional approximation, HSE06. We report four models capable of estimating band gaps of nine polytypes containing 7 and 8 bilayers with an average error of $\lesssim0.05$ eV. We apply the best model with an error of $<0.04$ eV to predict the band gaps of 497 polytypes with up to 15 bilayers in the unit cell, providing a comprehensive view of the variation in the electronic structure with the degree of hexagonality of the crystal structure. Within our enumeration, we identify four rhombohedral polytypes of SiC -- 9$R$, 12$R$, 15$R$(1), and 15$R$(2) -- and perform detailed stability and band structure analysis. Of these, 15$R$(1), which has not been experimentally characterized, has the widest band gap ($>3.4$ eV); phonon analysis and cohesive energy reveal 15$R$(1)-SiC to be metastable. Additionally, we model the energies of valence and conduction bands of the rhombohedral SiC phases at the high-symmetry points of the Brillouin zone and predict band structure characteristics around the Fermi level. The models presented in this study may aid in identifying polytypic phases suitable for various applications, such as the design of wide-gap materials relevant to high-voltage applications. In particular, the method holds promise for forecasting electronic properties of long-period and ultra-long-period polytypes for which accurate first-principles modeling is computationally challenging.
Submitted 28 August, 2023; v1 submitted 30 June, 2023;
originally announced June 2023.
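The "degree of hexagonality" referred to in this abstract can be illustrated with the standard Jagodzinski h/c notation: a bilayer is hexagonal (h) when its two neighbours occupy the same stacking position and cubic (c) otherwise. The sketch below assumes a periodic ABC stacking string as input; the function names are hypothetical, and the abstract does not state which definition the paper itself uses.

```python
def jagodzinski(stacking):
    """Convert a periodic ABC stacking sequence (e.g. "ABCB") to h/c labels."""
    n = len(stacking)
    if n == 0:
        raise ValueError("stacking sequence must not be empty")
    labels = []
    for i, layer in enumerate(stacking):
        # Periodic boundary conditions: the unit cell repeats along the c axis.
        prev_layer = stacking[(i - 1) % n]
        next_layer = stacking[(i + 1) % n]
        if layer == prev_layer or layer == next_layer:
            raise ValueError("adjacent bilayers must occupy different positions")
        # Same neighbours on both sides: hexagonal environment; else cubic.
        labels.append("h" if prev_layer == next_layer else "c")
    return "".join(labels)


def hexagonality(stacking):
    """Fraction of bilayers in a hexagonal (h) environment."""
    labels = jagodzinski(stacking)
    return labels.count("h") / len(labels)
```

For example, `hexagonality("AB")` (the 2H polytype) gives 1.0, `hexagonality("ABC")` (3C) gives 0.0, and `hexagonality("ABCB")` (4H) gives 0.5.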
-
Stereo-Electronic Factors Influencing the Stability of Hydroperoxyalkyl Radicals: Transferability of Chemical Trends across Hydrocarbons and ab initio Methods
Authors:
Saurabh Chandra Kandpal,
Kgalaletso P. Otukile,
Shweta Jindal,
Salini Senthil,
Cameron Matthews,
Sabyasachi Chakraborty,
Lyudmila V. Moskaleva,
Raghunathan Ramakrishnan
Abstract:
The hydroperoxyalkyl radicals (.QOOH) are known to play a significant role in combustion and tropospheric processes, yet their direct spectroscopic detection remains challenging. In this study, we investigate molecular stereo-electronic effects influencing the kinetic and thermodynamic stability of a .QOOH along its formation path from the precursor, alkylperoxyl radical (ROO.), and the depletion path resulting in the formation of cyclic ether + .OH. We focus on reactive intermediates encountered in the oxidation of acyclic hydrocarbon radicals: ethyl, isopropyl, isobutyl, tert-butyl, neopentyl, and their alicyclic counterparts: cyclohexyl, cyclohexenyl, and cyclohexadienyl. We report reaction energies and barriers calculated with the highly accurate method Weizmann-1 (W1) for the channels: ROO. <=> .QOOH, ROO. <=> alkene + .OOH, .QOOH <=> alkene + .OOH, and .QOOH <=> cyclic ether + .OH. Using W1 results as a reference, we have systematically benchmarked the accuracy of popular density functional theory (DFT), composite thermochemistry methods, and an explicitly correlated coupled-cluster method. We ascertain inductive, resonance, and steric effects on the overall stability of .QOOH and computationally investigate the possibility of forming more stable species. With new reactions as test cases, we probe the capacity of various ab initio methods to yield quantitative insights on the elementary steps of combustion.
Submitted 21 September, 2023; v1 submitted 22 May, 2023;
originally announced May 2023.
-
Learning stochastic filtering
Authors:
Rahul O. Ramakrishnan,
Andrea Auconi,
Benjamin M. Friedrich
Abstract:
We quantify the performance of approximations to stochastic filtering by the Kullback-Leibler divergence to the optimal Bayesian filter. Using a two-state Markov process that drives a Brownian measurement process as a prototypical test case, we compare two stochastic filtering approximations: a static low-pass filter as a baseline, and machine learning of Volterra expansions using nonlinear Vector Auto Regression (nVAR). We highlight the crucial role of the chosen performance metric, and present two solutions to the specific challenge of predicting a likelihood bounded between $0$ and $1$.
Submitted 26 June, 2022;
originally announced June 2022.
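For a two-state hidden process, the filter output at each time step is a single probability, so the Kullback-Leibler divergence to the optimal filter reduces to a divergence between two Bernoulli distributions. The sketch below averages that divergence over a trajectory. The function names and the clipping constant `eps` are assumptions; clipping is only a numerical guard here and is not claimed to be either of the two solutions the abstract mentions.

```python
import math


def bernoulli_kl(p, q, eps=1e-12):
    """KL divergence D(p || q) between Bernoulli(p) and Bernoulli(q), in nats."""
    # Clip both probabilities away from 0 and 1 so the logarithms stay finite.
    p = min(max(p, eps), 1.0 - eps)
    q = min(max(q, eps), 1.0 - eps)
    return p * math.log(p / q) + (1.0 - p) * math.log((1.0 - p) / (1.0 - q))


def mean_filter_kl(optimal, approx, eps=1e-12):
    """Time-averaged KL divergence of an approximate filter to the optimal one."""
    if len(optimal) != len(approx):
        raise ValueError("trajectories must have equal length")
    if not optimal:
        raise ValueError("trajectories must not be empty")
    total = sum(bernoulli_kl(p, q, eps) for p, q in zip(optimal, approx))
    return total / len(optimal)
```

The score is zero only when the approximate filter reproduces the optimal posterior at every step, and it penalizes overconfident wrong predictions far more heavily than a squared-error metric would.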
-
Understanding the role of intramolecular ion-pair interactions in conformational stability using an ab initio thermodynamic cycle
Authors:
Sabyasachi Chakraborty,
Kalyaneswar Mandal,
Raghunathan Ramakrishnan
Abstract:
Intramolecular ion-pair interactions yield shape and functionality to many molecules. With proper orientation, these interactions overcome steric factors and are responsible for the compact structures of several peptides. In this study, we present a thermodynamic cycle based on isoelectronic and alchemical mutation to estimate intramolecular ion-pair interaction energy. We determine these energies for 26 benchmark molecules with common ion-pair combinations and compare them with results obtained using intramolecular symmetry-adapted perturbation theory. For systems with long linkers, the ion-pair energies evaluated using both approaches deviate by less than 2.5% in vacuum phase. The thermodynamic cycle based on density functional theory facilitates calculations of salt-bridge interactions in model tripeptides with continuum/microsolvation modeling, and four large peptides: 1EJG (crambin), 1BDK (bradykinin), 1L2Y (a mini-protein with a tryptophan cage), and 1SCO (a toxin from the scorpion venom).
Submitted 21 December, 2022; v1 submitted 7 March, 2022;
originally announced March 2022.
-
Resolution-vs.-Accuracy Dilemma in Machine Learning Modeling of Electronic Excitation Spectra
Authors:
Prakriti Kayastha,
Sabyasachi Chakraborty,
Raghunathan Ramakrishnan
Abstract:
In this study, we explore the potential of machine learning for modeling molecular electronic spectral intensities as a continuous function in a given wavelength range. Since presently available chemical space datasets provide excitation energies and corresponding oscillator strengths for only a few valence transitions, here, we present a new dataset -- bigQM7$ω$ -- with 12,880 molecules containing up to 7 CONF atoms and report ground state and excited state properties. A publicly accessible web-based data-mining platform is presented to facilitate on-the-fly screening of several molecular properties including harmonic vibrational and electronic spectra. We present all singlet electronic transitions from the ground state calculated using the time-dependent density functional theory framework with the $ω$B97XD exchange-correlation functional and a diffuse-function augmented basis set. The resulting spectra predominantly span the X-ray to deep-UV region (10--120 nm). To compare the target spectra with predictions based on small basis sets, we bin spectral intensities and show that good agreement is obtained only at the expense of the resolution. Compared to this, machine learning models with the latest structural representations trained directly using $<10\%$ of the target data recover the spectra of the remaining molecules with better accuracies at a desirable $<1$ nm wavelength resolution.
Submitted 31 July, 2022; v1 submitted 22 October, 2021;
originally announced October 2021.
-
Data-Driven Modeling of S0 -> S1 Excitation Energy in the BODIPY Chemical Space: High-Throughput Computation, Quantum Machine Learning, and Inverse Design
Authors:
Amit Gupta,
Sabyasachi Chakraborty,
Debashree Ghosh,
Raghunathan Ramakrishnan
Abstract:
Derivatives of BODIPY are popular fluorophores due to their synthetic feasibility, structural rigidity, high quantum yield, and tunable spectroscopic properties. While the characteristic absorption maximum of BODIPY is at 2.5 eV, combinations of functional groups and substitution sites can shift the peak position by +/- 1 eV. Time-dependent long-range corrected hybrid density functional methods can model the lowest excitation energies offering a semi-quantitative precision of +/- 0.3 eV. Alas, the chemical space of BODIPYs stemming from combinatorial introduction of -- even a few dozen -- substituents is too large for brute-force high-throughput modeling. To navigate this vast space, we select 77,412 molecules and train a kernel-based quantum machine learning model providing < 2% hold-out error. Further reuse of the results presented here to navigate the entire BODIPY universe comprising over 253 giga (253 x 10^9) molecules is demonstrated by inverse-designing candidates with desired target excitation energies.
Submitted 28 October, 2021; v1 submitted 11 October, 2021;
originally announced October 2021.
-
Machine Learning Modeling of Materials with a Group-Subgroup Structure
Authors:
Prakriti Kayastha,
Raghunathan Ramakrishnan
Abstract:
Crystal structures connected by continuous phase transitions are linked through mathematical relations between crystallographic groups and their subgroups. In the present study, we introduce group-subgroup machine learning (GS-ML) and show that including materials with small unit cells in the training set decreases out-of-sample prediction errors for materials with large unit cells. GS-ML incurs the least training cost to reach 2-3% target accuracy compared to other ML approaches. Since available materials datasets are heterogeneous providing insufficient examples for realizing the group-subgroup structure, we present the "FriezeRMQ1D" dataset with 8393 Q1D organometallic materials uniformly distributed across 7 frieze groups. Furthermore, by comparing the performances of FCHL and 1-hot representations, we show GS-ML to capture subgroup information efficiently when the descriptor encodes structural information. The proposed approach is generic and extendable to symmetry abstractions such as spin-, valency-, or charge order.
Submitted 27 April, 2021; v1 submitted 31 December, 2020;
originally announced December 2020.
-
Troubleshooting Unstable Molecules in Chemical Space
Authors:
Salini Senthil,
Sabyasachi Chakraborty,
Raghunathan Ramakrishnan
Abstract:
A key challenge in automated chemical compound space explorations is ensuring veracity in minimum energy geometries, that is, preserving the intended bonding connectivities. We discuss an iterative high-throughput workflow for connectivity-preserving geometry optimizations exploiting the nearness between quantum mechanical models. The methodology is benchmarked on the QM9 dataset comprising DFT-level properties of 133,885 small molecules, of which 3,054 have questionable geometric stability. We successfully troubleshoot 2,988 molecules and ensure a bijective mapping between desired Lewis formulae and final geometries. Our workflow, based on DFT and post-DFT methods, identifies 66 molecules as unstable; 52 contain $-{\rm NNO}-$, the rest are strained due to pyramidal sp$^2$ C. In the curated dataset, we inspect molecules with long CC bonds and identify ultralong contestants ($r>1.70$ Å) supported by topological analysis of electron density. We hope the proposed strategy will play a role in big data quantum chemistry initiatives.
Submitted 15 October, 2020; v1 submitted 6 October, 2020;
originally announced October 2020.
-
Revving up 13C NMR shielding predictions across chemical space: Benchmarks for atoms-in-molecules kernel machine learning with new data for 134 kilo molecules
Authors:
Amit Gupta,
Sabyasachi Chakraborty,
Raghunathan Ramakrishnan
Abstract:
The requirement for accelerated and quantitatively accurate screening of nuclear magnetic resonance spectra across the chemical compound space of small molecules is two-fold: (1) a robust 'local' machine learning (ML) strategy capturing the effect of neighbourhood on an atom's 'near-sighted' property -- chemical shielding; (2) an accurate reference dataset generated with a state-of-the-art first principles method for training. Herein we report the QM9-NMR dataset comprising isotropic shielding of over 0.8 million C atoms in 134k molecules of the QM9 dataset in gas and five common solvent phases. Using these data for training, we present benchmark results for the prediction transferability of kernel-ridge regression models with popular local descriptors. Our best model, trained on 100k samples, accurately predicts isotropic shielding of 50k 'hold-out' atoms with a mean error of less than $1.9$ ppm. For rapid prediction of new query molecules, the models were trained on geometries from an inexpensive theory. Furthermore, by using a $Δ$-ML strategy, we quench the error below $1.4$ ppm. Finally, we test the transferability on non-trivial benchmark sets that include benchmark molecules comprising 10 to 17 heavy atoms and drugs.
Submitted 3 December, 2020; v1 submitted 14 September, 2020;
originally announced September 2020.
-
Critical Benchmarking of the G4(MP2) Model, the Correlation Consistent Composite Approach and Popular Density Functional Approximations on a Probabilistically Pruned Benchmark Dataset of Formation Enthalpies
Authors:
Sambit Kumar Das,
Sabyasachi Chakraborty,
Raghunathan Ramakrishnan
Abstract:
First-principles calculation of the standard formation enthalpy, $ΔH_f^\circ$ (298 K), on the large scale required by chemical space explorations is feasible only with density functional approximations (DFAs) and some composite wave function theories (cWFTs). Alas, the accuracies of popular range-separated hybrid, 'rung-4' DFAs, and cWFTs that offer the best accuracy-vs.-cost trade-off have as yet been established only for datasets predominantly comprising small molecules; hence, their transferability to larger datasets remains unclear. In this study, we present an extended benchmark dataset of over 1600 values of $ΔH_f^\circ$ for structurally and electronically diverse molecules. We apply quartile-ranking based on boundary-corrected kernel density estimation to filter outliers and arrive at Probabilistically Pruned Enthalpies of 1694 compounds (PPE1694). For this dataset, we rank the prediction accuracies of G4, G4(MP2), ccCA, CBS-QB3 and 23 popular DFAs using conventional and probabilistic error metrics. We discuss systematic prediction errors and highlight the role an empirical higher-level correction (HLC) plays in the G4(MP2) model. Furthermore, we comment on uncertainties associated with the reference empirical data for atoms and the systematic errors stemming from these that grow with the molecular size. We believe these findings will aid in identifying meaningful application domains for quantum thermochemical methods.
Submitted 28 December, 2020; v1 submitted 13 July, 2020;
originally announced July 2020.
-
Quantum-chemistry-aided identification, synthesis and experimental validation of model systems for conformationally controlled reaction studies: Separation of the conformers of 2,3-dibromobuta-1,3-diene in the gas phase
Authors:
Ardita Kilaj,
Hong Gao,
Diana Tahchieva,
Raghunathan Ramakrishnan,
Daniel Bachmann,
Dennis Gillingham,
O. Anatole von Lilienfeld,
Jochen Küpper,
Stefan Willitsch
Abstract:
The Diels-Alder cycloaddition, in which a diene reacts with a dienophile to form a cyclic compound, counts among the most important tools in organic synthesis. Achieving a precise understanding of its mechanistic details on the quantum level requires new experimental and theoretical methods. Here, we present an experimental approach that separates different diene conformers in a molecular beam as a prerequisite for the investigation of their individual cycloaddition reaction kinetics and dynamics under single-collision conditions in the gas phase. A low- and high-level quantum-chemistry-based screening of more than one hundred dienes identified 2,3-dibromobutadiene (DBB) as an optimal candidate for efficient separation of its gauche and s-trans conformers by electrostatic deflection. A preparation method for DBB was developed which enabled the generation of dense molecular beams of this compound. The theoretical predictions of the molecular properties of DBB were validated by the successful separation of the conformers in the molecular beam. A marked difference in photofragment ion yields of the two conformers upon femtosecond-laser pulse ionization was observed, pointing at a pronounced conformer-specific fragmentation dynamics of ionized DBB. Our work sets the stage for a rigorous examination of mechanistic models of cycloaddition reactions under controlled conditions in the gas phase.
Submitted 20 April, 2020;
originally announced April 2020.
-
Charge-Transfer Selectivity and Quantum Interference in Real-Time Electron Dynamics: Gaining Insights from Time-Dependent Configuration Interaction Simulations
Authors:
Raghunathan Ramakrishnan
Abstract:
Many-electron wavepacket dynamics based on time-dependent configuration interaction (TDCI) is a numerically rigorous approach to quantitatively model electron-transfer across molecular junctions. TDCI simulations of cyanobenzene thiolates, para- and meta-linked to an acceptor gold atom, show donor states conjugating with the benzene $π$-network to allow better through-molecule electron migration in the para isomer compared to the meta counterpart. For dynamics involving non-conjugating states, we find electron-injection to stem exclusively from distance-dependent non-resonant quantum mechanical tunneling, in which case the meta isomer exhibits better dynamics. The computed trend in donor-to-acceptor net-electron transfer through differently linked azulene bridges agrees with the trend seen in low-bias conductivity measurements. Disruption of $π$-conjugation has been shown to be the cause of diminished electron-injection through the 1,3-azulene, a pathological case for graph-based diagnosis of destructive quantum interference. Furthermore, we demonstrate quantum interference of many-electron wavefunctions to drive para- vs. meta- selectivity in the coherent evolution of superposed $π$(CN)- and $σ$(NC-C)-type wavepackets. Analyses reveal that in the para-linked benzene, $σ$ and $π$ MOs localized at the donor terminal are in-phase, leading to constructive interference of electron density distribution, while phase-flip of one of the MOs in the meta isomer results in destructive interference. These findings suggest that a priori detection of orbital phase-flip and quantum coherence conditions can aid in molecular device design strategies.
Submitted 26 March, 2020; v1 submitted 24 November, 2019;
originally announced November 2019.
-
The Chemical Space of B, N-substituted Polycyclic Aromatic Hydrocarbons: Combinatorial Enumeration and High-Throughput First-Principles Modeling
Authors:
Sabyasachi Chakraborty,
Prakriti Kayastha,
Raghunathan Ramakrishnan
Abstract:
Combinatorial introduction of heteroatoms in the two-dimensional framework of aromatic hydrocarbons opens up possibilities to design compound libraries exhibiting desirable photovoltaic and photochemical properties. Exhaustive enumeration and first-principles characterization of this chemical space provide indispensable insights for rational compound design strategies. Here, for the smallest seventy-seven Kekulean-benzenoid polycyclic systems, we reveal combinatorial substitution of C atom pairs with the isosteric and isoelectronic B, N pairs to result in 7,453,041,547,842 (7.4 tera) unique molecules. We present comprehensive frequency distributions of this chemical space, analyze trends and discuss a symmetry-controlled selectivity manifestable in synthesis product-yield. Furthermore, by performing high-throughput ab initio density functional theory calculations of over thirty-three thousand (33k) representative molecules, we discuss quantitative trends in the structural stability and inter-property relationships across heteroarenes. Our results indicate a significant fraction of the 33k molecules to be electronically active in the 1.5-2.5 eV region, encompassing the most intense region of the solar spectrum, indicating their suitability as potential light-harvesting molecular components in photo-catalyzed solar cells.
Submitted 22 February, 2019; v1 submitted 3 January, 2019;
originally announced January 2019.
-
Torsional potentials of glyoxal, oxalyl halides and their thiocarbonyl derivatives: Challenges for popular density functional approximations
Authors:
D. Tahchieva,
D. Bakowies,
R. Ramakrishnan,
O. A. von Lilienfeld
Abstract:
The reliability of popular density functionals was studied for the description of torsional profiles of 36 molecules: glyoxal, oxalyl halides and their thiocarbonyl derivatives. HF and eighteen functionals of varying complexity, from local density to range-separated hybrid approximations and double-hybrid, have been considered and benchmarked against CCSD(T)-level rotational profiles. For molecules containing heavy halogens, all functionals except M05-2X and M06-2X fail to reproduce barrier heights accurately and a number of functionals introduce spurious minima. Dispersion corrections show no improvement. Calibrated torsion-corrected atom-centered potentials rectify the shortcomings of PBE and also improve on $σ$-hole based intermolecular binding in dimers and crystals.
Submitted 8 December, 2020; v1 submitted 16 February, 2018;
originally announced February 2018.
-
Machine Learning Modeling of Wigner Intracule Functionals for Two Electrons in One Dimension
Authors:
Rutvij Vihang Bhavsar,
Raghunathan Ramakrishnan
Abstract:
In principle, many-electron correlation energy can be precisely computed from a reduced Wigner distribution function ($\mathcal{W}$) thanks to a universal functional transformation ($\mathcal{F}$), whose formal existence is akin to that of the exchange-correlation functional in density functional theory. While the exact dependence of $\mathcal{F}$ on $\mathcal{W}$ is unknown, a few approximate parametric models have been proposed in the past. Here, for a dataset of 923 one-dimensional external potentials with two interacting electrons, we apply machine learning to model $\mathcal{F}$ within the kernel Ansatz. We deal with over-fitting of the kernel to a specific region of phase-space by a one-step regularization not depending on any hyperparameters. Reference correlation energies have been computed by performing exact and Hartree--Fock calculations using discrete variable representation. The resulting models require $\mathcal{W}$ calculated at the Hartree--Fock level as input while yielding monotonic decay in the predicted correlation energies of new molecules reaching sub-chemical accuracy with training.
Submitted 21 January, 2019; v1 submitted 2 February, 2018;
originally announced February 2018.
-
Exact separation of radial and angular correlation energies in two-electron atoms
Authors:
Anjana R Kammath,
Raghunathan Ramakrishnan
Abstract:
Partitioning of the helium atom's correlation energy into radial and angular contributions, although of fundamental interest, has eluded critical scrutiny. Conventionally, radial and angular correlation energies of the helium atom are defined for its ground state as deviations, from Hartree--Fock and exact values, of the energy obtained using a purely radial wavefunction devoid of any explicit dependence on the interelectronic distance. Here, we show this rationale to associate the contribution from radial-angular coupling entirely to the angular part, underestimating the radial one, and thereby also to incorrectly predict non-vanishing residual radial probability densities. We derive analytic matrix elements for the high-precision Hylleraas basis set framework to seamlessly uncouple the angular correlation energy from its radial counterpart. The resulting formula agrees with numerical cubature yielding precise purely angular correlation energies for the ground as well as excited states. Our calculations indicate 60.2% of helium's correlation energy to arise from strictly radial interactions; when excluding the contribution from the radial-angular coupling, this value drops to 41.3%.
Submitted 1 February, 2019; v1 submitted 22 January, 2018;
originally announced January 2018.
-
Genetic optimization of training sets for improved machine learning models of molecular properties
Authors:
Nicholas J. Browning,
Raghunathan Ramakrishnan,
O. Anatole von Lilienfeld,
Ursula Röthlisberger
Abstract:
The training of molecular models of quantum mechanical properties based on statistical machine learning requires large datasets which exemplify the map from chemical structure to molecular property. Intelligent a priori selection of training examples is often difficult or impossible to achieve as prior knowledge may be sparse or unavailable. Ordinarily, representative selection of training molecules from such datasets is achieved through random sampling. We use genetic algorithms for the optimization of training set composition consisting of tens of thousands of small organic molecules. The resulting machine learning models are considerably more accurate with respect to small randomly selected training sets: mean absolute errors for out-of-sample predictions are reduced to ~25% for enthalpies, free energies, and zero-point vibrational energy, to ~50% for heat capacity, electron spread, and polarizability, and by more than ~20% for electronic properties such as frontier orbital eigenvalues or dipole moments. We discuss and present optimized training sets consisting of 10 molecular classes for all molecular properties studied. We show that these classes can be used to design improved training sets for the generation of machine learning models of the same properties in similar but unrelated molecular sets.
Submitted 24 November, 2016; v1 submitted 22 November, 2016;
originally announced November 2016.
-
Machine Learning, Quantum Mechanics, and Chemical Compound Space
Authors:
Raghunathan Ramakrishnan,
O. Anatole von Lilienfeld
Abstract:
We review recent studies dealing with the generation of machine learning models of molecular and solid properties. The models are trained and validated using standard quantum chemistry results obtained for organic molecules and materials selected from chemical space at random.
Submitted 12 May, 2016; v1 submitted 26 October, 2015;
originally announced October 2015.
-
Fast and accurate predictions of covalent bonds in chemical space
Authors:
K. Y. Samuel Chang,
Stijn Fias,
Raghunathan Ramakrishnan,
O. Anatole von Lilienfeld
Abstract:
We assess the predictive accuracy of perturbation theory based estimates of changes in covalent bonding due to linear alchemical interpolations among molecules. We have investigated $σ$ bonding to hydrogen, as well as $σ$ and $π$ bonding between main-group elements, occurring in small sets of iso-valence-electronic molecular species with elements drawn from second to fourth rows in the $p$-block of the periodic table. Numerical evidence suggests that first order estimates of covalent bonding potentials can achieve chemical accuracy if (i) the alchemical interpolation is vertical (fixed geometry), (ii) involves molecules containing elements in the third and fourth row of the periodic table, and (iii) a reference geometry is optimized. In this case, changes in the bonding potential become near-linear in the coupling parameter, resulting in analytical predictions with very high accuracy ($\sim$1 kcal/mol). Second order estimates deteriorate the prediction. If initial and final molecules differ not only in composition but also in geometry, all estimates become substantially worse, with second order being slightly more accurate than first order. The independent particle approximation to the second order perturbation performs poorly when compared to the coupled perturbed or finite difference approach. Taylor series expansions up to fourth order of the potential energy curve of highly symmetric systems indicate a finite radius of convergence, as illustrated for the alchemical stretching of H$_2^+$. Numerical results are presented for (i) covalent bonds to hydrogen in 12 molecules with 8 valence electrons; (ii) main-group single bonds in 9 molecules with 14 valence electrons; (iii) main-group double bonds in 9 molecules with 12 valence electrons; (iv) main-group triple bonds in 9 molecules with 10 valence electrons; (v) H$_2^+$ single bond with 1 electron.
Submitted 13 January, 2016; v1 submitted 9 September, 2015;
originally announced September 2015.
-
Machine Learning for Quantum Mechanical Properties of Atoms in Molecules
Authors:
Matthias Rupp,
Raghunathan Ramakrishnan,
O. Anatole von Lilienfeld
Abstract:
We introduce machine learning models of quantum mechanical observables of atoms in molecules. Instant out-of-sample predictions for proton and carbon nuclear chemical shifts, atomic core level excitations, and forces on atoms reach accuracies on par with density functional theory reference. Locality is exploited within non-linear regression via local atom-centered coordinate systems. The approach is validated on a diverse set of 9k small organic molecules. Linear scaling of computational cost in system size is demonstrated for saturated polymers with up to sub-mesoscale lengths.
Submitted 25 August, 2015; v1 submitted 2 May, 2015;
originally announced May 2015.
-
Electronic Spectra from TDDFT and Machine Learning in Chemical Space
Authors:
Raghunathan Ramakrishnan,
Mia Hartmann,
Enrico Tapavicza,
O. Anatole von Lilienfeld
Abstract:
Due to its favorable computational efficiency, time-dependent (TD) density functional theory (DFT) enables the prediction of electronic spectra in a high-throughput manner across chemical space. Its predictions, however, can be quite inaccurate. We resolve this issue with machine learning models trained on deviations of reference second-order approximate coupled-cluster singles and doubles (CC2) spectra from TDDFT counterparts, or even from DFT gap. We applied this approach to low-lying singlet-singlet vertical electronic spectra of over 20 thousand synthetically feasible small organic molecules with up to eight CONF atoms. The prediction errors decay monotonically as a function of training set size. For a training set of 10 thousand molecules, CC2 excitation energies can be reproduced to within $\pm$0.1 eV for the remaining molecules. Analysis of our spectral database via chromophore counting suggests that even higher accuracies can be achieved. Based on the evidence collected, we discuss open challenges associated with data-driven modeling of high-lying spectra, and transition intensities.
Submitted 4 July, 2015; v1 submitted 8 April, 2015;
originally announced April 2015.
-
Big Data meets Quantum Chemistry Approximations: The $Δ$-Machine Learning Approach
Authors:
Raghunathan Ramakrishnan,
Pavlo O. Dral,
Matthias Rupp,
O. Anatole von Lilienfeld
Abstract:
Chemically accurate and comprehensive studies of the virtual space of all possible molecules are severely limited by the computational cost of quantum chemistry. We introduce a composite strategy that adds machine learning corrections to computationally inexpensive approximate legacy quantum methods. After training, highly accurate predictions of enthalpies, free energies, entropies, and electron correlation energies are possible, for significantly larger molecular sets than used for training. For thermochemical properties of up to 16k constitutional isomers of C$_7$H$_{10}$O$_2$ we present numerical evidence that chemical accuracy can be reached. We also predict electron correlation energy in post-Hartree-Fock methods, at the computational cost of Hartree-Fock, and we establish a qualitative relationship between molecular entropy and electron correlation. The transferability of our approach is demonstrated, using semi-empirical quantum chemistry and machine learning models trained on 1 and 10% of 134k organic molecules, to reproduce enthalpies of all remaining molecules at density functional theory level of accuracy.
Submitted 17 March, 2015;
originally announced March 2015.
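The composite strategy summarized in this abstract (a cheap baseline plus a learned correction) can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not taken from the paper: the one-dimensional "descriptor" `x`, the stand-in functions `baseline` and `target`, the Laplacian kernel, and the hyperparameters `sigma` and `lam` are all hypothetical. The sketch fits a kernel-ridge model to the difference between target and baseline values and adds the predicted difference back to the baseline.

```python
import math


def laplacian_kernel(x, y, sigma=1.0):
    # Assumed similarity measure between two scalar descriptors.
    return math.exp(-abs(x - y) / sigma)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def baseline(x):
    # Hypothetical inexpensive method (stand-in for a low-level calculation).
    return x * x


def target(x):
    # Hypothetical expensive method (stand-in for a high-level calculation).
    return x * x + 0.3 * math.sin(3.0 * x)


def train_delta_model(xs, sigma=1.0, lam=1e-8):
    # Kernel matrix with a small ridge term on the diagonal.
    K = [[laplacian_kernel(a, b, sigma) + (lam if i == j else 0.0)
          for j, b in enumerate(xs)] for i, a in enumerate(xs)]
    # Learn only the difference between target and baseline.
    deltas = [target(x) - baseline(x) for x in xs]
    alphas = solve(K, deltas)

    def predict(x):
        correction = sum(a * laplacian_kernel(x, xi, sigma)
                         for a, xi in zip(alphas, xs))
        return baseline(x) + correction

    return predict


train_xs = [i * 0.25 for i in range(-8, 9)]  # 17 training points in [-2, 2]
predict = train_delta_model(train_xs)

x_new = 0.6  # a point not in the training set
err_baseline = abs(target(x_new) - baseline(x_new))
err_delta = abs(target(x_new) - predict(x_new))
print(f"baseline error: {err_baseline:.4f}, corrected error: {err_delta:.4f}")
```

In this toy setting the model only has to learn the smooth residual rather than the full target, which is the intuition behind adding a correction to an existing approximate method.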
-
Many Molecular Properties from One Kernel in Chemical Space
Authors:
Raghunathan Ramakrishnan,
O. Anatole von Lilienfeld
Abstract:
We introduce property-independent kernels for machine learning modeling of arbitrarily many molecular properties. The kernels encode molecular structures for training sets of varying size, as well as similarity measures sufficiently diffuse in chemical space to sample over all training molecules. Corresponding molecular reference properties provided, they enable the instantaneous generation of ML models which can systematically be improved through the addition of more data. This idea is exemplified for single kernel based modeling of internal energy, enthalpy, free energy, heat capacity, polarizability, electronic spread, zero-point vibrational energy, energies of frontier orbitals, HOMO-LUMO gap, and the highest fundamental vibrational wavenumber. Models of these properties are trained and tested using 112k organic molecules of similar size. Resulting models are discussed as well as the kernels' use for generating and using other property models.
Submitted 17 March, 2015; v1 submitted 16 February, 2015;
originally announced February 2015.
-
Fourier series of atomic radial distribution functions: A molecular fingerprint for machine learning models of quantum chemical properties
Authors:
O. Anatole von Lilienfeld,
Raghunathan Ramakrishnan,
Matthias Rupp,
Aaron Knoll
Abstract:
We introduce a fingerprint representation of molecules based on a Fourier series of atomic radial distribution functions. This fingerprint is unique (except for chirality), continuous, and differentiable with respect to atomic coordinates and nuclear charges. It is invariant with respect to translation, rotation, and nuclear permutation, and requires no pre-conceived knowledge about chemical bonding, topology, or electronic orbitals. As such it meets many important criteria for a good molecular representation, suggesting its usefulness for machine learning models of molecular properties trained across chemical compound space. To assess the performance of this new descriptor we have trained machine learning models of molecular enthalpies of atomization for training sets with up to 10k organic molecules, drawn at random from a published set of 134k organic molecules. We validate the descriptor on all remaining molecules of the 134k set. For a training set of 5k molecules the fingerprint descriptor achieves a mean absolute error of 8.0 kcal/mol. This is slightly worse than the performance attained using the Coulomb matrix, another popular alternative, reaching 6.2 kcal/mol for the same training and test sets.
Submitted 17 March, 2015; v1 submitted 10 July, 2013;
originally announced July 2013.