-
Accurate nuclear quantum statistics on machine-learned classical effective potentials
Authors:
Iryna Zaporozhets,
Félix Musil,
Venkat Kapil,
Cecilia Clementi
Abstract:
The contribution of nuclear quantum effects (NQEs) to the properties of various hydrogen-bound systems, including biomolecules, is increasingly recognized. Despite the development of many acceleration techniques, the computational overhead of incorporating NQEs in complex systems is sizable, particularly at low temperatures. In this work, we leverage deep learning and multiscale coarse-graining techniques to mitigate the computational burden of path integral molecular dynamics (PIMD). Specifically, we employ a machine-learned potential to accurately represent corrections to classical potentials, thereby significantly reducing the computational cost of simulating NQEs. We validate our approach using four distinct systems: Morse potential, Zundel cation, single water molecule, and bulk water. Our framework allows us to accurately compute position-dependent static properties, as demonstrated by the excellent agreement obtained between the machine-learned potential and computationally intensive PIMD calculations, even in the presence of strong NQEs. This approach opens the way to the development of transferable machine-learned potentials capable of accurately reproducing NQEs in a wide range of molecular systems.
Submitted 3 July, 2024;
originally announced July 2024.
-
Lattice matched heterogeneous nucleation eliminate defective buried interface in halide perovskites
Authors:
Paramvir Ahlawat,
Cecilia Clementi,
Felix Musil,
Maria-Andreea Filip
Abstract:
Metal halide perovskite-based semi-conducting hetero-structures have emerged as promising electronics for solar cells, light-emitting diodes, detectors, and photo-catalysts. Perovskites' efficiency, electronic properties and their long-term stability directly depend on their morphology [1-24]. Therefore, to manufacture stable and higher efficiency perovskite solar cells and electronics, it is now crucial to understand their micro-structure evolution. In this study, we perform molecular dynamics simulations to investigate the formation of cesium lead bromide perovskite on interfaces. Our simulations reveal that perovskite crystallizes in a heteroepitaxial manner on widely employed oxide interfaces. This could introduce the formation of dislocations, voids and defects in the buried interface, and grain boundaries in the bulk crystal. From simulations, we find that lattice-matched interfaces could enable epitaxial ordered growth of perovskites and may prevent defect formation in the buried interface.
Submitted 19 May, 2024;
originally announced May 2024.
-
Navigating protein landscapes with a machine-learned transferable coarse-grained model
Authors:
Nicholas E. Charron,
Felix Musil,
Andrea Guljas,
Yaoyi Chen,
Klara Bonneau,
Aldo S. Pasos-Trejo,
Jacopo Venturin,
Daria Gusew,
Iryna Zaporozhets,
Andreas Krämer,
Clark Templeton,
Atharva Kelkar,
Aleksander E. P. Durumeric,
Simon Olsson,
Adrià Pérez,
Maciej Majewski,
Brooke E. Husic,
Ankit Patel,
Gianni De Fabritiis,
Frank Noé,
Cecilia Clementi
Abstract:
The most popular and universally predictive protein simulation models employ all-atom molecular dynamics (MD), but they come at extreme computational cost. The development of a universal, computationally efficient coarse-grained (CG) model with similar prediction performance has been a long-standing challenge. By combining recent deep learning methods with a large and diverse training set of all-atom protein simulations, we here develop a bottom-up CG force field with chemical transferability, which can be used for extrapolative molecular dynamics on new sequences not used during model parametrization. We demonstrate that the model successfully predicts folded structures, intermediates, metastable folded and unfolded basins, and the fluctuations of intrinsically disordered proteins while it is several orders of magnitude faster than an all-atom model. This showcases the feasibility of a universal and computationally efficient machine-learned CG model for proteins.
Submitted 27 October, 2023;
originally announced October 2023.
-
Quantum dynamics using path integral coarse-graining
Authors:
Félix Musil,
Iryna Zaporozhets,
Frank Noé,
Cecilia Clementi,
Venkat Kapil
Abstract:
Vibrational spectra of condensed and gas-phase systems containing light nuclei are influenced by their quantum-mechanical behaviour. The quantum dynamics of light nuclei can be approximated by the imaginary time path integral (PI) formulation, but still at a large computational cost that increases sharply with decreasing temperature. By leveraging advances in machine-learned coarse-graining, we develop a PI method with the reduced computational cost of a classical simulation. We also propose a simple temperature elevation scheme to significantly attenuate the artefacts of standard PI approaches and also eliminate the unfavourable temperature scaling of the computational cost. We illustrate the approach by calculating vibrational spectra using standard models of water molecules and bulk water, demonstrating significant computational savings and dramatically improved accuracy compared to more expensive reference approaches. We believe that our simple, efficient and accurate method could enable routine calculations of vibrational spectra including nuclear quantum effects for a wide range of molecular systems.
Submitted 23 September, 2022; v1 submitted 12 August, 2022;
originally announced August 2022.
-
Optimal radial basis for density-based atomic representations
Authors:
Alexander Goscinski,
Félix Musil,
Sergey Pozdnyakov,
Michele Ceriotti
Abstract:
The input of almost every machine learning algorithm targeting the properties of matter at the atomic scale involves a transformation of the list of Cartesian atomic coordinates into a more symmetric representation. Many of the most popular representations can be seen as an expansion of the symmetrized correlations of the atom density, and differ mainly by the choice of basis. Considerable effort has been dedicated to the optimization of the basis set, typically driven by heuristic considerations on the behavior of the regression target. Here we take a different, unsupervised viewpoint, aiming to determine the basis that encodes in the most compact way possible the structural information that is relevant for the dataset at hand. For each training dataset and number of basis functions, one can determine a unique basis that is optimal in this sense, and can be computed at no additional cost with respect to the primitive basis by approximating it with splines. We demonstrate that this construction yields representations that are accurate and computationally efficient, particularly when constructing representations that correspond to high-body order correlations. We present examples that involve both molecular and condensed-phase machine-learning models.
Submitted 10 January, 2022; v1 submitted 18 May, 2021;
originally announced May 2021.
-
Efficient implementation of atom-density representations
Authors:
Félix Musil,
Max Veit,
Alexander Goscinski,
Guillaume Fraux,
Michael J. Willatt,
Markus Stricker,
Till Junge,
Michele Ceriotti
Abstract:
Physically-motivated and mathematically robust atom-centred representations of molecular structures are key to the success of modern atomistic machine learning (ML) methods. They lie at the foundation of a wide range of methods to predict the properties of both materials and molecules as well as to explore and visualize the chemical compound and configuration space. Recently, it has become clear that many of the most effective representations share a fundamental formal connection: that they can all be expressed as a discretization of N-body correlation functions of the local atom density, suggesting the opportunity of standardizing and, more importantly, optimizing the calculation of such representations. We present an implementation, named librascal, whose modular design lends itself both to developing refinements to the density-based formalism and to rapid prototyping for new developments of rotationally equivariant atomistic representations. As an example, we discuss SOAP features, perhaps the most widely used member of this family of representations, to show how the expansion of the local density can be optimized for any choice of radial basis set. We discuss the representation in the context of a kernel ridge regression model, commonly used with SOAP features, and analyze how the computational effort scales for each of the individual steps of the calculation. By applying data reduction techniques in feature space, we show how to further reduce the total computational cost by up to a factor of 4 or 5 without affecting the model's symmetry properties and without significantly impacting its accuracy.
Submitted 21 January, 2021;
originally announced January 2021.
-
Physics-inspired structural representations for molecules and materials
Authors:
Felix Musil,
Andrea Grisafi,
Albert P. Bartók,
Christoph Ortner,
Gábor Csányi,
Michele Ceriotti
Abstract:
The first step in the construction of a regression model or a data-driven analysis, aiming to predict or elucidate the relationship between the atomic scale structure of matter and its properties, involves transforming the Cartesian coordinates of the atoms into a suitable representation. The development of atomic-scale representations has played, and continues to play, a central role in the success of machine-learning methods for chemistry and materials science. This review summarizes the current understanding of the nature and characteristics of the most commonly used structural and chemical descriptions of atomistic structures, highlighting the deep underlying connections between different frameworks, and the ideas that lead to computationally efficient and universally applicable models. It emphasizes the link between properties, structures, their physical chemistry and their mathematical description, provides examples of recent applications to a diverse set of chemical and materials science problems, and outlines the open questions and the most promising research directions in the field.
Submitted 4 August, 2021; v1 submitted 12 January, 2021;
originally announced January 2021.
-
Machine learning at the atomic-scale
Authors:
Félix Musil,
Michele Ceriotti
Abstract:
Statistical learning algorithms are finding more and more applications in science and technology. Atomic-scale modeling is no exception, with machine learning becoming commonplace as a tool to predict energy, forces and properties of molecules and condensed-phase systems. This short review summarizes recent progress in the field, focusing in particular on the problem of representing an atomic configuration in a mathematically robust and computationally efficient way. We also discuss some of the regression algorithms that have been used to construct surrogate models of atomic-scale properties. We then show examples of how the optimization of the machine-learning models can both incorporate and reveal insights into the physical phenomena that underlie structure-property relations.
Submitted 8 December, 2020;
originally announced December 2020.
-
Fast and Accurate Uncertainty Estimation in Chemical Machine Learning
Authors:
Felix Musil,
Michael J. Willatt,
Mikhail A. Langovoy,
Michele Ceriotti
Abstract:
We present a scheme to obtain an inexpensive and reliable estimate of the uncertainty associated with the predictions of a machine-learning model of atomic and molecular properties. The scheme is based on resampling, with multiple models being generated based on sub-sampling of the same training data. The accuracy of the uncertainty prediction can be benchmarked by maximum likelihood estimation, which can also be used to correct for correlations between resampled models, and to improve the performance of the uncertainty estimation by a cross-validation procedure. In the case of sparse Gaussian Process Regression models, this resampled estimator can be evaluated at negligible cost. We demonstrate the reliability of these estimates for the prediction of molecular energetics, and for the estimation of nuclear chemical shieldings in molecular crystals. Extension to estimate the uncertainty in energy differences, forces, or other correlated predictions is straightforward. This method can be easily applied to other machine learning schemes, and will be beneficial to make data-driven predictions more reliable, and to facilitate training-set optimization and active-learning strategies.
Submitted 20 September, 2018;
originally announced September 2018.
-
Atom-Density Representations for Machine Learning
Authors:
Michael J. Willatt,
Felix Musil,
Michele Ceriotti
Abstract:
The applications of machine learning techniques to chemistry and materials science become more numerous by the day. The main challenge is to devise representations of atomic systems that are at the same time complete and concise, so as to reduce the number of reference calculations that are needed to predict the properties of different types of materials reliably. This has led to a proliferation of alternative ways to convert an atomic structure into an input for a machine-learning model. We introduce an abstract definition of chemical environments that is based on a smoothed atomic density, using a bra-ket notation to emphasize basis set independence and to highlight the connections with some popular choices of representations for describing atomic systems. The correlations between the spatial distribution of atoms and their chemical identities are computed as inner products between these feature kets, which can be given an explicit representation in terms of the expansion of the atom density on orthogonal basis functions, that is equivalent to the smooth overlap of atomic positions (SOAP) power spectrum, but also in real space, corresponding to $n$-body correlations of the atom density. This formalism lays the foundations for a more systematic tuning of the behavior of the representations, by introducing operators that represent the correlations between structure, composition, and the target properties. It provides a unifying picture of recent developments in the field and indicates a way forward towards more effective and computationally affordable machine-learning schemes for molecules and materials.
Submitted 28 January, 2019; v1 submitted 1 July, 2018;
originally announced July 2018.
-
Feature Optimization for Atomistic Machine Learning Yields A Data-Driven Construction of the Periodic Table of the Elements
Authors:
Michael J. Willatt,
Félix Musil,
Michele Ceriotti
Abstract:
Machine-learning of atomic-scale properties amounts to extracting correlations between structure, composition and the quantity that one wants to predict. Representing the input structure in a way that best reflects such correlations makes it possible to improve the accuracy of the model for a given amount of reference data. When using a description of the structures that is transparent and well-principled, optimizing the representation might reveal insights into the chemistry of the data set. Here we show how one can generalize the SOAP kernel to introduce a distance-dependent weight that accounts for the multi-scale nature of the interactions, and a description of correlations between chemical species. We show that this improves substantially the performance of ML models of molecular and materials stability, while making it easier to work with complex, multi-component systems and to extend SOAP to coarse-grained intermolecular potentials. The element correlations that give the best performing model show striking similarities with the conventional periodic table of the elements, providing an inspiring example of how machine learning can rediscover, and generalize, intuitive concepts that constitute the foundations of chemistry.
Submitted 25 July, 2019; v1 submitted 30 June, 2018;
originally announced July 2018.
-
Chemical Shifts in Molecular Solids by Machine Learning
Authors:
Federico M. Paruzzo,
Albert Hofstetter,
Félix Musil,
Sandip De,
Michele Ceriotti,
Lyndon Emsley
Abstract:
The calculation of chemical shifts in solids has enabled methods to determine crystal structures in powders. The dependence of chemical shifts on local atomic environments sets them among the most powerful tools for structure elucidation of powdered solids or amorphous materials. Unfortunately, this dependency comes with the cost of high accuracy first-principle calculations to qualitatively predict chemical shifts in solids. Machine learning methods have recently emerged as a way to overcome the need for explicit high accuracy first-principle calculations. However, the vast chemical and combinatorial space spanned by molecular solids, together with the strong dependency of chemical shifts of atoms on their environment, poses a huge challenge for any machine learning method. Here we propose a machine learning method based on local environments to accurately predict chemical shifts of different molecular solids and of different polymorphs within DFT accuracy (RMSE of 0.49 ppm ($^{1}$H), 4.3 ppm ($^{13}$C), 13.3 ppm ($^{15}$N), and 17.7 ppm ($^{17}$O), with $R^2$ of 0.97 for $^{1}$H, 0.99 for $^{13}$C, 0.99 for $^{15}$N, and 0.99 for $^{17}$O). We also demonstrate that the trained model is able to correctly determine, based on the match between experimentally-measured and ML-predicted shifts, structures of cocaine and the drug 4-[4-(2-adamantylcarbamoyl)-5-tert-butylpyrazol-1-yl]benzoic acid in a chemical-shift-based NMR crystallography approach.
Submitted 29 May, 2018;
originally announced May 2018.
-
Mapping and Classifying Molecules from a High-Throughput Structural Database
Authors:
Sandip De,
Felix Musil,
Teresa Ingram,
Carsten Baldauf,
Michele Ceriotti
Abstract:
High-throughput computational materials design promises to greatly accelerate the process of discovering new materials and compounds, and of optimizing their properties. The large databases of structures and properties that result from computational searches, as well as the agglomeration of data of heterogeneous provenance, lead to considerable challenges when it comes to navigating the database, representing its structure at a glance, understanding structure-property relations, eliminating duplicates and identifying inconsistencies. Here we present a case study, based on a data set of conformers of amino acids and dipeptides, of how machine-learning techniques can help address these issues. We will exploit a recently developed strategy to define a metric between structures, and use it as the basis of both clustering and dimensionality reduction techniques, showing how these can help reveal structure-property relations, identify outliers and inconsistent structures, and rationalise how perturbations (e.g. binding of ions to the molecule) affect the stability of different conformers.
Submitted 13 November, 2016;
originally announced November 2016.