DeepLNE++ leveraging knowledge distillation for accelerated multi-state path-like collective variables

Thorben Fröhlking School of Pharmaceutical Sciences, University of Geneva, Rue Michel Servet 1, 1206, Genève, Switzerland Institute of Pharmaceutical Sciences of Western Switzerland (ISPSO), University of Geneva, 1206, Genève, Switzerland Swiss Institute of Bioinformatics, University of Geneva, 1206, Genève, Switzerland    Valerio Rizzi School of Pharmaceutical Sciences, University of Geneva, Rue Michel Servet 1, 1206, Genève, Switzerland Institute of Pharmaceutical Sciences of Western Switzerland (ISPSO), University of Geneva, 1206, Genève, Switzerland Swiss Institute of Bioinformatics, University of Geneva, 1206, Genève, Switzerland    Simone Aureli School of Pharmaceutical Sciences, University of Geneva, Rue Michel Servet 1, 1206, Genève, Switzerland Institute of Pharmaceutical Sciences of Western Switzerland (ISPSO), University of Geneva, 1206, Genève, Switzerland Swiss Institute of Bioinformatics, University of Geneva, 1206, Genève, Switzerland    Francesco Luigi Gervasio francesco.gervasio@unige.ch School of Pharmaceutical Sciences, University of Geneva, Rue Michel Servet 1, 1206, Genève, Switzerland Institute of Pharmaceutical Sciences of Western Switzerland (ISPSO), University of Geneva, 1206, Genève, Switzerland Swiss Institute of Bioinformatics, University of Geneva, 1206, Genève, Switzerland Department of Chemistry, University College London, London, WC1E 6BT, United Kingdom
(July 5, 2024)
Abstract

Path-like collective variables can be very effective for accurately modeling complex biomolecular processes in molecular dynamics simulations. Recently, we introduced DeepLNE, a machine learning-based path-like CV that provides a progression variable s𝑠sitalic_s along the path as a non-linear combination of several descriptors, effectively approximating the reaction coordinate. However, DeepLNE is computationally expensive for realistic systems needing many descriptors and limited in its ability to handle multi-state reactions. Here we present DeepLNE++, which uses a knowledge distillation approach to significantly accelerate the evaluation of DeepLNE, making it feasible to compute free energy landscapes for large and complex biomolecular systems. In addition, DeepLNE++ encodes system-specific knowledge within a supervised multitasking framework, enhancing its versatility and effectiveness.

preprint: AIP/123-QED

Classical molecular dynamics (MD) simulations at the atomistic scale provide a powerful method for studying complex physical, chemical, and biological systems. Indeed, MD simulations can reveal mechanisms at spatial and temporal scales that are difficult to observe experimentally, acting as a computational microscope.Dror et al. (2012) Advances in computational power and force field accuracy have increased the capabilities of MD, even though a full description of intricate biological phenomena remains hard to achieve due to limitations in accessible timescales. To address this, enhanced sampling algorithms have been developed to allow the exploration of high-energy transitions and their associated free energy landscapes. Laio and Gervasio (2008); Valsson, Tiwary, and Parrinello (2016); Camilloni and Pietrucci (2018) Many of these algorithms use either collective variables (CVs) or paths connecting two-states A and B, each of which has its own advantages and limitations. Laio and Parrinello (2002); Bolhuis et al. (2002)

CV-based algorithms rely on the identification of slow degrees of freedom specific to the system under investigation and are limited to applying an external bias potential only on the empirically determined relevant ones. Pietrucci (2017) Path-based methods, on the other hand, can incorporate many degrees of freedom but are highly dependent on the definition of end states and effective path search algorithms. Path-collective variables (PATHCVs) combine these approaches and are often coupled with enhanced sampling algorithms to explore high free energy barriers and successfully reconstruct free energy profiles for complex systems. Branduardi, Gervasio, and Parrinello (2007)

It is worth emphasizing that the effectiveness of PATHCVs depends on the metric and milestone parameters chosen. Recent work has explored machine learning to optimise PATHCV metrics, but linear combinations of features can be sub-optimal for complex pathways, such as ligand binding to a protein. Hovan, Comitani, and Gervasio (2019) Non-linear combinations of descriptors better capture the relevant degrees of freedom at different stages of the pathway. Bonati, Rizzi, and Parrinello (2020); Bonati et al. (2023) Data-driven methods have also emerged for learning CVs, using techniques such as variational autoencoders and kernel ridge regression. Šípka, Erlebach, and Grajciar (2023); France-Lanord et al. (2024)

We recently introduced a new machine learning approach, DeepLNE (deep-locally non-linear-embedding), which refines the PATHCV method and overcomes existing limitations Fröhlking et al. (2024). DeepLNE creates a semi-automatic 1D description of the training data by combining local linear embedding Roweis and Saul (2000) (LLE) principles with PATHCV. The algorithm uses dimensionality reduction via artificial neural networks (ANNs), differentiable k-nearest neighbour selection, and an ANN-encoded 1D latent space. By focusing on system dynamics, DeepLNE forms a path-like CV anchored by training data, expanding the tools for CV definition and improving on previous methods with respect to available input space and functional flexibility.

The algorithm can be applied to reactive MD trajectories connecting metastable states and derives a PyTorch Paszke et al. (2017) model that accurately approximates the true reaction coordinate of the system s𝑠sitalic_s and its perpendicular distance z𝑧zitalic_z. Therefore, exporting DeepLNE as a CV and biasing it via state-of-the-art enhanced sampling methods such as ’On-the-fly Probability Enhanced Sampling’ (OPES) Invernizzi and Parrinello (2020, 2022) or OneOPES Rizzi et al. (2023) allows efficient estimation of the free energy surface (FES). However, DeepLNE is computationally demanding. When the underlying feature space is large and models are anchored to many training datapoints, the model evaluation via PyTorch can significantly impact the computational performance of the MD engine. Additionally, when multiple metastable states are aligned along the reaction path, the sequentiality of assigned path values on the 1D projection automatically derived by the DeepLNE algorithm can be different from the desired one. In this study, we show how DeepLNE can be improved by defining a multi-tasking objective function aligning the expectations when designing the path-like CVs (e.g. sequentiality of metastable states) with the model output and how the evaluation of DeepLNE CVs can be significantly accelerated by representing and exporting them as artificial neural networks (ANN). We achieve this within a knowledge distillation framework deriving DeepLNE student models for the path-like variables s𝑠sitalic_s and z𝑧zitalic_z. In the paper, we also document the best practices for training DeepLNE models including cross-validation and augmented objective functions allowing the use of labeled data in a supervised multi-tasking framework as well as the option to assign importance weights to individual input features. We demonstrate the extended and improved algorithm on the toy models: 3-state Müller-Brown-potential, alanine tripeptide, and the microRNA precursor pre-miR21. Importantly we benchmark the original PyTorch-based CV against the computationally faster DeepLNE student models obtained from knowledge distillation, showing that for the solvated pre-miR21 system composed of 31 nucleotides with a total of 45543 atoms we can achieve a 3 times speed-up.

I Methods

Before introducing the DeepLNE++ algorithm and its novelties, we summarise the original DeepLNE framework, highlighting its advantages and its limitations.

I.1 DeepLNE

The DeepLNE CV aims to represent high-dimensional datasets using a single dimension. It combines ideas from PATHCV and LLE within a neural network architecture which parameters are optimized using the mean square error between the original input vector 𝑿𝑿\bm{X}bold_italic_X and the reconstructed one 𝑿^bold-^𝑿\bm{\hat{X}}overbold_^ start_ARG bold_italic_X end_ARG for m𝑚mitalic_m training datapoints:

=1mi=1m|𝑿𝑿^|21𝑚superscriptsubscript𝑖1𝑚superscript𝑿bold-^𝑿2\mathcal{L}=\frac{1}{m}\sum_{i=1}^{m}|\bm{X}-\bm{\hat{X}}|^{2}caligraphic_L = divide start_ARG 1 end_ARG start_ARG italic_m end_ARG ∑ start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_m end_POSTSUPERSCRIPT | bold_italic_X - overbold_^ start_ARG bold_italic_X end_ARG | start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT (1)

The architecture can be seen as a generalized autoencoder. An initial ANN reduces dimensionality. Then a continuous k-nearest-neighbor (k-NN) step is applied to each data point. The neighborhood information is compressed into a one-dimensional representation s𝑠sitalic_s. This representation is used for reconstruction via a decoder and for computing an accessory perpendicular distance z𝑧zitalic_z Since s𝑠sitalic_s is derived from the features of a data point’s neighbors it is robust to extrapolation. However, evaluating the model becomes slower with larger datasets due to memory requirements. The DeepLNE algorithm constructs variables s𝑠sitalic_s and z𝑧zitalic_z that approximate a transition path between states. It addresses several issues encountered in PATHCVs:

  • Automatic Metric Learning (d𝑑ditalic_d): DeepLNE learns a lower-dimensional metric to reduce degeneracy when identifying local neighborhoods.

  • Automatic Neighborhood Construction: Neighborhoods are constructed using a differentiable k-NN step.

  • Heterogeneous Features: Distances, angles, and contact maps can be combined in 𝑿𝑿\bm{X}bold_italic_X without predefined landmarks.

However, there are remaining limitations that we attempted to address in this study:

  • Sequentiality Preservation: The 1D DeepLNE variable s𝑠sitalic_s may not preserve desired sequentiality in multi-state systems.

  • Computationally Demanding Evaluation: On-the-fly evaluation for biasing MD simulations can impact performance.

I.2 DeepLNE++ in Knowledge Distillation framework

In the following, we describe all the key steps of our recommended strategy for training the DeepLNE++ CV (compare Fig. 1).

Refer to caption
Figure 1: Schematic of the steps within the knowledge distillation framework for DeepLNE++. The starting point is the generation of training datapoints to which labels are assigned to a priori known metastable states along the transition path of a biophysical system, here represented by the trajectory of a particle in the Müller-Brown potential. DeepLNE++ optimizes the model parameters using a Multi-tasking objective function \mathcal{L}caligraphic_L. Its components are the DeepLNE reconstruction loss, a cross-entropy term with respect to the assigned labels and a L2-regularization term applied on the model parameters. We select the DeepLNE model that minimized the \mathcal{L}caligraphic_L loss evaluated on the validation data left-out from the training dataset and consider it as the teacher model. We use it to perform short simulations biasing its path-like variables s𝑠sitalic_s and z𝑧zitalic_z and thereby expand the original training dataset. Next we train ANNs as DeepLNE student models to maximally reconstruct the teacher model output. We depict the weights and biases of the individual transformer components of the student models as colored circles. For these models we choose architectures with less free parameters than the teacher model in order to obtain models that can be evaluated at a lower computational cost. We train the DeepLNE student models via a reconstruction loss and choose the student models that minimize the loss evaluated on a validation dataset resulting in the most generalizing models. The ANN CVs obtained in this way are exported and used to bias MD simulations in enhanced sampling protocols. Reweighting the MD timeseries with respect to the path-like variables s𝑠sitalic_s and z𝑧zitalic_z allows estimating the underlying 2D FES of the system.

I.2.1 Training data generation

We use ’ratchet-and-pawl’ restraints of the Adiabatic Bias Molecular Dynamics (ABMD) method Camilloni, Broglia, and Tiana (2011) to perform directed enhanced sampling simulations connecting all the relevant states. Here we choose the spring constant κ𝜅\kappaitalic_κ such that we reach the target state within picoseconds of simulation time. κ𝜅\kappaitalic_κ can be seen as a hyperparameter of the DeepLNE algorithm, because it regulates the directionality and the input feature space the DeepLNE model is trained on. In the limit of high κ𝜅\kappaitalic_κ, we connect all states of interest within a short simulation time. However, since the training data will be obtained from proportionally out-of-equilibrium dynamics, the minimum transition path has to be recovered a posteriori from the trajectory after performing enhanced sampling simulation using the DeepLNE student models. We will show how this can be achieved when inspecting the FES with respect to the variables s𝑠sitalic_s and z𝑧zitalic_z in the following chapters. Consequently, we recommend not applying harmonic restraints on the z𝑧zitalic_z variable as long as the minimum free energy path between the states of interest is unknown. Additionally, we recommend to generate ABMD states-connecting trajectories along all possible regions of the accessible descriptor space 𝑿𝑿\bm{X}bold_italic_X. The more exhaustive is the exploration of the input descriptor space provided during the training of DeepLNE, the less extrapolation is required and unexpected behavior can be minimized. A novel key step in the DeepLNE knowledge distillation framework is the assignment of labels to supervise the model. Information should be provided in the form of labels whenever a region in the input descriptor space can be assigned to a metastable state or a transition state with high confidence. To minimize the computational cost of DeepLNE we use Farthest Point-Sampling (FPS)Goscinski et al. (2023) to select a maximally diverse subset of the sampled data, such that only the m𝑚mitalic_m frames are retained. The dataset is then split into training and validation data, with 80% of the data used for training and the remaining 20% is left out for validation.

I.2.2 Teacher model training

For all the DeepLNE teacher models in this study, we choose d=3𝑑3d=3italic_d = 3 for the embedded manifold coordinates 𝒙isubscript𝒙𝑖\bm{x}_{i}bold_italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT and k=3𝑘3k=3italic_k = 3 meaning we use 3 nearest-neighbors to construct the vector 𝒙ikNNsuperscriptsubscript𝒙𝑖kNN\bm{x}_{i}^{\mathrm{k-NN}}bold_italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT roman_k - roman_NN end_POSTSUPERSCRIPT (compare chapter of DeepLNE on hyperparameter choices in Ref. Fröhlking et al., 2024). We scale the s𝑠sitalic_s variable using a sigmoid activation in the final encoder layer, restricting its values to the range of 0–1. As shown in Fig. 1 the network parameters are optimized using the mean square discrepancy between the original input and the decoded ones, also referred to as reconstruction loss term. The model is trained on a multi-tasking objective function that includes a cross-entropy term allowing to supervise the model via labeled data and an L2-regularization term penalizing large magnitudes in the model parameters p𝑝pitalic_p:

train,teach=subscript𝑡𝑟𝑎𝑖𝑛𝑡𝑒𝑎𝑐absent\displaystyle\mathcal{L}_{train,~{}teach}=caligraphic_L start_POSTSUBSCRIPT italic_t italic_r italic_a italic_i italic_n , italic_t italic_e italic_a italic_c italic_h end_POSTSUBSCRIPT = 1mi=1mj=1Dαj|𝐗ij𝐗^ij|21𝑚superscriptsubscript𝑖1𝑚superscriptsubscript𝑗1𝐷subscript𝛼𝑗superscriptsubscript𝐗𝑖𝑗subscript^𝐗𝑖𝑗2\displaystyle\frac{1}{m}\sum_{i=1}^{m}\sum_{j=1}^{D}\alpha_{j}|\mathbf{X}_{ij}% -\mathbf{\hat{X}}_{ij}|^{2}divide start_ARG 1 end_ARG start_ARG italic_m end_ARG ∑ start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_m end_POSTSUPERSCRIPT ∑ start_POSTSUBSCRIPT italic_j = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_D end_POSTSUPERSCRIPT italic_α start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT | bold_X start_POSTSUBSCRIPT italic_i italic_j end_POSTSUBSCRIPT - over^ start_ARG bold_X end_ARG start_POSTSUBSCRIPT italic_i italic_j end_POSTSUBSCRIPT | start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT
+β(lilog(si)(1li)log(1si))𝛽subscript𝑙𝑖subscript𝑠𝑖1subscript𝑙𝑖1subscript𝑠𝑖\displaystyle+\beta\left(-l_{i}\log(s_{i})-(1-l_{i})\log(1-s_{i})\right)+ italic_β ( - italic_l start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT roman_log ( italic_s start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ) - ( 1 - italic_l start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ) roman_log ( 1 - italic_s start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ) )
+γj=1Ppj2𝛾superscriptsubscript𝑗1𝑃superscriptsubscript𝑝𝑗2\displaystyle+\gamma\sum_{j=1}^{P}p_{j}^{2}+ italic_γ ∑ start_POSTSUBSCRIPT italic_j = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_P end_POSTSUPERSCRIPT italic_p start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT (2)

Each term of the objective function has associated hyperparameters α𝛼\alphaitalic_α, β𝛽\betaitalic_β, and γ𝛾\gammaitalic_γ that are found by scanning a range of different choices respectively. While α𝛼\alphaitalic_α is a hyperparameter vector that tunes the importance of each individual input feature reconstruction, β𝛽\betaitalic_β and γ𝛾\gammaitalic_γ are scalar hyperparameters tuning the importance of the label reconstruction and the magnitude of the model parameters respectively. In order to obtain the most generalizing model we select the one that minimizes the multi-tasking reconstruction loss on the validation data:

val,teach=subscript𝑣𝑎𝑙𝑡𝑒𝑎𝑐absent\displaystyle\mathcal{L}_{val,~{}teach}=caligraphic_L start_POSTSUBSCRIPT italic_v italic_a italic_l , italic_t italic_e italic_a italic_c italic_h end_POSTSUBSCRIPT = 1mi=1mj=1Dαj|𝐗ij𝐗^ij|21𝑚superscriptsubscript𝑖1𝑚superscriptsubscript𝑗1𝐷subscript𝛼𝑗superscriptsubscript𝐗𝑖𝑗subscript^𝐗𝑖𝑗2\displaystyle\frac{1}{m}\sum_{i=1}^{m}\sum_{j=1}^{D}\alpha_{j}|\mathbf{X}_{ij}% -\mathbf{\hat{X}}_{ij}|^{2}divide start_ARG 1 end_ARG start_ARG italic_m end_ARG ∑ start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_m end_POSTSUPERSCRIPT ∑ start_POSTSUBSCRIPT italic_j = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_D end_POSTSUPERSCRIPT italic_α start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT | bold_X start_POSTSUBSCRIPT italic_i italic_j end_POSTSUBSCRIPT - over^ start_ARG bold_X end_ARG start_POSTSUBSCRIPT italic_i italic_j end_POSTSUBSCRIPT | start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT
+β(lilog(si)(1li)log(1si))𝛽subscript𝑙𝑖subscript𝑠𝑖1subscript𝑙𝑖1subscript𝑠𝑖\displaystyle+\beta\left(-l_{i}\log(s_{i})-(1-l_{i})\log(1-s_{i})\right)+ italic_β ( - italic_l start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT roman_log ( italic_s start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ) - ( 1 - italic_l start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ) roman_log ( 1 - italic_s start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ) ) (3)

We use gradient descent using the ADAM optimizer with a learning rate of 103superscript10310^{-3}10 start_POSTSUPERSCRIPT - 3 end_POSTSUPERSCRIPT and 5000 epochs, using the machine learning library PyTorch Paszke et al. (2017).

I.2.3 Expanding training data

Since the student models will be more susceptible to overfitting, because the ANNs are not anchored to the training data as it is the case for the DeepLNE teacher model we recommend expanding the training data by performing short enhanced sampling simulations e.g. using OPES in exploration mode Invernizzi and Parrinello (2022) (OPES-EXPLORE) using the DeepLNE teacher model (in the order of ns). This step aims at increasing the robustness of the DeepLNE student models. To track the exploration of the descriptor space it can be instructive to plot the DeepLNE CVs along a few important physical descriptors or against the first principal components of the input features 𝑿𝑿\bm{X}bold_italic_X. Based on this inspection we empirically decide when the descriptor space has been explored sufficiently such that we are minimizing the regions in configurational space that need to be extrapolated by the DeepLNE student models. Ultimately the original training data are concatenated with the newly sampled datapoints. Also here we use FPS to select a maximally diverse subset of all available datapoints. We retain m𝑚mitalic_m frames and we split the data into training and validation set at a 80-20 ratio.

I.2.4 Student model training

As shown in Fig. 1 we project the DeepLNE teacher model variables s𝑠sitalic_s and z𝑧zitalic_z on the extended training dataset and train DeepLNE student models by minimizing the mean square discrepancy between the teacher model output and the student model output:

train,stud=subscript𝑡𝑟𝑎𝑖𝑛𝑠𝑡𝑢𝑑absent\displaystyle\mathcal{L}_{train,~{}stud}=caligraphic_L start_POSTSUBSCRIPT italic_t italic_r italic_a italic_i italic_n , italic_s italic_t italic_u italic_d end_POSTSUBSCRIPT = 1mi=1mSmoothL1(ξiDeepLNEξiStudent)1𝑚superscriptsubscript𝑖1𝑚SmoothL1superscriptsubscript𝜉𝑖DeepLNEsuperscriptsubscript𝜉𝑖Student\displaystyle\frac{1}{m}\sum_{i=1}^{m}\text{SmoothL1}(\xi_{i}^{\text{DeepLNE}}% -\xi_{i}^{\text{Student}})divide start_ARG 1 end_ARG start_ARG italic_m end_ARG ∑ start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_m end_POSTSUPERSCRIPT SmoothL1 ( italic_ξ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT DeepLNE end_POSTSUPERSCRIPT - italic_ξ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT Student end_POSTSUPERSCRIPT )
+γj=1Ppj2𝛾superscriptsubscript𝑗1𝑃superscriptsubscript𝑝𝑗2\displaystyle+\gamma\sum_{j=1}^{P}p_{j}^{2}+ italic_γ ∑ start_POSTSUBSCRIPT italic_j = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_P end_POSTSUPERSCRIPT italic_p start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT (4)

with ξi{si,zi}subscript𝜉𝑖subscript𝑠𝑖subscript𝑧𝑖\xi_{i}\in\{s_{i},z_{i}\}italic_ξ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ∈ { italic_s start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , italic_z start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT } and the PyTorch loss function SmoothL1 with β=1.35𝛽1.35\beta=1.35italic_β = 1.35. The hyperparameter γ𝛾\gammaitalic_γ regulates the penalty applied based on the magnitude of the student model parameters p𝑝pitalic_p. The choice of SmoothL1 loss is motivated by its robustness against outliers and consequently reducing overfitting in trained models. As the best model, we choose the one that minimizes this objective function evaluated on the validation dataset:

val, stud=1mi=1mSmoothL1(ξiDeepLNEξiStudent)subscriptval, stud1𝑚superscriptsubscript𝑖1𝑚SmoothL1superscriptsubscript𝜉𝑖DeepLNEsuperscriptsubscript𝜉𝑖Student\mathcal{L}_{\text{val,~{}stud}}=\frac{1}{m}\sum_{i=1}^{m}\text{SmoothL1}(\xi_% {i}^{\text{DeepLNE}}-\xi_{i}^{\text{Student}})caligraphic_L start_POSTSUBSCRIPT val, stud end_POSTSUBSCRIPT = divide start_ARG 1 end_ARG start_ARG italic_m end_ARG ∑ start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_m end_POSTSUPERSCRIPT SmoothL1 ( italic_ξ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT DeepLNE end_POSTSUPERSCRIPT - italic_ξ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT Student end_POSTSUPERSCRIPT ) (5)

We use gradient descent using the ADAM optimizer with a learning rate of 103superscript10310^{-3}10 start_POSTSUPERSCRIPT - 3 end_POSTSUPERSCRIPT and up to 15000 epochs, using the machine learning library PyTorch Paszke et al. (2017). We recommend using a higher number of training epochs than in I.2.2 in order to obtain models with a lower validation error, which typically corresponds to models that overfit less and exhibit smooth extrapolation into unknown regions of the descriptor space 𝑿𝑿\bm{X}bold_italic_X.

I.2.5 Exporting the student models into PLUMED

The parameters of the trained DeepLNE student models are stored such that the models can be evaluated on-the-fly in MD simulations using the PLUMED plugin Tribello et al. (2014). We use a variation of the MULTI_ANN CV available at https://github.com/bussilab/plumed-multi-ann that provides computationally efficient evaluation of ANN as CVs. An implementation of the full knowledge distillation framework and usage as a tutorial are available at https://github.com/ThorbenF/DeepLNE2.

I.3 Interpretation of hyperparameters

In order to obtain the DeepLNE models that are most adequate for the biophysical system at hand we here focus on the choice and the interpretation of the hyperparameters that are part of the knowledge distillation framework.

I.3.1 Teacher model

The teacher model keeps the original DeepLNE hyperparameter for the continuous k-nearest-neighbors step that are the number of neighbors k𝑘kitalic_k and the sparsity of the selection matrix t𝑡titalic_t, which detailed description can be found in Ref. Fröhlking et al., 2024. We recommend using k=3𝑘3k=3italic_k = 3 and t=0.1𝑡0.1t=0.1italic_t = 0.1 for models with high locality and numerical stability. The z𝑧zitalic_z variable is also dependent on λ𝜆\lambdaitalic_λ which is not relevant for the training of the model and can be quickly adjusted without retraining the model. The effect of λ𝜆\lambdaitalic_λ is analogous to its role in the PATHCV. To decide the adequate value for all hyperparameters it can be helpful to visualize the resulting behavior of variables s𝑠sitalic_s and z𝑧zitalic_z in the descriptor space or the 2 first components of the PCA on the descriptor set (compare Ref. Fröhlking et al., 2024).

Beyond the DeepLNE hyperparameter in I.2.2 we introduced a new objective function for training the teacher model that contains hyperparameters α,β𝛼𝛽\alpha,\betaitalic_α , italic_β and γ𝛾\gammaitalic_γ, for which we have the following limiting cases:

  • α0𝛼0\alpha\rightarrow 0italic_α → 0: DeepLNE does not prioritize descriptor reconstruction.

  • α1𝛼1\alpha\rightarrow 1italic_α → 1: Full reconstruction of the selected descriptor from s𝑠sitalic_s.

  • β0𝛽0\beta\rightarrow 0italic_β → 0: Assigned labels are ignored.

  • β𝛽\beta\rightarrow\inftyitalic_β → ∞: Optimal matching of s𝑠sitalic_s with all labels.

  • γ0𝛾0\gamma\rightarrow 0italic_γ → 0: In this limit we obtain a model with maximal flexibility, which can lead to overfitting.

  • γ𝛾\gamma\rightarrow\inftyitalic_γ → ∞: Maximal penalty for large parameters, resulting in reduced model flexibility.

I.3.2 Student model

In I.2.4 we introduced a new objective function for the training of the student model that contains hyperparameters β𝛽\betaitalic_β and γ𝛾\gammaitalic_γ. In contrast to the teacher model’s objective function, which uses the hyperparameter α𝛼\alphaitalic_α to weight the importance of the input features, the student model’s objective function is missing α𝛼\alphaitalic_α. The motivation for this choice is that the student model can learn automatically and unsupervised which of the input descriptors 𝑿𝑿\bm{X}bold_italic_X are relevant for reconstructing the DeepLNE teacher model variables s𝑠sitalic_s and z𝑧zitalic_z. While γ𝛾\gammaitalic_γ behaves as just described above, we have the limiting cases for β𝛽\betaitalic_β:

  • β0𝛽0\beta\rightarrow 0italic_β → 0: the SmoothL1 loss converges to L1 loss decreasing the weight of the discrepancies between teacher and student model on the objective function.

  • β1𝛽1\beta\rightarrow 1italic_β → 1: the SmoothL1 objective becomes the L2 norm increasing the weight of the discrepancies between teacher and student model.

In this study, we assigned the same model specific hyperparameter choices to the training and the validation objective function.

I.4 Details on MD simulations and DeepLNE training

The DeepLNE knowledge distillation framework that we introduce here (see Fig. 1) aims at representing the DeepLNE CVs s𝑠sitalic_s and z𝑧zitalic_z as a computationally cheap ANNs that can be straightforwardly exported into PLUMED. For all the systems investigated in this paper, we followed the same protocol: (1) sampling of the transition of interest; (2) training the original DeepLNE CV on a set of input features as a teacher model; (3) performing enhanced sampling MD simulation using the teacher model to extend the training data; (4) training a student model for s𝑠sitalic_s and another student model for the z𝑧zitalic_z variable; (5) exporting the CVs to PLUMED and combining them with an enhanced sampling method for free energy estimation. In the following, we outline all the details of our recommended strategy. We test the DeepLNE knowledge distillation framework across the following applications: exploration of the 3-state Müller-Brown potential by a single particle, and sampling of the conformational space of the alanine tripeptide. Lastly, as an example of a real-world biological problem, we investigated the conformational change associated to the maturation of the microRNA’s pre-miR21 (see Fig. 2), a prominent oncological target whose conformational space and interactions with a novel inhibitor has been recently studied in Ref. Aureli et al., 2024.

Refer to caption
Figure 2: The DeepLNE CVs are tested on 3 toy models: (a) Müller-Brown potential used as a 3-state potential energy landscape for the simulation of a particle moving in two dimensions. (b) Structure of the well-studied biomolecule alanine tripeptide, a system that can be sufficiently described via its dihedral angles ϕ1subscriptitalic-ϕ1\phi_{1}italic_ϕ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT, ϕ2subscriptitalic-ϕ2\phi_{2}italic_ϕ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT, ϕ3subscriptitalic-ϕ3\phi_{3}italic_ϕ start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT and ψ1subscript𝜓1\psi_{1}italic_ψ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT, ψ2subscript𝜓2\psi_{2}italic_ψ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT, ψ3subscript𝜓3\psi_{3}italic_ψ start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT. (c) Structure of pre-miR21 composed of 31 nucleotides. The RNA is coloured with respect to the type of nucleotide: Guanine-purple, Adenine-light crimson, Cytosine-orange, Uracil-pink, G5-white, C3-mauve. We show a structure from G22 to C54 where the Adenine nucleobase A29 (highlighted via ’VDW’ style) is integrated within the helical structure forming hydrogen bonds with the surrounding nucleobases (highlighted via ’licorice’ style), rather than extending out towards the solvent.

I.4.1 Müller-Brown-Potential

The simulations of a particle in the Müller-Brown potential are performed using the simple Langevin dynamics code contained in the VES module of PLUMED Valsson and Parrinello (2014). The particle is moving in two dimensions under the action of the 3-state potential depicted in Fig. 2 built out of a sum of Gaussians. We start by generating training data concatenating 2 ABMD simulations biasing the y-coordinate with a spring constant of κ=30,80𝜅3080\kappa=30,80italic_κ = 30 , 80 kBTsubscript𝑘𝐵𝑇k_{B}Titalic_k start_POSTSUBSCRIPT italic_B end_POSTSUBSCRIPT italic_T respectively in order to connect the 3 metastable states. The DeepLNE CVs s𝑠sitalic_s and z𝑧zitalic_z are trained using the x𝑥xitalic_x and y𝑦yitalic_y position of the particle as input features 𝑿𝑿\bm{X}bold_italic_X also assigning labels to the 3 metastable states. We train the DeepLNE model with hyperparameters k=3𝑘3k=3italic_k = 3, t=0.1𝑡0.1t=0.1italic_t = 0.1, β=0.1𝛽0.1\beta=0.1italic_β = 0.1, γ=1e6𝛾1superscript𝑒6\gamma=1e^{-6}italic_γ = 1 italic_e start_POSTSUPERSCRIPT - 6 end_POSTSUPERSCRIPT setting α=1𝛼1\alpha=1italic_α = 1 for both input variables. The teacher model is then used to perform an exploration simulation to expand the training dataset. We deposit bias along the DeepLNE variable s𝑠sitalic_s via the OPES-EXPLORE engine. We reduce the number of training datapoints to m=1600𝑚1600m=1600italic_m = 1600 applying FPS. The DeepLNE student models for s𝑠sitalic_s and z𝑧zitalic_z are subsequently trained on this expanded training dataset choosing γ=1e4𝛾1superscript𝑒4\gamma=1e^{-4}italic_γ = 1 italic_e start_POSTSUPERSCRIPT - 4 end_POSTSUPERSCRIPT. Finally we export the DeepLNE student models as CVs in PLUMED in order to perform free-energy estimation by depositing bias on s𝑠sitalic_s via OPES and applying a half parabolic harmonic constraint on z𝑧zitalic_z with a cutoff value of 0.1 (UWALL) and κ=2000kBT𝜅2000subscript𝑘𝐵𝑇\kappa=2000~{}k_{B}Titalic_κ = 2000 italic_k start_POSTSUBSCRIPT italic_B end_POSTSUBSCRIPT italic_T. We set a bias deposition rate (PACE) of 200 steps and a barrier parameter equal to 15 kJ/mol. We estimate statistical errors on the FES via block analysis using 10 blocks.

I.4.2 Alanine Tripeptide

For the extensively studied FES of alanine tripeptide, ϕ1subscriptitalic-ϕ1\phi_{1}italic_ϕ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT and ϕ2subscriptitalic-ϕ2\phi_{2}italic_ϕ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT are commonly used to describe the metastable states of the system. However there are the additional dihedral angles ϕ3subscriptitalic-ϕ3\phi_{3}italic_ϕ start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT, ψ1subscript𝜓1\psi_{1}italic_ψ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT, ψ2subscript𝜓2\psi_{2}italic_ψ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT and ψ3subscript𝜓3\psi_{3}italic_ψ start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT leading to at least 6 available dihedral angle descriptors for this system (compare Fig. 2 (b)). In this study, we investigate the transition between the 3 metastable states visible in the ϕ1subscriptitalic-ϕ1\phi_{1}italic_ϕ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT, ϕ2subscriptitalic-ϕ2\phi_{2}italic_ϕ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT descriptor space, namely the configuration close to ϕ1,ϕ2=1,1formulae-sequencesubscriptitalic-ϕ1subscriptitalic-ϕ211\phi_{1},\phi_{2}=1,-1italic_ϕ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , italic_ϕ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT = 1 , - 1, ϕ1,ϕ2=1,1formulae-sequencesubscriptitalic-ϕ1subscriptitalic-ϕ211\phi_{1},\phi_{2}=-1,-1italic_ϕ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , italic_ϕ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT = - 1 , - 1 and ϕ1,ϕ2=1,1formulae-sequencesubscriptitalic-ϕ1subscriptitalic-ϕ211\phi_{1},\phi_{2}=1,1italic_ϕ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , italic_ϕ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT = 1 , 1. We generate training data concatenating ABMD simulations biasing ϕ1,ϕ2,ϕ3subscriptitalic-ϕ1subscriptitalic-ϕ2subscriptitalic-ϕ3\phi_{1},\phi_{2},\phi_{3}italic_ϕ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , italic_ϕ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT , italic_ϕ start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT with a spring constant of κ=100𝜅100\kappa=100italic_κ = 100 kJ/mol respectively in order to connect the 3 metastable states. To show the advantages of the selective reconstruction loss introduced in Eq. I.2.2 we use all 6 dihedral angles as initial features setting α=0𝛼0\alpha=0italic_α = 0 for ϕ3subscriptitalic-ϕ3\phi_{3}italic_ϕ start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT and the ψ𝜓\psiitalic_ψ dihedrals. To save computational costs we use the FPS tool to reduce the number of training datapoints (m=1000𝑚1000m=1000italic_m = 1000) for the teacher model and (m=1500𝑚1500m=1500italic_m = 1500) for the student models. The hyperparameters for the teacher model are k=3𝑘3k=3italic_k = 3, t=0.1𝑡0.1t=0.1italic_t = 0.1, β=1𝛽1\beta=1italic_β = 1, γ=1e4𝛾1superscript𝑒4\gamma=1e^{-4}italic_γ = 1 italic_e start_POSTSUPERSCRIPT - 4 end_POSTSUPERSCRIPT. After training we use the s𝑠sitalic_s variable as a CV to bias a 5 ns long simulation with OPES-EXPLORE. We choose a PACE of 500 steps and a barrier parameter equal to 80 kJ/mol. We apply a harmonic constraint with a cutoff value of 0.2 and κ=500𝜅500\kappa=500italic_κ = 500 kJ/mol (UWALL) as a function of the z𝑧zitalic_z variable. We add the newly sampled datapoints to expand the training dataset and train the DeepLNE student models for s𝑠sitalic_s and z𝑧zitalic_z setting γ=1e6𝛾1superscript𝑒6\gamma=1e^{-6}italic_γ = 1 italic_e start_POSTSUPERSCRIPT - 6 end_POSTSUPERSCRIPT. Then, we proceed by using the student model s𝑠sitalic_s variable as a CV to bias 200 ns of MD simulation with OPES in order to estimate the free-energy of the system. We set a PACE of 1000 steps and a barrier parameter equal to 80 kJ/mol. Again, a harmonic constraint with a cutoff value of 0.2 and κ=5000𝜅5000\kappa=5000italic_κ = 5000 kJ/mol (UWALL) is applied to the z𝑧zitalic_z variable. All simulations are carried out in vacuo with the DES-Amber ff. Piana et al. (2020). We estimate statistical errors on the FES using 10 blocks of length 20 ns.

I.4.3 Pre-miR21

As a challenging system with a high degree of configurational complexity, we choose the RNA strand pre-miR21. This system consists of 31 nucleotides containing the physiologically relevant Adenosine nucleobase at 5’-3’ position 29 (A29) that can perform rotations between a ’stacked-in’ state (i.e. A29 pointing towards the paired RNA bases) (see Fig. 2) and a ’bulged-out’ state, in which A29 is translated in the bulk solution. Recently, pre-miR21 has been studied using extensive enhanced sampling MD simulations Aureli et al. (2024) using the multi-replica variant of OPES, i.e. OneOPES, where 8 concurrent replicas of the system are simulated under the effect of an increasing thermal ramp. This study uses the FES obtained by a simulation with a length of similar-to\sim400 ns per replica as a reference. Moreover, in the present study, we keep the simulation protocol identical to Ref. Aureli et al., 2024, however we perform only a single enhanced sampling MD simulation (using 1 replica). To sample the transition from state A (A29 ’stacked-in’) to state B (A29 ’bulged-out’) we use ABMD simulations biasing 3 different CVs simultaneously: 1) a primary contact map built using the heavy atom distances between the nucleobase A29 and the proximal nucleobases G45, C46 with cutoff 5 and 4 ÅÅ\mathrm{\SIUnitSymbolAngstrom}roman_Å respectively (CMAP1); 2) a contact map constructed using the heavy atom distances between the nucleobase A29 and the proximal nucleobases C28, G30 with cutoff 4 ÅÅ\mathrm{\SIUnitSymbolAngstrom}roman_Å (CMAP2); 3) a contact map built using the heavy atom distances between the nucleobases C28, G30 with cutoff 8.5 ÅÅ\mathrm{\SIUnitSymbolAngstrom}roman_Å (CMAP3). We start multiple ABMD runs from both states at CMAP3 values that span the entire possible range to allow maximum variation in this descriptor space direction, terminating the simulations upon reaching the respective opposite state. We reduce the number of training datapoints to m=1000𝑚1000m=1000italic_m = 1000 via FPS and choose the same distance metrics used for the ABMD simulations as input features in the DeepLNE training. The teacher model is trained with hyperparameters k=3𝑘3k=3italic_k = 3, t=0.1𝑡0.1t=0.1italic_t = 0.1, β=10𝛽10\beta=10italic_β = 10, γ=1e8𝛾1superscript𝑒8\gamma=1e^{-8}italic_γ = 1 italic_e start_POSTSUPERSCRIPT - 8 end_POSTSUPERSCRIPT. In the enhanced sampling simulation using the DeepLNE teacher model to expand the training data for the student models, we use the s𝑠sitalic_s and z𝑧zitalic_z variables to deposit bias with OPES-EXPLORE for a total simulation time of 20 ns. We choose a PACE of 500 steps, bandwidths 0.1 and 1 respectively, and a barrier parameter equal to 60 kJ/mol. We concatenate all available sampled trajectories, subsample a lower number of datapoints via FPS (m=1500𝑚1500m=1500italic_m = 1500), and continue by training the DeepLNE student model s𝑠sitalic_s setting γ=1e6𝛾1superscript𝑒6\gamma=1e^{-6}italic_γ = 1 italic_e start_POSTSUPERSCRIPT - 6 end_POSTSUPERSCRIPT. After completing the training and exporting the student model into PLUMED we use s𝑠sitalic_s as a CV to deposit bias via OPES. We set a PACE of 5000 steps and a barrier parameter equal to 35 kJ/mol. With this setup, a trajectory of 2 μs𝜇𝑠\mu sitalic_μ italic_s has been produced. The free energies were calculated by integrating the probability of being in respective microstates i𝑖iitalic_i of the CV by F=kBTlog(i,CVp(xi))𝐹subscript𝑘𝐵𝑇subscript𝑖CV𝑝subscript𝑥𝑖F=-k_{B}T\log(\sum_{i,\text{CV}}p(x_{i}))italic_F = - italic_k start_POSTSUBSCRIPT italic_B end_POSTSUBSCRIPT italic_T roman_log ( ∑ start_POSTSUBSCRIPT italic_i , CV end_POSTSUBSCRIPT italic_p ( italic_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ) ). Statistical errors of the FES were estimated using 10 blocks of 200 ns. For the reference FES, we choose 10 blocks of 23 ns skipping the first 30 ns of the OneOPES simulation.

Simulations of alanine tripeptide and pre-miR21 are performed using the GROMACS 2022.5 engine Abraham et al. (2015) patched with the PLUMED 2.9 plugin Tribello et al. (2014) with the Hamiltonian replica exchange algorithm. Bussi (2014)

II Results

II.1 Particle in Müller-Brown-Potential

Refer to caption
Figure 3: Results of training DeepLNE for a particle simulation in the 3-state Müller-Brown-potential. (a) Training datapoints for DeepLNE, obtained by concatenating 2 ABMD (biasing y-coordinate with a spring constant of κ=30,80𝜅3080\kappa=30,80italic_κ = 30 , 80 kBTsubscript𝑘𝐵𝑇k_{B}Titalic_k start_POSTSUBSCRIPT italic_B end_POSTSUBSCRIPT italic_T respectively. The 2D coordinate space is colored based on the trained DeepLNE s𝑠sitalic_s variable. Also we superimpose the 𝑿^^𝑿\hat{\bm{X}}over^ start_ARG bold_italic_X end_ARG of the DeepLNE model that is used to compute the z𝑧zitalic_z variable. (b) Sampled configuration during the biased simulation using OPES-EXPLORE applied on the trained PyTorch DeepLNE CVs and a harmonic constraint applied on z𝑧zitalic_z. The color of the datapoints correspond to the DeepLNE s𝑠sitalic_s variable showing path-like behavior. These datapoints are subsequently used to train the DeepLNE student models. (c) Free energy landscape with respect to the s𝑠sitalic_s variable of the student model. We compare the FES obtained via OPES biasing the student model s𝑠sitalic_s and applying harmonic restraint on student model z𝑧zitalic_z with the FES of an OPES simulation biasing x and y and subsequently reweighting with respect to both DeepLNE student model variables s𝑠sitalic_s and z𝑧zitalic_z.

In Fig. 3 we collected the results of some steps along the proposed DeepLNE knowledge distillation algorithm. In Fig. SI 1 the remaining intermediate steps can be found. The Müller-Brown-Potential used in this study encompasses 3 states. To show the effectiveness of the DeepLNE model we choose a relatively minimal architecture with 1 hidden layer and 3 nodes for the dimensionality reduction d=3𝑑3d=3italic_d = 3 and 2 hidden layers with 9 and 3 hidden nodes for both neighborhood compression and decoder network and train it on a concatenation of only 2 ABMD trajectories connecting the respective end states with the intermediate state. After 5000 training epochs we can project the DeepLNE teacher model on the input feature space 𝑿𝑿\bm{X}bold_italic_X as shown in Fig. 3 (a). We project the behavior of the s𝑠sitalic_s variable of and can appreciate that the choice of low k=3 leads to a DeepLNE model with high locality, which can be seen especially for the regions in phase space with minimal and maximal s𝑠sitalic_s values. The smooth extrapolation behavior of the DeepLNE model originates from the continuous k-nearest neighbor step. To achieve a similar extrapolation in the DeepLNE student model we proceed by performing an OPES-EXPLORE simulation biasing s𝑠sitalic_s and thereby expanding the training datapoints. The results are shown in Fig. 3 (b) where we find the entire range of s𝑠sitalic_s values explored and importantly a broader range of z𝑧zitalic_z values is explored too. For both DeepLNE student models s𝑠sitalic_s and z𝑧zitalic_z we choose an architecture with 3 hidden layers and 8 nodes each. Training the DeepLNE student models on the concatenation of the data 𝑿𝑿\bm{X}bold_italic_X shown in Fig. 3 (a) and the new sampled data in (b) and subsequently biasing s𝑠sitalic_s and simultaneously applying a harmonic restraint on z𝑧zitalic_z we obtain the 1D FES along s𝑠sitalic_s shown in Fig. 3 (c). Importantly the estimated free energy is identical to a reweighted trajectory biasing both Cartesian coordinates x𝑥xitalic_x and y𝑦yitalic_y using the same OPES setting as used when biasing the path-like DeepLNE CVs. In Fig. SI 1 (a) we give additional insight into the labels assigned for the supervised multi-tasking framework of DeepLNE. Each datapoint in known regions of the descriptor space is labeled allowing for tailoring the directionality learned by the model during training. In Fig. SI 1 (b) we projected the DeepLNE variable z𝑧zitalic_z on the descriptor space as done for s𝑠sitalic_s in Fig. 3 (a). As expected we find the reconstructed data 𝑿^bold-^𝑿\bm{\hat{X}}overbold_^ start_ARG bold_italic_X end_ARG in the center of the training datapoints and accordingly the computed z𝑧zitalic_z variable describes a reaction channel connecting all 3 system states. In Fig. SI 1 (d), we go beyond the 1D projection of the FES and use the additional information available from the z𝑧zitalic_z variable to create the 2D FES obtained from the same enhanced sampling simulation run that resulted in Fig. 3 (c). The 2D FES projection allows for additional a posteriori analysis revealing transition regions and more detailed location as well as the extend of the metastable states. This will be especially useful when attempting to describe systems of increasing degree of complexity.

II.2 Alanine Tripeptide

Refer to caption
Figure 4: Results of training DeepLNE for alanine tripeptide. (a) Training datapoints for DeepLNE, obtained by concatenating ABMD (biasing ϕ1,ϕ2,ϕ3italic-ϕ1italic-ϕ2italic-ϕ3\phi 1,\phi 2,\phi 3italic_ϕ 1 , italic_ϕ 2 , italic_ϕ 3 with a spring constant of κ=100𝜅100\kappa=100italic_κ = 100 kJ/mol respectively. The 2D coordinate space with respect to ϕ1italic-ϕ1\phi 1italic_ϕ 1 and ϕ2italic-ϕ2\phi 2italic_ϕ 2 is colored based on the trained DeepLNE s𝑠sitalic_s variable (we set all remaining ϕitalic-ϕ\phiitalic_ϕ and ψ𝜓\psiitalic_ψ input variables to 00). Also we superimpose the 𝑿^^𝑿\hat{\bm{X}}over^ start_ARG bold_italic_X end_ARG of the DeepLNE model. (b) Sampled configuration during the biased simulation using OPES-EXPLORE applied on the trained PyTorch DeepLNE CVs and a harmonic constraint applied on z𝑧zitalic_z. The color of the datapoints correspond to the DeepLNE s𝑠sitalic_s variable showing path-like behavior. These datapoints are subsequently used to train the DeepLNE student models. (c) Free energy landscape with respect to the student model s𝑠sitalic_s variable for an OPES biased simulation using the student model s𝑠sitalic_s and harmonic restraint on student model z𝑧zitalic_z variable as well as an OPES simulation biasing ϕ1italic-ϕ1\phi 1italic_ϕ 1 and ϕ2italic-ϕ2\phi 2italic_ϕ 2 and the same harmonic restraint on student model z𝑧zitalic_z variable.

The alanine tripeptide system has been widely studied showing that it can be sufficiently described via its 6 dihedral angles ϕitalic-ϕ\phiitalic_ϕ and ψ𝜓\psiitalic_ψ. In this study, even though alanine tripeptide has more states that constitute the entire configurational space, we treat it as a 3 state system considering the configurations ϕ1,ϕ2=1,1formulae-sequencesubscriptitalic-ϕ1subscriptitalic-ϕ211\phi_{1},\phi_{2}=1,-1italic_ϕ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , italic_ϕ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT = 1 , - 1, ϕ1,ϕ2=1,1formulae-sequencesubscriptitalic-ϕ1subscriptitalic-ϕ211\phi_{1},\phi_{2}=-1,-1italic_ϕ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , italic_ϕ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT = - 1 , - 1 and ϕ1,ϕ2=1,1formulae-sequencesubscriptitalic-ϕ1subscriptitalic-ϕ211\phi_{1},\phi_{2}=-1,1italic_ϕ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , italic_ϕ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT = - 1 , 1 as the metastable states. Additionally we consider a specific transition path that the system has to follow. In Fig. 4 (a) we show the training data obtained from multiple ABMD simulations describing this reaction path. The trajectories are connecting all 3 states and we proceed by training a DeepLNE teacher model. The trained model takes as input all 6 ϕitalic-ϕ\phiitalic_ϕ and ψ𝜓\psiitalic_ψ angles and its s𝑠sitalic_s variable is projected in along ϕ1subscriptitalic-ϕ1\phi_{1}italic_ϕ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT and ϕ2subscriptitalic-ϕ2\phi_{2}italic_ϕ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT by setting the ψ𝜓\psiitalic_ψ angles to 0 and ϕ3=1subscriptitalic-ϕ31\phi_{3}=-1italic_ϕ start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT = - 1. We choose an architecture with 1 hidden layer and 3 nodes for the dimensionality reduction d=3𝑑3d=3italic_d = 3 and 1 hidden layers with 2 hidden nodes for both neighborhood compression and decoder network. Also we select k=3 to obtain a DeepLNE model with high locality, which is reflected in the regions in dihedral space with minimal and maximal s𝑠sitalic_s value (compare Fig. 4 (a)). The DeepLNE teacher is capable of smooth extrapolation originating from the continuous k-nearest neighbor step. We proceed by expanding the training data coupling the teacher model CV to OPES-EXPLORE simulation biasing s𝑠sitalic_s and applying a harmonic restraint on z𝑧zitalic_z in order to exclusively sample the envisioned reaction path. The results are shown in Fig. 4 (b) where we find the entire range of s𝑠sitalic_s values explored and importantly obtain a more densely sampled descriptor space 𝑿𝑿\bm{X}bold_italic_X along the path. We train the DeepLNE student models on the concatenation of the data shown in Fig. 4 (a) and the newly sampled data in (b). Because of the functional complexity differences between s𝑠sitalic_s and z𝑧zitalic_z in this system the DeepLNE s𝑠sitalic_s variable can be approximated with ANNs of lower complexity compared to the DeepLNE z𝑧zitalic_z variable. Accordingly for the student model s𝑠sitalic_s we choose a smaller architectures with 1 hidden layer and 3 nodes compared to the z𝑧zitalic_z student model with 2 hidden layers and 6 and 3 nodes each. We can appreciate that the training datapoints describe the transition regions between the metastable states at ϕ1,ϕ2=0,1formulae-sequencesubscriptitalic-ϕ1subscriptitalic-ϕ201\phi_{1},\phi_{2}=0,-1italic_ϕ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , italic_ϕ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT = 0 , - 1 and ϕ1,ϕ2=1,0formulae-sequencesubscriptitalic-ϕ1subscriptitalic-ϕ210\phi_{1},\phi_{2}=-1,0italic_ϕ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , italic_ϕ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT = - 1 , 0 respectively. This is a specific choice for the transition path, but not the only relevant one considering the entire available configurational space for alanine tripeptide. Since dihedral angles are periodic the 3 metastable states that the s𝑠sitalic_s variable connects can be reached by crossing also other transition regions visible in (a) and (b). This has the consequence that when we want to compare FES estimations obtained from DeepLNE biased simulations to a reference simulation that can sample the entire configurational space, we need to restrict the latter such that both enhanced sampling simulations sample the same reaction path, which can be achieved by applying harmonic restraints on z𝑧zitalic_z in both cases. Accordingly we proceed by training the student models for s𝑠sitalic_s and z𝑧zitalic_z on the expanded training dataset and perform an OPES simulation biasing s𝑠sitalic_s and restraining the system along z𝑧zitalic_z to stay within the reaction tube constituted by the variables s𝑠sitalic_s and z𝑧zitalic_z. In Fig. 4 (c) we then compare the FES obtained from a DeepLNE biased run with a reference run biasing all 3 ϕitalic-ϕ\phiitalic_ϕ dihedral angles. We can appreciate that both simulations arrive at the same estimation of the FES in the reaction channel subspace defined via the DeepLNE student model variable z𝑧zitalic_z. In Fig. SI 2 (a) we detail how the training data were labeled in order to supervise DeepLNE while constructing the path-like variables s𝑠sitalic_s and z𝑧zitalic_z. Each datapoint in the 3 states within the descriptor space is labeled allowing for the desired directionality of s𝑠sitalic_s. In Fig. SI 2 (b) we projected the DeepLNE variable z𝑧zitalic_z on the descriptor space as done for s𝑠sitalic_s in Fig. 4 (a). The reconstructed data 𝑿^bold-^𝑿\bm{\hat{X}}overbold_^ start_ARG bold_italic_X end_ARG and the corresponding z𝑧zitalic_z variable that is a function of 𝑿^bold-^𝑿\bm{\hat{X}}overbold_^ start_ARG bold_italic_X end_ARG describes a reaction channel connecting all 3 system states that is a subspace of the entire configurational space. In difference to the toy model investigated in II.1 there are states with significant probabilistic weight such that comparison to reference simulations need to be performed within the same reaction channel created by s𝑠sitalic_s and z𝑧zitalic_z. Respecting this condition the FES can be estimated in the 2 DeepLNE dimensions s𝑠sitalic_s and z𝑧zitalic_z as shown in Fig. SI 2 (d). We can appreciate that all metastable states as well as the transition regions appear as regions of orthogonal orientation with respect to the s𝑠sitalic_s variable indicating that the created reaction channel by the combination of s𝑠sitalic_s and z𝑧zitalic_z crosses the transition states with high orthogonality, which is indicative of a CV that is approximating well the true reaction coordinate of the system (compare committor analysis performed in Fröhlking et al. (2024)).

II.3 Pre-miR21

Refer to caption
Figure 5: Results of training DeepLNE for pre-miR21. (a) Free energy landscape with respect to the HLDA variable used by Ref. Aureli et al., 2024 comparing to a OPES simulation using the student model s𝑠sitalic_s to deposit bias. The nucleobase A29 is ’stacked-in’ the helix at high values of HLDA and ’bulged-out’ towards the solvent at low HLDA values. The reference study used OneOPES with 8 replicas and we evaluate the FES obtained after a total simulation time of more than 400 ns per replica. In this study we perform a single OPES simulation instead biasing the student model s𝑠sitalic_s variable for 2 μs𝜇𝑠\mu sitalic_μ italic_s. (b) Free energy landscape with respect to the student model s𝑠sitalic_s and z𝑧zitalic_z variable showing 3 metastable states of the nucleobase A29 being ’stacked-in’ at values of s𝑠sitalic_s below 0.7 and ’bulged-out’ at higher s𝑠sitalic_s values (compare Fig. SI 4 for the centroids associated to the 3 metastable states identified here). This description of the system along the DeepLNE variables also reveals the transition state regions along the reaction path.

The microRNA pre-miR21 is a system of high configurational complexity, namely the number of available descriptors, significant degrees of freedom, and metastable states is difficult to define a priori. While the challenge of reducing the descriptor dimensionality can be addressed using DeepLNE by transforming the high dimensional input feature space into a single CV s𝑠sitalic_s, the latter challenge can only be overcome by identifying a sub-set of the whole configurational space. Pre-miR21 has been recently studied with respect to processes for which base-flipping of the A29 is of relevance Aureli et al. (2024). Following the previous paper, the current study focuses on the same configurational re-arrangement. We note that this is an important first step that decides all the following analysis and results. As described in section I.4.3, we selected three CMAPS as input features for the DeepLNE teacher model. These CMAPS describe both the distance between A29 and its neighboring nucleobases in the base-paired state and the distance between the stacked nucleobases above and below A29. Multiple ABMD simulations make up the training data distributed maximally diverse in the descriptor space. As shown in Fig. SI 3 (a), we make use of the multi-tasking framework assigning labels to both end states known from the previous study as well as an ad hoc defined intermediate that is located between the end states when projecting the states in the input feature space 𝑿𝑿\bm{X}bold_italic_X. We choose an architecture with 1 hidden layer and 3 nodes for the dimensionality reduction d=3𝑑3d=3italic_d = 3 and 1 hidden layer with 3 nodes for both neighborhood compression and decoder network. Also, we select k=3 to obtain a DeepLNE model with high locality. We proceed by training the DeepLNE teacher on the labeled training data, exporting it into PLUMED, and performing OPES-EXPLORE simulation biasing exclusively the s𝑠sitalic_s variable. Combining the new datapoints (see Fig. SI 3 for (b) DeepLNE s𝑠sitalic_s variable and (c) DeepLNE z𝑧zitalic_z variable) sampled with the already available ABMD trajectories the DeepLNE student models are trained on reconstructing the projection of the DeepLNE teacher on those data. As demonstrated in alanine tripeptide we choose a smaller architecture with 1 hidden layer and 3 nodes for the student model s𝑠sitalic_s and a slightly more flexible architecture for the z𝑧zitalic_z student model with 2 hidden layers and 3 nodes each. In Fig. 5 we collected the results of the OPES simulation performed using the trained DeepLNE student model biasing the variable s𝑠sitalic_s. In Fig. 5 (a) we compare the FES obtained in this study with the one previously obtained in Ref. Aureli et al., 2024 biasing a harmonic linear discriminant analysis (HLDA) CV (i.e., a weighted linear combination of 10 inter-nucleobases distances). We project the FES obtained from DeepLNE biased simulation along the same CV showing that it converges to the same FES estimation. We note that while the previous study used a total simulation time of over 3 μs𝜇𝑠\mu sitalic_μ italic_s using multiple replicas we obtained the same result after 2 μs𝜇𝑠\mu sitalic_μ italic_s in a single MD realization. Additionally in Fig. 5 (b) we investigate the underlying dynamical rearrangement of A29 in more detail by projecting the FES along both DeepLNE student model variables s𝑠sitalic_s and z𝑧zitalic_z. We can appreciate that while the HLDA CV used in the previous study can distinguish the 2 metastable end states ’stacked-in’ and ’bulged-out’, the DeepLNE variables s𝑠sitalic_s and z𝑧zitalic_z reveal 3 metastable states as well as the interconnecting transition regions. In Fig. SI 4 we extracted the centroids of the structural ensemble associated with the 3 metastable states that can be identified in Fig. 5 (b). While the end states are the previously identified conformations with A29 being ’stacked-in’ (s𝑠sitalic_s<0.2) or ’bulged-out’ respectively (s𝑠sitalic_s>0.9) and the intermediate metastable state corresponds to the approaching of A29 into the RNA helix, with A29 starting to form the necessary contacts to achieve the fully ’stacked-in’ state.

II.4 Performance Benchmarks

To quantify the computational speed-up that can be achieved by applying the knowledge distillation framework introduced in this study we compare the MD simulation performance when biasing the DeepLNE teacher model with DeepLNE student models of increasing number of parameters. The performances of all the models are evaluated using an AMD Ryzen 7950X CPU and a Nvidia RTX3080 GPU. Errors are estimated on 3 repetitions of each benchmark simulation. In Fig. 5 we collected the results of the average DeepLNE student model performance benchmark measure in ns/day for alanine tripeptide (42 atoms) and the solvated RNA pre-miR21 (45543 atoms). The displayed values should be compared to the respective PyTorch-based CV teacher model performance of 201 ns/day for alanine tripeptide and 119 ns/day for pre-miR21 corresponding to speed-up factors of up to 20 and 3 respectively when using the student models. Importantly we can appreciate that across more than 4 orders of magnitude in student model complexity the performance remains similar. While a student model for alanine tripeptide with 16 parameters leads to an average performance 3959 ns/day and a more complex model with 20000 parameters still achieves 3774 ns/day. A student model for pre-miR21 with 16 parameters results in an average performance 329 ns/day while a model with 20000 parameters arrives at 325 ns/day. These results show that DeepLNE student models offer the potential to approximate and bias functions of increasing complexity while achieving similar MD performance.

Refer to caption
Figure 6: Computational performance of DeepLNE student models s comparing increasing complexity of the architectures, resulting in increasing number of model parameters. For alanine tripeptide in vacuo and the pre-miR21 RNA system investigated the performance measure in ns of simulation time per day remains constant with respect to increased model complexities. As reference, we note that the corresponding PyTorch teacher model for alanine tripeptide allows 201 ns/day while for pre-miR21 119 ns/day can be achieved. Errors are increasing with increasing model parameters where maximal errors on the performance estimates are 388 ns/day for alanine tripeptide and 6 ns/day for pre-miR21.

III Discussion

In this study, we propose the DeepLNE++ strategy to construct a computationally fast DeepLNE CV using a knowledge distillation approach. The algorithm selects the most generalizing model at each step, supervising its directionality via the assignment of labels to the training data and representing the path-like DeepLNE variables s𝑠sitalic_s and z𝑧zitalic_z using ANN student models. The new algorithm can be used to capture and especially accelerate rare events in large biophysical systems with reduced computational overhead. In multiple systems, we have shown how this procedure allows exporting computationally efficient CVs in MD software and maintaining its capability to approximate well the ideal reaction coordinate. Consequently, we could demonstrate that the DeepLNE++ CVs can be effectively used to improve sampling when coupled to enhanced sampling methods such as OPES without multi-replica approaches.

Our methodology starts by obtaining reactive trajectories that cover the descriptor space in the most diverse manner possible encompassing the transitions between all metastable system states of interest. Herein, these starting trajectories have been obtained with a ratchet-and-pawl restraint Camilloni, Broglia, and Tiana (2011), which provided reactive data of sufficient quality to successfully train DeepLNE models for all the studied systems. The input trajectory is used to define a feature vector to distinguish all states encountered and their intermediates. With this featurized trajectory, one can automatically derive a path-like DeepLNE teacher model. This model has advantageous extrapolation capabilities because it is intrinsically anchored to the training data. To maximally maintain this behavior when approximating the teacher model using ANNs as DeepLNE student models, we rely on an explorative enhanced sampling step, where we use the DeepLNE teacher model to bias the system dynamics in order to expand the training dataset. The interpretation of the derived variables s𝑠sitalic_s and z𝑧zitalic_z is identical in DeepLNE teacher and student models, as they describe the progress along the path and its perpendicular distance, respectively.

The proposed method achieves a speed-up in computational model evaluation when biasing MD simulations with DeepLNE. It does so by representing the DeepLNE model containing a large number of fitting parameters with ANNs of lower complexity. The DeepLNE++ approach uses cross-validation to choose the models with the highest robustness when extrapolating into unknown regions of the descriptor spaces. Furthermore, we represented the DeepLNE variables s𝑠sitalic_s and z𝑧zitalic_z by an ANN each, instead of using a single ANN to reconstruct both variables in the output. This decision is made because one would need to normalize the outputs and also ensure for each system that the objective function is leading to equally distributing loss minimization for each variable simultaneously.

We keep some hyperparameter choices to the user as regularization options that permit tailoring a path-like CV adapt to system-specific needs. The model training steps of the algorithm are computationally fast and can additionally employ available GPUs and therefore allow fine-tuning all hyperparameter choices of DeepLNE leading to CVs of high quality. We demonstrated how the hyperparameters within the DeepLNE knowledge distillation framework allow for regulating the models. For all the DeepLNE models we tested k=3𝑘3k=3italic_k = 3 and t=0.1𝑡0.1t=0.1italic_t = 0.1 in order to establish a continuous k-nearest-neighbor selection step that allows to obtain DeepLNE teacher models with high locality and numerical stability. The utility of the hyperparameter α𝛼\alphaitalic_α was shown for alanine tripeptide where the relevant dihedral angle input descriptors could be selected to describe the desired transition path. Although not implemented in this study, the selection of α𝛼\alphaitalic_α for each feature can be automated by adding an additional L1-norm or Softmax term for α𝛼\alphaitalic_α to the objective function. This would allow for automatic assignment of feature importance weights. In all 3 toy models, it was necessary to assign a significantly large value to the hyperparameter β𝛽\betaitalic_β in order to supervise DeepLNE with respect to the directionality and sequentiality of the variables s𝑠sitalic_s and z𝑧zitalic_z. The hyperparameter γ𝛾\gammaitalic_γ that is penalizing the magnitude of the fitting parameters in teacher as well as student models is chosen relatively low in all 3 toy models. This resulted in models of higher functional flexibility. However, because we were choosing the models that minimize the cross-validation error in all cases the models exhibited sufficiently smooth extrapolation behavior and numerical stability when applied to the task of CV-based free energy estimation using enhanced sampling MD.

We notice that our proposed knowledge distillation framework is not limited to the representation of DeepLNE CVs as computationally fast ANNs, but can be used to approximate any computationally expensive CV.

We show that our method can be straightforwardly applied to multi-state systems of biologically relevant complexity at a fraction of the computational cost of the original DeepLNE CV Fröhlking et al. (2024). A DeepLNE student model can be successfully trained as long as the training trajectories visit the entire descriptor space encompassed by the reaction channel defined by the variables s𝑠sitalic_s and z𝑧zitalic_z. In the systems studied here, we impose a sequentiality of the metastable states into the DeepLNE s𝑠sitalic_s variable in order to show that path-like CVs for multi-state systems benefit from a supervised multi-tasking objective function during the training step. In summary, we believe that the advances presented in this study significantly broaden the applicability of DeepLNE++, especially for large and complex biological systems. The introduction of automatic, efficient, and fast path-like CVs in DeepLNE++ is crucial for accelerating MD sampling of their dynamics and reactions.

Supplementary Material

Sections for each toy model showing (1) Müller-Brown-Potential with additional details of the DeepLNE training steps and resulting 2D FES of the biased MD simulations, (2) Alanine Tripeptide details of the DeepLNE training steps and resulting 2D FES of the biased MD simulations, (3) Pre-miR21 training datapoints for the DeepLNE models and structural analysis of the identified metastable states from the enhanced sampling MD simulations.

Acknowledgments

We acknowledge Luigi Bonati for useful discussion and providing many suggestions. F.L.G., V.R., S.A. and T.F. acknowledge the Swiss National Supercomputing Centre (CSCS) for large supercomputer time allocations projectID:s1228. They also acknowledge the Swiss National Science Foundation and Bridge for financial support (projects numbers: 200021_204795200021_204795200021\_204795200021 _ 204795, CRSII5_216587𝐶𝑅𝑆𝐼𝐼5_216587CRSII5\_216587italic_C italic_R italic_S italic_I italic_I 5 _ 216587 and 40B20_20362840𝐵20_20362840B2-0\_20362840 italic_B 2 - 0 _ 203628).

Data availability

At https://github.com/ThorbenF/DeepLNE2 we provided tutorials together with scripts that allow to reproduce the figures and the results of this study.

References

  • Dror et al. (2012) R. O. Dror, R. M. Dirks, J. Grossman, H. Xu,  and D. E. Shaw, “Biomolecular simulation: A computational microscope for molecular biology,” Annual Review of Biophysics 41, 429–452 (2012).
  • Laio and Gervasio (2008) A. Laio and F. L. Gervasio, “Metadynamics: a method to simulate rare events and reconstruct the free energy in biophysics, chemistry and material science,” Reports on Progress in Physics 71, 126601 (2008).
  • Valsson, Tiwary, and Parrinello (2016) O. Valsson, P. Tiwary,  and M. Parrinello, “Enhancing Important Fluctuations: Rare Events and Metadynamics from a Conceptual Viewpoint,” Annual Review of Physical Chemistry 67, 159–184 (2016).
  • Camilloni and Pietrucci (2018) C. Camilloni and F. Pietrucci, “Advanced simulation techniques for the thermodynamic and kinetic characterization of biological systems,” Advances in Physics: X 3, 1477531 (2018).
  • Laio and Parrinello (2002) A. Laio and M. Parrinello, “Escaping free-energy minima,” Proceedings of the National Academy of Sciences 99, 12562–12566 (2002).
  • Bolhuis et al. (2002) P. G. Bolhuis, D. Chandler, C. Dellago,  and P. L. Geissler, “Transition path sampling: Throwing ropes over rough mountain passes, in the dark,” Annual Review of Physical Chemistry 53, 291–318 (2002).
  • Pietrucci (2017) F. Pietrucci, “Strategies for the exploration of free energy landscapes: Unity in diversity and challenges ahead,” Reviews in Physics 2, 32–45 (2017).
  • Branduardi, Gervasio, and Parrinello (2007) D. Branduardi, F. L. Gervasio,  and M. Parrinello, “From A to B in free energy space,” The Journal of Chemical Physics 126, 054103 (2007).
  • Hovan, Comitani, and Gervasio (2019) L. Hovan, F. Comitani,  and F. L. Gervasio, “Defining an optimal metric for the path collective variables,” Journal of Chemical Theory and Computation 15, 25–32 (2019).
  • Bonati, Rizzi, and Parrinello (2020) L. Bonati, V. Rizzi,  and M. Parrinello, “Data-driven collective variables for enhanced sampling,” The Journal of Physical Chemistry Letters 11, 2998–3004 (2020).
  • Bonati et al. (2023) L. Bonati, E. Trizio, A. Rizzi,  and M. Parrinello, “A unified framework for machine learning collective variables for enhanced sampling simulations: mlcolvar,” The Journal of Chemical Physics 159, 014801 (2023).
  • Šípka, Erlebach, and Grajciar (2023) M. Šípka, A. Erlebach,  and L. Grajciar, “Constructing collective variables using invariant learned representations,” Journal of Chemical Theory and Computation 19, 887–901 (2023).
  • France-Lanord et al. (2024) A. France-Lanord, H. Vroylandt, M. Salanne, B. Rotenberg, A. M. Saitta,  and F. Pietrucci, “Data-driven path collective variables,” Journal of Chemical Theory and Computation 20, 3069–3084 (2024).
  • Fröhlking et al. (2024) T. Fröhlking, L. Bonati, V. Rizzi,  and F. L. Gervasio, “Deep learning path-like collective variable for enhanced sampling molecular dynamics,” The Journal of Chemical Physics 160, 174109 (2024).
  • Roweis and Saul (2000) S. T. Roweis and L. K. Saul, “Nonlinear dimensionality reduction by locally linear embedding,” Science 290, 2323–2326 (2000).
  • Paszke et al. (2017) A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang, Z. DeVito, Z. Lin, A. Desmaison, L. Antiga,  and A. Lerer, “Automatic differentiation in pytorc,” NIPS 2017 Workshop on Autodiff  (2017).
  • Invernizzi and Parrinello (2020) M. Invernizzi and M. Parrinello, “Rethinking metadynamics: From bias potentials to probability distributions,” The Journal of Physical Chemistry Letters 11, 2731–2736 (2020).
  • Invernizzi and Parrinello (2022) M. Invernizzi and M. Parrinello, “Exploration vs convergence speed in adaptive-bias enhanced sampling,” Journal of Chemical Theory and Computation 18, 3988–3996 (2022).
  • Rizzi et al. (2023) V. Rizzi, S. Aureli, N. Ansari,  and F. L. Gervasio, “Oneopes, a combined enhanced sampling method to rule them all,” Journal of Chemical Theory and Computation 19, 5731–5742 (2023), pMID: 37603295.
  • Camilloni, Broglia, and Tiana (2011) C. Camilloni, R. A. Broglia,  and G. Tiana, “Hierarchy of folding and unfolding events of protein G, CI2, and ACBP from explicit-solvent simulations,” The Journal of Chemical Physics 134, 045105 (2011).
  • Goscinski et al. (2023) A. Goscinski, V. Principe, G. Fraux, S. Kliavinek, B. Helfrecht, P. Loche, M. Ceriotti,  and R. Cersonsky, “scikit-matter : A suite of generalisable machine learning methods born out of chemistry and materials science [version 2; peer review: 2 approved, 1 approved with reservations],” Open Research Europe 3 (2023).
  • Tribello et al. (2014) G. A. Tribello, M. Bonomi, D. Branduardi, C. Camilloni,  and G. Bussi, “Plumed 2: New feathers for an old bird,” Computer Physics Communications 185, 604–613 (2014).
  • Aureli et al. (2024) S. Aureli, F. Bellina, V. Rizzi,  and F. L. Gervasio, “Investigating ligand-mediated conformational dynamics of pre-mir21: A machine-learning-aided enhanced sampling study,” bioRxiv  (2024).
  • Valsson and Parrinello (2014) O. Valsson and M. Parrinello, “Variational approach to enhanced sampling and free energy calculations,” Physical Review Letters 113, 1–5 (2014).
  • Piana et al. (2020) S. Piana, P. Robustelli, D. Tan, S. Chen,  and D. E. Shaw, “Development of a force field for the simulation of single-chain proteins and protein–protein complexes,” Journal of Chemical Theory and Computation 16, 2494–2507 (2020).
  • Abraham et al. (2015) M. J. Abraham, T. Murtola, R. Schulz, S. Páll, J. C. Smith, B. Hess,  and E. Lindahl, “GROMACS: High performance molecular simulations through multi-level parallelism from laptops to supercomputers,” SoftwareX 1, 19–25 (2015).
  • Bussi (2014) G. Bussi, “Hamiltonian replica exchange in gromacs: a flexible implementation,” Molecular Physics 112, 379–384 (2014).

Supplementary Material - DeepLNE++ leveraging knowledge distillation for accelerated multi-state path-like collective variables

Thorben Fröhlking1,2,3, Valerio Rizzi1,2,3, Simone Aureli1,2,3, Francesco Luigi Gervasio1,2,3,4

1School of Pharmaceutical Sciences, University of Geneva, Rue Michel Servet 1, 1206, Genève, Switzerland

2Institute of Pharmaceutical Sciences of Western Switzerland (ISPSO), University of Geneva, 1206, Genève, Switzerland

3Swiss Institute of Bioinformatics, University of Geneva, 1206, Genève, Switzerland

4Department of Chemistry, University College London, London, WC1E 6BT, United Kingdom

francesco.gervasio@unige.ch

SI 1. Particle in 3-state Müller-Brown-Potential

Refer to caption
Figure 1: Results of training DeepLNE for a particle simulation in the 3-state Müller-Brown-potential. (a) Training datapoints for DeepLNE, obtained by concatenating 2 ABMD (biasing y-coordinate with a spring constant of κ=30,80𝜅3080\kappa=30,80italic_κ = 30 , 80 kBTsubscript𝑘𝐵𝑇k_{B}Titalic_k start_POSTSUBSCRIPT italic_B end_POSTSUBSCRIPT italic_T respectively. The datapoints are colored based on the label for the 3 states (0, 0.5, 1). (b) Training datapoints superimposed with the 2D coordinate space is colored based on the trained DeepLNE z𝑧zitalic_z variable with a λ=10𝜆10\lambda=10italic_λ = 10. Also we superimpose the 𝑿^^𝑿\hat{\bm{X}}over^ start_ARG bold_italic_X end_ARG of the DeepLNE model. (c) Sampled configuration during the biased simulation using OPES-EXPLORE applied on the trained PyTorch DeepLNE CVs and a harmonic constraint applied on z𝑧zitalic_z. The color of the datapoints correspond to the DeepLNE z𝑧zitalic_z variable showing path-like behavior. These datapoints are subsequently used to train the DeepLNE student models. (d) Two-dimensional free energy landscape with respect to the DeepLNE s𝑠sitalic_s and z𝑧zitalic_z variable obtained from biased MD using OPES on the student model s𝑠sitalic_s and harmonic restraint on student z𝑧zitalic_z. This is especially useful when one is interested in identifying the transition states in more detail.

SI 2. Alanine Tripeptide

Refer to caption
Figure 2: Results of training DeepLNE for a Alanine Tripeptide simulation. (a) Training datapoints for DeepLNE, obtained by concatenating ABMD (biasing ϕ1,ϕ2,ϕ3italic-ϕ1italic-ϕ2italic-ϕ3\phi 1,\phi 2,\phi 3italic_ϕ 1 , italic_ϕ 2 , italic_ϕ 3 with a spring constant of κ=100𝜅100\kappa=100italic_κ = 100 kJ/mol respectively. The datapoints are colored based on the label for the 3 states (0, 0.5, 1). (b) Training datapoints for DeepLNE superimposed with the 2D coordinate space colored based on the trained DeepLNE z𝑧zitalic_z variable with a λ=10𝜆10\lambda=10italic_λ = 10 (we set all remaining ϕitalic-ϕ\phiitalic_ϕ and ψ𝜓\psiitalic_ψ input variables to 00). Also we superimpose the 𝑿^^𝑿\hat{\bm{X}}over^ start_ARG bold_italic_X end_ARG of the DeepLNE model. (c) Sampled configuration during the biased simulation using OPES-EXPLORE applied on the trained PyTorch DeepLNE CVs and a harmonic constraint applied on z𝑧zitalic_z. The color of the datapoints correspond to the DeepLNE z𝑧zitalic_z variable showing path-like behavior. These datapoints are subsequently used to train the DeepLNE student models. (d) Two-dimensional free energy landscape with respect to the DeepLNE s𝑠sitalic_s and z𝑧zitalic_z variable obtained from biased MD using OPES on the student model s and harmonic restraint on student z𝑧zitalic_z. One can identify sharply the transition regions between the 3 metastable states near s=0.2,z=0.04formulae-sequence𝑠0.2𝑧0.04s=0.2,z=0.04italic_s = 0.2 , italic_z = 0.04 and s=0.7,z=0.04formulae-sequence𝑠0.7𝑧0.04s=0.7,z=0.04italic_s = 0.7 , italic_z = 0.04.

SI 3. Pre-miR-21

Refer to caption
Figure 3: Results of training DeepLNE for a RNA pre-miR21 simulation. (a) Training datapoints for DeepLNE, obtained by concatenating 5 ABMD (biasing the 3 CMAP CVs with a spring constant of κ=100𝜅100\kappa=100italic_κ = 100 kJ/mol respectively. The datapoints are colored based on the label for the 3 states (0, 0.5, 1). (b) Sampled configuration during the biased simulation using OPES-EXPLORE applied on the trained PyTorch DeepLNE CV s𝑠sitalic_s. The color of the datapoints correspond to the DeepLNE s𝑠sitalic_s variable showing path-like behavior. These datapoints are subsequently used to train the DeepLNE student models. (c) Sampled configuration during the biased simulation using OPES-EXPLORE applied on the trained PyTorch DeepLNE CVs and a harmonic constraint applied on z𝑧zitalic_z. The color of the datapoints correspond to the DeepLNE z𝑧zitalic_z variable with a λ=1𝜆1\lambda=1italic_λ = 1 showing path-like behavior. These datapoints are subsequently used to train the DeepLNE student models.
Refer to caption
Figure 4: Results of structural analysis of the FES for the DeepLNE biased RNA pre-miR21 simulation. The structure of pre-miR21 is composed of 31 nucleotides. The RNA from G22 to C54 is coloured with respect to the type of nucleotide: Guanine-purple, Adenine-light crimson, Cytosine-orange, Uracil-pink, G5-white, C3-mauve. We show the centroids of the 3 minima in the FES shown in the main text. The nucleobase A29 is highlighted via "VDW’ and the surrounding nucleobases are represented via ’licorice’ style. (a) State A at s=0,z=2.5formulae-sequence𝑠0𝑧2.5s=0,z=-2.5italic_s = 0 , italic_z = - 2.5, corresponding to the A29 being ’stacked-in’. The nucleobase is integrated within the helical structure forming hydrogen bonds with the surrounding nucleobases. (b) State B at s=0.4,z=0formulae-sequence𝑠0.4𝑧0s=0.4,z=0italic_s = 0.4 , italic_z = 0, corresponding to the A29 being ’partially rotated outward’. (c) State C at s=1.0,z=2formulae-sequence𝑠1.0𝑧2s=1.0,z=2italic_s = 1.0 , italic_z = 2, corresponding to the A29 being ’bulged-out’.