-
Anatomically aware simulation of patient-specific glioblastoma xenografts
Authors:
Adam A. Malik,
Cecilia Krona,
Soumi Kundu,
Philip Gerlee,
Sven Nelander
Abstract:
Patient-derived cell (PDC) mouse xenografts are increasingly important tools in glioblastoma (GBM) research, essential to investigate case-specific growth patterns and treatment responses. Despite the central role of xenograft models in the field, few good simulation models are available to probe the dynamics of tumor growth and to support therapy design. We therefore propose a new framework for the patient-specific simulation of GBM in the mouse brain. Unlike existing methods, our simulations leverage a high-resolution map of the mouse brain anatomy to yield patient-specific results that are in good agreement with experimental observations. To facilitate the fitting of our model to histological data, we use Approximate Bayesian Computation. Because our model uses few parameters, reflecting growth, invasion and niche dependencies, it is well suited for case comparisons and for probing treatment effects. We demonstrate how our model can be used to simulate different treatments by perturbing the model parameters. We expect that in silico replicates of mouse xenograft tumors can improve the assessment of therapeutic outcomes and boost the statistical power of preclinical GBM studies.
Submitted 14 March, 2024;
originally announced March 2024.
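The abstract above names Approximate Bayesian Computation (ABC) as the fitting method. As a rough illustration of the general idea only, here is a minimal rejection-ABC sketch. The toy growth simulator, the uniform prior, the tolerance and every function name are assumptions made for this example; none of them come from the paper, whose model and summary statistics are not described in the listing.

```python
import random

def simulate_growth(rate, steps=10, start=1.0):
    """Toy stand-in for a tumor simulator (assumption): compound growth per step."""
    size = start
    for _ in range(steps):
        size *= 1.0 + rate
    return size

def abc_rejection(observed, prior_low, prior_high, tolerance, n_draws, seed=0):
    """Keep prior draws whose simulated summary lies within `tolerance` of the observation."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_draws):
        rate = rng.uniform(prior_low, prior_high)  # draw a candidate from the prior
        if abs(simulate_growth(rate) - observed) <= tolerance:
            accepted.append(rate)  # candidate reproduces the data closely enough
    return accepted

# Pretend the "observed" tumor size was produced by a true rate of 0.2.
observed_size = simulate_growth(0.2)
posterior = abc_rejection(observed_size, 0.0, 0.5, tolerance=0.5, n_draws=5000)
posterior_mean = sum(posterior) / len(posterior)
print(len(posterior), round(posterior_mean, 3))
```

The accepted draws approximate the posterior over the growth rate. Tightening `tolerance` sharpens the approximation at the cost of fewer accepted samples.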
-
LinearSankoff: Linear-time Simultaneous Folding and Alignment of RNA Homologs
Authors:
Sizhen Li,
Ning Dai,
He Zhang,
Apoorv Malik,
David H. Mathews,
Liang Huang
Abstract:
The classical Sankoff algorithm for the simultaneous folding and alignment of homologous RNA sequences is highly influential, but it suffers from two major limitations, in efficiency and in modeling power. First, it takes $O(n^6)$ time for two sequences, where $n$ is the average sequence length. Most implementations and variations reduce the runtime to $O(n^3)$ by restricting the alignment search space, but this is still too slow for long sequences such as full-length viral genomes. Second, the Sankoff algorithm and all its existing implementations use a rather simplistic alignment model, which can result in poor alignment accuracy. To address these problems, we propose LinearSankoff, which seamlessly integrates the original Sankoff algorithm with a powerful Hidden Markov Model-based alignment module. This extension substantially improves alignment quality, which in turn benefits secondary structure prediction quality, as confirmed over a diverse set of RNA families. LinearSankoff also applies beam search heuristics and an A$^\star$-like algorithm so that its runtime scales linearly with sequence length. LinearSankoff is the first linear-time algorithm for simultaneous folding and alignment, and the first such algorithm to scale to coronavirus genomes ($n \approx$ 30,000 nt). It takes only 10 minutes for a pair of SARS-CoV-2 and SARS-related genomes, and outperforms previous work at identifying crucial conserved structures between the two genomes.
Submitted 18 July, 2023;
originally announced July 2023.
-
LinearAlifold: Linear-Time Consensus Structure Prediction for RNA Alignments
Authors:
Apoorv Malik,
Liang Zhang,
Milan Gautam,
Ning Dai,
Sizhen Li,
He Zhang,
David H. Mathews,
Liang Huang
Abstract:
Predicting the consensus structure of a set of aligned RNA homologs is a convenient method to find conserved structures in an RNA genome, which has many applications including viral diagnostics and therapeutics. However, the most commonly used tool for this task, RNAalifold, is prohibitively slow for long sequences, due to a cubic scaling with the sequence length, taking over a day on 400 SARS-CoV-2 and SARS-related genomes (~30,000nt). We present LinearAlifold, a much faster alternative that scales linearly with both the sequence length and the number of sequences, based on our work LinearFold that folds a single RNA in linear time. Our work is orders of magnitude faster than RNAalifold (0.7 hours on the above 400 genomes, or ~36$\times$ speedup) and achieves higher accuracies when compared to a database of known structures. More interestingly, LinearAlifold's prediction on SARS-CoV-2 correlates well with experimentally determined structures, substantially outperforming RNAalifold. Finally, LinearAlifold supports two energy models (Vienna and BL*) and four modes: minimum free energy (MFE), maximum expected accuracy (MEA), ThreshKnot, and stochastic sampling, each of which takes under an hour for hundreds of SARS-CoV variants. Our resource is at: https://github.com/LinearFold/LinearAlifold (code) and http://linearfold.org/linear-alifold (server).
Submitted 5 July, 2024; v1 submitted 29 June, 2022;
originally announced June 2022.
-
rfPhen2Gen: A machine learning based association study of brain imaging phenotypes to genotypes
Authors:
Muhammad Ammar Malik,
Alexander S. Lundervold,
Tom Michoel
Abstract:
Imaging genetic studies aim to find associations between genetic variants and imaging quantitative traits (QTs). Traditional genome-wide association studies (GWAS) are based on univariate statistical tests, but when multiple traits are analyzed together they suffer from a multiple-testing problem and from not taking into account correlations among the traits. An alternative approach to multi-trait GWAS is to reverse the functional relation between genotypes and traits, by fitting a multivariate regression model to predict genotypes from multiple traits simultaneously. However, current reverse genotype prediction approaches are mostly based on linear models. Here, we evaluated random forest regression (RFR) as a method to predict SNPs from imaging QTs and identify biologically relevant associations. We trained machine learning models to predict 518,484 SNPs using 56 brain imaging QTs. We observed that genotype regression error is a better indicator of permutation p-value significance than genotype classification accuracy. SNPs within the known Alzheimer's disease (AD) risk gene APOE had the lowest RMSE for lasso and random forest, but not ridge regression. Moreover, random forests identified additional SNPs that were not prioritized by the linear models but are known to be associated with brain-related disorders. Feature selection identified well-known brain regions associated with AD, like the hippocampus and amygdala, as important predictors of the most significant SNPs. In summary, our results indicate that non-linear methods like random forests may offer additional insights into phenotype-genotype associations compared to traditional linear multi-variate GWAS methods.
Submitted 31 March, 2022;
originally announced April 2022.
-
High-dimensional multi-trait GWAS by reverse prediction of genotypes
Authors:
Muhammad Ammar Malik,
Adriaan-Alexander Ludl,
Tom Michoel
Abstract:
Multi-trait genome-wide association studies (GWAS) use multi-variate statistical methods to identify associations between genetic variants and multiple correlated traits simultaneously, and have higher statistical power than independent univariate analyses of traits. Reverse regression, where genotypes of genetic variants are regressed on multiple traits simultaneously, has emerged as a promising approach to perform multi-trait GWAS in high-dimensional settings where the number of traits exceeds the number of samples. We analyzed different machine learning methods (ridge regression, naive Bayes/independent univariate, random forests and support vector machines) for reverse regression in multi-trait GWAS, using genotypes, gene expression data and ground-truth transcriptional regulatory networks from the DREAM5 SysGen Challenge and from a cross between two yeast strains to evaluate methods. We found that genotype prediction performance, in terms of root mean squared error (RMSE), made it possible to distinguish between genomic regions with high and low transcriptional activity. Moreover, model feature coefficients correlated with the strength of association between variants and individual traits, and were predictive of true trans-eQTL target genes, with complementary findings across methods. Code to reproduce the analysis is available at https://github.com/michoel-lab/Reverse-Pred-GWAS
Submitted 9 February, 2022; v1 submitted 29 October, 2021;
originally announced November 2021.
-
Impulse data models for the inverse problem of electrocardiography
Authors:
Tommy Peng,
Avinash Malik,
Laura R. Bear,
Mark L. Trew
Abstract:
The proposed method re-frames traditional inverse problems of electrocardiography into regression problems, constraining the solution space by decomposing signals with multidimensional Gaussian impulse basis functions. Impulse heart surface potentials (HSPs) were generated with single Gaussian basis functions at discrete heart surface locations and projected to corresponding body surface potentials (BSPs) using a volume conductor torso model. Both BSPs (inputs) and HSPs (outputs) were mapped to regular 2D surface meshes and used to train a neural network. Predictive capabilities of the network were tested with unseen synthetic and experimental data. A dense, fully connected, single-hidden-layer neural network was trained to map body surface impulses to heart surface Gaussian basis functions for reconstructing HSPs. Synthetic pulses moving across the heart surface were predicted from the neural network with a root mean squared error of $9.1\pm1.4$%. Predicted signals were robust to noise up to 20 dB, and errors due to displacement and rotation of the heart within the torso were bounded and predictable. A shift of the heart 40 mm toward the spine resulted in a 4% increase in signal feature localization error. The set of training impulse function data could be reduced while prediction error remained bounded. Recorded HSPs from in-vitro pig hearts were reliably decomposed using space-time Gaussian basis functions. Predicted HSPs for left-ventricular pacing had a mean absolute error of $10.4\pm11.4$ ms. Other pacing scenarios were analyzed with similar success. Conclusion: Impulses from Gaussian basis functions are potentially an effective and robust way to train simple neural network data models for reconstructing HSPs from decomposed BSPs. The HSPs predicted by the neural network can be used to generate activation maps that non-invasively identify features of cardiac electrical dysfunction and can guide subsequent treatment options.
Submitted 19 August, 2021; v1 submitted 31 January, 2021;
originally announced February 2021.
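The abstract above describes training data built from single Gaussian basis functions placed at discrete locations on a regular 2D mesh. The sketch below shows one way such an impulse could be laid out on a grid. It is purely illustrative: the grid size, the isotropic width `sigma`, the unit peak and the function name are assumptions for this example, and the paper's basis functions also extend over time, which is omitted here.

```python
import math

def gaussian_impulse(width, height, center_x, center_y, sigma):
    """Return a height x width grid holding one isotropic Gaussian bump with peak value 1."""
    grid = []
    for y in range(height):
        row = []
        for x in range(width):
            # Squared distance from this mesh node to the impulse center.
            dist_sq = (x - center_x) ** 2 + (y - center_y) ** 2
            row.append(math.exp(-dist_sq / (2.0 * sigma ** 2)))
        grid.append(row)
    return grid

# One impulse centered on node (4, 3) of a 9 x 7 mesh (hypothetical sizes).
impulse = gaussian_impulse(width=9, height=7, center_x=4, center_y=3, sigma=1.5)
```

Moving the center across the grid yields one training impulse per location, which matches the general scheme the abstract outlines.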
-
Restricted maximum-likelihood method for learning latent variance components in gene expression data with known and unknown confounders
Authors:
Muhammad Ammar Malik,
Tom Michoel
Abstract:
Random effect models are popular statistical models for detecting and correcting spurious sample correlations due to hidden confounders in genome-wide gene expression data. In applications where some confounding factors are known, estimating simultaneously the contribution of known and latent variance components in random effect models is a challenge that has so far relied on numerical gradient-based optimizers to maximize the likelihood function. This is unsatisfactory because the resulting solution is poorly characterized and the efficiency of the method may be suboptimal. Here we prove analytically that maximum-likelihood latent variables can always be chosen orthogonal to the known confounding factors, in other words, that maximum-likelihood latent variables explain sample covariances not already explained by known factors. Based on this result we propose a restricted maximum-likelihood method which estimates the latent variables by maximizing the likelihood on the restricted subspace orthogonal to the known confounding factors, and show that this reduces to probabilistic PCA on that subspace. The method then estimates the variance-covariance parameters by maximizing the remaining terms in the likelihood function given the latent variables, using a newly derived analytic solution for this problem. Compared to gradient-based optimizers, our method attains greater or equal likelihood values, can be computed using standard matrix operations, results in latent factors that don't overlap with any known factors, and has a runtime reduced by several orders of magnitude. Hence the restricted maximum-likelihood method facilitates the application of random effect modelling strategies for learning latent variance components to much larger gene expression datasets than possible with current methods.
Submitted 4 November, 2021; v1 submitted 6 May, 2020;
originally announced May 2020.
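The abstract above states that the latent variables can be chosen orthogonal to the known confounding factors. The sketch below illustrates only the underlying projection step, for a single known factor and a single expression profile. It is not the authors' estimator: the data values and function names are made up for this example, and the subsequent probabilistic PCA and variance-parameter estimation are not shown.

```python
def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))

def project_out(values, confounder):
    """Remove from `values` the component lying along `confounder`."""
    scale = dot(values, confounder) / dot(confounder, confounder)
    return [v - scale * c for v, c in zip(values, confounder)]

# Hypothetical data: one known confounder and one gene's expression across 4 samples.
known_factor = [1.0, 2.0, 3.0, 4.0]
expression = [2.0, 1.0, 4.0, 3.0]
residual = project_out(expression, known_factor)
```

After the projection, `residual` has no component along `known_factor`, so any latent structure estimated from it cannot overlap with that factor.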
-
A Review on understanding Brain, and Memory Retention and Recall Processes using EEG and fMRI techniques
Authors:
Qazi Emad-Ul-Haq,
Muhammad Hussain,
Hatim Aboalsamh,
Saeed Bamatraf,
Aamir Saeed Malik,
Hafeez Ullah Amin
Abstract:
Human memory, the learning of new information, involves changes at the synaptic level between neurons dedicated to the storage of information. Generally, memory is classified as Long-Term Memory and Short-Term Memory. The various types of memory and their disorders are widely studied using neuroimaging techniques like Electroencephalography (EEG) and functional Magnetic Resonance Imaging (fMRI). The brain is equipped with the capabilities of learning, retention and recall. Studies of the brain regions involved in memory recall tasks (the pre-frontal cortex, the associated hippocampal cortices and their interactions with other lobes) focus on understanding the memory retention and recall processes. However, due to the highly complicated and dynamic mechanisms of the brain, the specific regions where information may reside are not completely explored. In this research paper, recent EEG and fMRI studies of memory are reviewed to understand the memory retention and recall processes as well as the various brain regions associated with these processes. A number of stimuli reported in previous studies are evaluated and discussed. Furthermore, the challenges faced by researchers in EEG and fMRI methodologies are also presented. Recommendations for future research related to memory retention and recall are discussed at the end.
Submitted 30 April, 2019;
originally announced May 2019.
-
An intracardiac electrogram model to bridge virtual hearts and implantable cardiac devices
Authors:
Weiwei Ai,
Nitish Patel,
Partha Roop,
Avinash Malik,
Nathan Allen,
Mark L. Trew
Abstract:
Virtual heart models have been proposed to enhance the safety of implantable cardiac devices through closed loop validation. To communicate with a virtual heart, devices have been driven by cardiac signals at specific sites. As a result, only the action potentials of these sites are sensed. However, the real device implanted in the heart will sense a complex combination of near and far-field extracellular potential signals. Therefore many device functions, such as blanking periods and refractory periods, are designed to handle these unexpected signals. To represent these signals, we develop an intracardiac electrogram (IEGM) model as an interface between the virtual heart and the device. The model can capture not only the local excitation but also far-field signals and pacing afterpotentials. Moreover, the sensing controller can specify unipolar or bipolar electrogram (EGM) sensing configurations and introduce various oversensing and undersensing modes. The simulation results show that the model is able to reproduce clinically observed sensing problems, which significantly extends the capabilities of the virtual heart model in the context of device validation.
Submitted 3 March, 2017;
originally announced March 2017.
-
Towards the Emulation of the Cardiac Conduction System for Pacemaker Testing
Authors:
Eugene Yip,
Sidharta Andalam,
Partha S. Roop,
Avinash Malik,
Mark Trew,
Weiwei Ai,
Nitish Patel
Abstract:
The heart is a vital organ that relies on the orchestrated propagation of electrical stimuli to coordinate each heartbeat. Abnormalities in the heart's electrical behaviour can be managed with a cardiac pacemaker. Recently, the closed-loop testing of pacemakers with an emulation (real-time simulation) of the heart has been proposed. An emulated heart would provide realistic reactions to the pacemaker as if it were a real heart. This enables developers to interrogate their pacemaker design without having to engage in costly or lengthy clinical trials. Many high-fidelity heart models have been developed, but they are too computationally intensive to be simulated in real time. Heart models designed specifically for the closed-loop testing of pacemakers are too abstract to be useful in the testing of physical pacemakers.
In the context of pacemaker testing, this paper presents a more computationally efficient heart model that generates realistic continuous-time electrical signals. The heart model is composed of cardiac cells that are connected by paths. Significant improvements were made to an existing cardiac cell model to stabilise its activation behaviour and to an existing path model to capture the behaviour of continuous electrical propagation. We provide simulation results that show our ability to faithfully model complex re-entrant circuits (which cause arrhythmia) that existing heart models cannot.
Submitted 17 March, 2016; v1 submitted 16 March, 2016;
originally announced March 2016.