-
A Unified Intracellular pH Landscape with SITE-pHorin: a Quantum-Entanglement-Enhanced pH Probe
Authors:
Shu-Ang Li,
Xiao-Yan Meng,
Su Zhang,
Ying-Jie Zhang,
Run-Zhou Yang,
Dian-Dian Wang,
Yang Yang,
Pei-Pei Liu,
Jian-Sheng Kang
Abstract:
An accurate map of intracellular organelle pH is crucial for comprehending cellular metabolism and organellar functions. However, a unified intracellular pH spectrum measured with a single probe is still lacking. Here, we developed a novel quantum entanglement-enhanced pH-sensitive probe called SITE-pHorin, which features a wide pH-sensitive range and ratiometric quantitative measurement capabilities. Subsequently, we measured the pH of various organelles and their sub-compartments, including mitochondrial sub-spaces, Golgi stacks, endoplasmic reticulum, lysosomes, peroxisomes, and endosomes in COS-7 cells. To address the long-standing debate on the pH of mitochondrial compartments, we measured the pH of mitochondrial cristae as 6.60 ± 0.40, the pH of the mitochondrial intermembrane space as 6.95 ± 0.30, and two populations of mitochondrial matrix pH at approximately 7.20 ± 0.27 and 7.50 ± 0.16, respectively. Notably, the lysosome pH exhibited a single, narrow Gaussian distribution centered at 4.79 ± 0.17. Furthermore, quantum chemistry computations revealed that both the deprotonation of residue Y182 and the discrete curvature of the deformed benzene ring in the chromophore are necessary for the quantum entanglement mechanism of SITE-pHorin. Intriguingly, our findings reveal a pH gradient (0.6-0.9 pH units) between mitochondrial cristae and matrix, suggesting that previous estimates of ΔpH (0.4-0.6) and of the mitochondrial proton motive force (pmf) are underestimates.
Submitted 4 July, 2024;
originally announced July 2024.
-
Global Human-guided Counterfactual Explanations for Molecular Properties via Reinforcement Learning
Authors:
Danqing Wang,
Antonis Antoniades,
Kha-Dinh Luong,
Edwin Zhang,
Mert Kosan,
Jiachen Li,
Ambuj Singh,
William Yang Wang,
Lei Li
Abstract:
Counterfactual explanations of Graph Neural Networks (GNNs) offer a powerful way to understand data that can naturally be represented by a graph structure. Furthermore, in many domains, it is highly desirable to derive data-driven global explanations or rules that can better explain the high-level properties of the models and data in question. However, evaluating global counterfactual explanations is hard in real-world datasets due to a lack of human-annotated ground truth, which limits their use in areas like molecular sciences. Additionally, the increasing scale of these datasets provides a challenge for random search-based methods. In this paper, we develop a novel global explanation model RLHEX for molecular property prediction. It aligns the counterfactual explanations with human-defined principles, making the explanations more interpretable and easy for experts to evaluate. RLHEX includes a VAE-based graph generator to generate global explanations and an adapter to adjust the latent representation space to human-defined principles. Optimized by Proximal Policy Optimization (PPO), the global explanations produced by RLHEX cover 4.12% more input graphs and reduce the distance between the counterfactual explanation set and the input set by 0.47% on average across three molecular datasets. RLHEX provides a flexible framework to incorporate different human-designed principles into the counterfactual explanation generation process, aligning these explanations with domain expertise. The code and data are released at https://github.com/dqwang122/RLHEX.
Submitted 19 June, 2024;
originally announced June 2024.
-
Enhanced Gene Selection in Single-Cell Genomics: Pre-Filtering Synergy and Reinforced Optimization
Authors:
Weiliang Zhang,
Zhen Meng,
Dongjie Wang,
Min Wu,
Kunpeng Liu,
Yuanchun Zhou,
Meng Xiao
Abstract:
Recent advancements in single-cell genomics necessitate precision in gene panel selection to interpret complex biological data effectively. Gene panel selection methods aim to streamline the analysis of scRNA-seq data by focusing on the most informative genes that contribute significantly to the specific analysis task. Traditional selection methods, which often rely on expert domain knowledge, embedded machine learning models, or heuristic-based iterative optimization, are prone to biases and inefficiencies that may obscure critical genomic signals. Recognizing these limitations, we aim to overcome them with a refined strategy. In this study, we introduce an iterative gene panel selection strategy that is applicable to clustering tasks in single-cell genomics. Our method integrates results from other gene selection algorithms, which provide preliminary boundaries or prior knowledge as initial guides in the search space and thereby enhance the efficiency of our framework. Furthermore, we incorporate the stochastic nature of the exploration process in reinforcement learning (RL) and its capability for continuous optimization through reward-based feedback. This combination mitigates the biases inherent in the initial boundaries and harnesses RL's adaptability to refine and target gene panel selection dynamically. To illustrate the effectiveness of our method, we conducted detailed comparative experiments, case studies, and visualization analysis.
Submitted 11 June, 2024;
originally announced June 2024.
-
BMRetriever: Tuning Large Language Models as Better Biomedical Text Retrievers
Authors:
Ran Xu,
Wenqi Shi,
Yue Yu,
Yuchen Zhuang,
Yanqiao Zhu,
May D. Wang,
Joyce C. Ho,
Chao Zhang,
Carl Yang
Abstract:
Developing effective biomedical retrieval models is important for excelling at knowledge-intensive biomedical tasks but remains challenging due to the lack of sufficient publicly annotated biomedical data and computational resources. We present BMRetriever, a series of dense retrievers for enhancing biomedical retrieval via unsupervised pre-training on large biomedical corpora, followed by instruction fine-tuning on a combination of labeled datasets and synthetic pairs. Experiments on 5 biomedical tasks across 11 datasets verify BMRetriever's efficacy on various biomedical applications. BMRetriever also exhibits strong parameter efficiency, with the 410M variant outperforming baselines up to 11.7 times larger, and the 2B variant matching the performance of models with over 5B parameters. The training data and model checkpoints are released at https://huggingface.co/BMRetriever to ensure transparency, reproducibility, and application to new domains.
Submitted 29 April, 2024;
originally announced April 2024.
-
BrainKnow -- Extracting, Linking, and Synthesizing Neuroscience Knowledge
Authors:
Cunqing Huangfu,
Kang Sun,
Yi Zeng,
Yuwei Wang,
Dongsheng Wang,
Zizhe Ruan
Abstract:
The exponential growth of neuroscience literature presents a significant challenge for researchers seeking to efficiently access and utilize relevant information. To address this issue, we introduce the Brain Knowledge Engine (BrainKnow), an automated system designed to extract, link, and synthesize neuroscience knowledge from scientific publications. BrainKnow constructs a comprehensive knowledge graph encompassing 3,626,931 relationships across 37,011 neuroscience concepts, derived from 1,817,744 articles. This vast repository of knowledge is accessible through a user-friendly web interface, facilitating efficient navigation and data retrieval. BrainKnow employs advanced graph network algorithms, specifically Node2Vec, to enhance knowledge recommendation and visualization. This enables users to explore semantic relationships between concepts, predict potential new relationships, and gain a deeper understanding of the interconnectedness within neuroscience. Additionally, BrainKnow ensures real-time updates by synchronizing with PubMed, providing researchers with access to the most current information. BrainKnow serves as a valuable resource for neuroscience researchers, offering a powerful tool for exploring, synthesizing, and leveraging the vast and complex knowledge base of the field.
Submitted 6 June, 2024; v1 submitted 7 March, 2024;
originally announced March 2024.
-
RAM-EHR: Retrieval Augmentation Meets Clinical Predictions on Electronic Health Records
Authors:
Ran Xu,
Wenqi Shi,
Yue Yu,
Yuchen Zhuang,
Bowen Jin,
May D. Wang,
Joyce C. Ho,
Carl Yang
Abstract:
We present RAM-EHR, a Retrieval AugMentation pipeline to improve clinical predictions on Electronic Health Records (EHRs). RAM-EHR first collects multiple knowledge sources, converts them into text format, and uses dense retrieval to obtain information related to medical concepts. This strategy addresses the difficulties associated with complex names for the concepts. RAM-EHR then augments the local EHR predictive model co-trained with consistency regularization to capture complementary information from patient visits and summarized knowledge. Experiments on two EHR datasets show the efficacy of RAM-EHR over previous knowledge-enhanced baselines (3.4% gain in AUROC and 7.2% gain in AUPR), emphasizing the effectiveness of the summarized knowledge from RAM-EHR for clinical prediction tasks. The code will be published at https://github.com/ritaranx/RAM-EHR.
Submitted 4 June, 2024; v1 submitted 25 February, 2024;
originally announced March 2024.
-
Longitudinal prediction of DNA methylation to forecast epigenetic outcomes
Authors:
Arthur Leroy,
Ai Ling Teh,
Frank Dondelinger,
Mauricio A. Alvarez,
Dennis Wang
Abstract:
Interrogating the evolution of biological changes at early stages of life requires longitudinal profiling of molecules, such as DNA methylation, which can be challenging with children. We introduce a probabilistic and longitudinal machine learning framework based on multi-mean Gaussian processes (GPs), accounting for individual and gene correlations across time. This method provides future predictions of DNA methylation status at different individual ages while accounting for uncertainty. Our model is trained on a birth cohort of children with methylation profiled at ages 0-4, and we demonstrated that the status of methylation sites for each child can be accurately predicted at ages 5-7. We show that methylation profiles predicted by multi-mean GPs can be used to estimate other phenotypes, such as epigenetic age, and enable comparison to other health measures of interest. This approach encourages epigenetic studies to move towards longitudinal design for investigating epigenetic changes during development, ageing and disease progression.
Submitted 19 December, 2023;
originally announced December 2023.
-
Latent Space Inference For Spatial Transcriptomics
Authors:
J. Ding,
S. N. Zaman,
P. Y. Chen,
D. Wang
Abstract:
In order to understand the complexities of cellular biology, researchers are interested in two important metrics: the genetic expression information of cells and their spatial coordinates within a tissue sample. However, state-of-the-art methods, namely single-cell RNA sequencing and image-based spatial transcriptomics, can only recover a subset of this information: either full genetic expression with loss of spatial information, or spatial information with loss of resolution in sequencing data. In this project, we investigate a probabilistic machine learning method to obtain the full genetic expression information for tissue samples while also preserving their spatial coordinates. This is done by mapping both datasets to a joint latent space representation using variational machine learning methods. From here, the full genetic and spatial information can be decoded to give us greater insight into cellular processes and pathways.
Submitted 1 November, 2023;
originally announced November 2023.
-
Tracking dynamic flow: Decoding flow fluctuations through performance in a fine motor control task
Authors:
Bohao Tian,
Shijun Zhang,
Sirui Chen,
Yuru Zhang,
Kaiping Peng,
Hongxing Zhang,
Dangxiao Wang
Abstract:
Flow, an optimal mental state merging action and awareness, significantly impacts our emotion, performance, and well-being. However, capturing its swift fluctuations on a fine timescale is challenging due to the sparsity of the existing flow-detecting tools. Here we present a fine fingertip force control (F3C) task to induce flow, wherein the task challenge is set at a level compatible with personal skill, and to quantitatively track the flow state variations from synchronous motor control performance. We extract eight performance metrics from the fingertip force sequence and reveal their significant differences under distinct flow states. Further, we built a learning-based flow decoder that aims to predict the continuous flow intensity during the user experiment from the selected performance metrics, taking the self-reported flow as the label. Cross-validation shows that the predicted flow intensity reaches significant correlation with the self-reported flow intensity (r=0.81). Based on the decoding results, we observe rapid oscillations in flow fluctuations during the intervals between sparse self-reporting probes. This study showcases the feasibility of tracking intrinsic flow variations with high temporal resolution using task performance measures and may serve as a foundation for future work aiming to take advantage of flow's dynamics to enhance performance and positive emotions.
Submitted 28 December, 2023; v1 submitted 18 October, 2023;
originally announced October 2023.
-
SUGAR: Spherical Ultrafast Graph Attention Framework for Cortical Surface Registration
Authors:
Jianxun Ren,
Ning An,
Youjia Zhang,
Danyang Wang,
Zhenyu Sun,
Cong Lin,
Weigang Cui,
Weiwei Wang,
Ying Zhou,
Wei Zhang,
Qingyu Hu,
Ping Zhang,
Dan Hu,
Danhong Wang,
Hesheng Liu
Abstract:
Cortical surface registration plays a crucial role in aligning cortical functional and anatomical features across individuals. However, conventional registration algorithms are computationally inefficient. Recently, learning-based registration algorithms have emerged as a promising solution, significantly improving processing efficiency. Nonetheless, there remains a gap in the development of a learning-based method that exceeds the state-of-the-art conventional methods simultaneously in computational efficiency, registration accuracy, and distortion control, despite the theoretically greater representational capabilities of deep learning approaches. To address the challenge, we present SUGAR, a unified unsupervised deep-learning framework for both rigid and non-rigid registration. SUGAR incorporates a U-Net-based spherical graph attention network and leverages the Euler angle representation for deformation. In addition to the similarity loss, we introduce fold and multiple distortion losses, to preserve topology and minimize various types of distortions. Furthermore, we propose a data augmentation strategy specifically tailored for spherical surface registration, enhancing the registration performance. Through extensive evaluation involving over 10,000 scans from 7 diverse datasets, we showed that our framework exhibits comparable or superior registration performance in accuracy, distortion, and test-retest reliability compared to conventional and learning-based methods. Additionally, SUGAR achieves remarkable sub-second processing times, offering a notable speed-up of approximately 12,000 times in registering 9,000 subjects from the UK Biobank dataset in just 32 minutes. This combination of high registration performance and accelerated processing time may greatly benefit large-scale neuroimaging studies.
Submitted 2 July, 2023;
originally announced July 2023.
-
Working Memory Capacity of ChatGPT: An Empirical Study
Authors:
Dongyu Gong,
Xingchen Wan,
Dingmin Wang
Abstract:
Working memory is a critical aspect of both human intelligence and artificial intelligence, serving as a workspace for the temporary storage and manipulation of information. In this paper, we systematically assess the working memory capacity of ChatGPT, a large language model developed by OpenAI, by examining its performance in verbal and spatial n-back tasks under various conditions. Our experiments reveal that ChatGPT has a working memory capacity limit strikingly similar to that of humans. Furthermore, we investigate the impact of different instruction strategies on ChatGPT's performance and observe that the fundamental patterns of a capacity limit persist. From our empirical findings, we propose that n-back tasks may serve as tools for benchmarking the working memory capacity of large language models and hold potential for informing future efforts aimed at enhancing AI working memory.
Submitted 1 February, 2024; v1 submitted 30 April, 2023;
originally announced May 2023.
-
pyPESTO: A modular and scalable tool for parameter estimation for dynamic models
Authors:
Yannik Schälte,
Fabian Fröhlich,
Paul J. Jost,
Jakob Vanhoefer,
Dilan Pathirana,
Paul Stapor,
Polina Lakrisenko,
Dantong Wang,
Elba Raimúndez,
Simon Merkt,
Leonard Schmiester,
Philipp Städter,
Stephan Grein,
Erika Dudkin,
Domagoj Doresic,
Daniel Weindl,
Jan Hasenauer
Abstract:
Mechanistic models are important tools to describe and understand biological processes. However, they typically rely on unknown parameters, the estimation of which can be challenging for large and complex systems. We present pyPESTO, a modular framework for systematic parameter estimation, with scalable algorithms for optimization and uncertainty quantification. While tailored to ordinary differential equation problems, pyPESTO is broadly applicable to black-box parameter estimation problems. Besides its own implementations, it provides a unified interface to various popular simulation and inference methods. pyPESTO is implemented in Python and is open-source under a 3-Clause BSD license. Code and documentation are available on GitHub (https://github.com/icb-dcm/pypesto).
Submitted 2 May, 2023;
originally announced May 2023.
-
Diffusion Models in Bioinformatics: A New Wave of Deep Learning Revolution in Action
Authors:
Zhiye Guo,
Jian Liu,
Yanli Wang,
Mengrui Chen,
Duolin Wang,
Dong Xu,
Jianlin Cheng
Abstract:
Denoising diffusion models have emerged as one of the most powerful generative models in recent years. They have achieved remarkable success in many fields, such as computer vision, natural language processing (NLP), and bioinformatics. Although there are a few excellent reviews on diffusion models and their applications in computer vision and NLP, there is a lack of an overview of their applications in bioinformatics. This review aims to provide a rather thorough overview of the applications of diffusion models in bioinformatics to aid their further development in bioinformatics and computational biology. We start with an introduction of the key concepts and theoretical foundations of three cornerstone diffusion modeling frameworks (denoising diffusion probabilistic models, noise-conditioned scoring networks, and stochastic differential equations), followed by a comprehensive description of diffusion models employed in the different domains of bioinformatics, including cryo-EM data enhancement, single-cell data analysis, protein design and generation, drug and small molecule design, and protein-ligand interaction. The review is concluded with a summary of the potential new development and applications of diffusion models in bioinformatics.
Submitted 13 February, 2023;
originally announced February 2023.
-
Learning from pseudo-labels: deep networks improve consistency in longitudinal brain volume estimation
Authors:
Geng Zhan,
Dongang Wang,
Mariano Cabezas,
Lei Bai,
Kain Kyle,
Wanli Ouyang,
Michael Barnett,
Chenyu Wang
Abstract:
Brain atrophy is an important biomarker for monitoring neurodegeneration and disease progression in conditions such as multiple sclerosis (MS). An accurate and robust quantitative measurement of brain volume change is paramount for translational research and clinical applications. This paper presents a deep learning based method, DeepBVC, for longitudinal brain volume change measurement using 3D T1-weighted MRI scans. Trained with the intermediate outputs from SIENA, DeepBVC is designed to take into account the variance caused by different scanners and acquisition protocols. Compared with SIENA, DeepBVC demonstrates higher consistency in terms of volume change estimation across multiple time points in MS subjects; and greater stability and superior performance in scan-rescan experiments. Moreover, the results also show that DeepBVC is insensitive to acquisition variance in terms of imaging contrast, voxel resolution, random bias field and signal-to-noise ratio. Measurement robustness, automation and processing speed suggest a broad potential of DeepBVC in both research and clinical quantitative neuroimaging applications.
Submitted 8 February, 2023;
originally announced February 2023.
-
On Pre-trained Language Models for Antibody
Authors:
Danqing Wang,
Fei Ye,
Hao Zhou
Abstract:
Antibodies are vital proteins offering robust protection for the human body from pathogens. The development of both general protein and antibody-specific pre-trained language models facilitates antibody prediction tasks. However, there have been limited studies that comprehensively explore the representation capability of distinct pre-trained language models on different antibody tasks. To investigate the problem, we aim to answer several key questions in this paper, such as how pre-trained language models perform in antibody tasks with different specificity and how introducing specific biological mechanisms to the pre-training process can benefit the model. Additionally, we evaluate whether the learned antibody pre-trained representations can be applied to real-world antibody problems, like drug discovery and immune process understanding. Previously, the lack of an available benchmark largely hindered efforts to answer these questions. To aid in our investigation, we provide an AnTibody Understanding Evaluation (ATUE) benchmark. We comprehensively evaluate the performance of protein pre-trained language models through an empirical study, along with conclusions and new insights. Our ATUE and code are released at https://github.com/dqwang122/EATLM.
Submitted 1 March, 2023; v1 submitted 28 January, 2023;
originally announced January 2023.
-
Accelerating Antimicrobial Peptide Discovery with Latent Structure
Authors:
Danqing Wang,
Zeyu Wen,
Fei Ye,
Lei Li,
Hao Zhou
Abstract:
Antimicrobial peptides (AMPs) are promising therapeutic approaches against drug-resistant pathogens. Recently, deep generative models have been used to discover new AMPs. However, previous studies mainly focus on peptide sequence attributes and do not consider crucial structure information. In this paper, we propose a latent sequence-structure model for designing AMPs (LSSAMP). LSSAMP exploits multi-scale vector quantization in the latent space to represent secondary structures (e.g., alpha helix and beta sheet). By sampling in the latent space, LSSAMP can simultaneously generate peptides with ideal sequence attributes and secondary structures. Experimental results show that the peptides generated by LSSAMP have a high probability of antimicrobial activity. Our wet laboratory experiments verified that two of the 21 candidates exhibit strong antimicrobial activity. The code is released at https://github.com/dqwang122/LSSAMP.
Submitted 20 August, 2023; v1 submitted 28 November, 2022;
originally announced December 2022.
-
Bridging the gap between target-based and cell-based drug discovery with a graph generative multi-task model
Authors:
Fan Hu,
Dongqi Wang,
Huazhen Huang,
Yishen Hu,
Peng Yin
Abstract:
Drug discovery is vitally important for protecting humans against disease. Target-based screening has been one of the most popular methods for developing new drugs over the past several decades. This method efficiently screens candidate drugs that inhibit a target protein in vitro, but it often fails due to inadequate activity of the selected drugs in vivo. Accurate computational methods are needed to bridge this gap. Here, we propose a novel graph multi-task deep learning model to identify compounds carrying both target inhibitory and cell active (MATIC) properties. On a carefully curated SARS-CoV-2 dataset, the proposed MATIC model shows advantages over traditional methods in screening compounds that are effective in vivo. Next, we explored the model's interpretability and found that the features learned for the target inhibition (in vitro) and cell active (in vivo) tasks differ in their molecular property correlations and atom functional attentions. Based on these findings, we utilized a Monte Carlo-based reinforcement learning generative model to generate novel multi-property compounds with both in vitro and in vivo efficacy, thus bridging the gap between target-based and cell-based drug discovery.
Submitted 8 August, 2022;
originally announced August 2022.
-
Computer-Aided Extraction of Select MRI Markers of Cerebral Small Vessel Disease: A Systematic Review
Authors:
Jiyang Jiang,
Dadong Wang,
Yang Song,
Perminder S. Sachdev,
Wei Wen
Abstract:
Cerebral small vessel disease (CSVD) is a major vascular contributor to cognitive impairment in ageing, including dementias. Imaging remains the most promising method for in vivo studies of CSVD. To replace the subjective and laborious visual rating approaches, emerging studies have applied state-of-the-art artificial intelligence to extract imaging biomarkers of CSVD from MRI scans. We aimed to summarise published computer-aided methods to examine three imaging biomarkers of CSVD, namely cerebral microbleeds (CMB), dilated perivascular spaces (PVS), and lacunes of presumed vascular origin. Seventy-one classical image processing, classical machine learning, and deep learning studies were identified. CMB and PVS have been better studied, compared to lacunes. While good performance metrics have been achieved in local test datasets, there have not been generalisable pipelines validated in different research or clinical cohorts. Transfer learning and weak supervision techniques have been applied to accommodate the limitations in training data. Future studies could consider pooling data from multiple sources to increase diversity, and validating the performance of the methods using both image processing metrics and associations with clinical measures.
Submitted 4 April, 2022;
originally announced April 2022.
-
Modeling interaction of Glioma cells and CAR T-cells considering multiple CAR T-cells bindings
Authors:
Runpeng Li,
Prativa Sahoo,
Dongrui Wang,
Qixuan Wang,
Christine E. Brown,
Russell C. Rockne,
Heyrim Cho
Abstract:
Chimeric antigen receptor (CAR) T-cell based immunotherapy has shown its potential in treating blood cancers, and its application to solid tumors is currently being extensively investigated. For glioma brain tumors, various CAR T-cell targets include IL13Ra2, EGFRvIII, HER2, EphA2, GD2, B7-H3, and chlorotoxin. In this work, we are interested in developing a mathematical model of IL13Ra2 targeting CAR T-cells for treating glioma. We focus on extending the work of Kuznetsov et al. (1994) by considering binding of multiple CAR T-cells to a single glioma cell, and the dynamics of these multi-cellular conjugates. Our model more accurately describes experimentally observed CAR T-cell killing assay data than a model which does not consider cell binding. Moreover, we derive conditions on the CAR T-cell expansion rate that determine treatment success or failure. Finally, we show that our model captures distinct CAR T-cell killing dynamics at low, medium, and high antigen receptor densities in patient-derived brain tumor cells.
Submitted 18 January, 2022;
originally announced January 2022.
-
Deriving Autism Spectrum Disorder Functional Networks from RS-FMRI Data using Group ICA and Dictionary Learning
Authors:
Xin Yang,
Ning Zhang,
Donglin Wang
Abstract:
The objective of this study is to derive functional networks for the autism spectrum disorder (ASD) population using the group ICA and dictionary learning model together and to classify ASD and typically developing (TD) participants using the functional connectivity calculated from the derived functional networks. In our experiments, the ASD functional networks were derived from resting-state functional magnetic resonance imaging (rs-fMRI) data. We downloaded a total of 120 training samples, including 58 ASD and 62 TD participants, which were obtained from the public repository: Autism Brain Imaging Data Exchange I (ABIDE I). Our methodology and results have five main parts. First, we utilize a group ICA model to extract functional networks from the ASD group and rank the top 20 regions of interest (ROIs). Second, we utilize a dictionary learning model to extract functional networks from the ASD group and rank the top 20 ROIs. Third, we merged the 40 selected ROIs from the two models together as the ASD functional networks. Fourth, we generate three corresponding masks based on the 20 selected ROIs from group ICA, the 20 ROIs selected from dictionary learning, and the 40 combined ROIs selected from both. Finally, we extract ROIs for all training samples using the above three masks, and the calculated functional connectivity was used as features for ASD and TD classification. The classification results showed that the functional networks derived from ICA and dictionary learning together outperform those derived from a single ICA model or a single dictionary learning model.
Submitted 7 June, 2021;
originally announced June 2021.
-
A Novel Framework Integrating AI Model and Enzymological Experiments Promotes Identification of SARS-CoV-2 3CL Protease Inhibitors and Activity-based Probe
Authors:
Fan Hu,
Lei Wang,
Yishen Hu,
Dongqi Wang,
Weijie Wang,
Jianbing Jiang,
Nan Li,
Peng Yin
Abstract:
The identification of protein-ligand interaction plays a key role in biochemical research and drug discovery. Although deep learning has recently shown great promise in discovering new drugs, there remains a gap between deep learning-based and experimental approaches. Here we propose a novel framework, named AIMEE, integrating AI Model and Enzymology Experiments, to identify inhibitors against 3CL protease of SARS-CoV-2, which has taken a significant toll on people across the globe. From a bioactive chemical library, we have conducted two rounds of experiments and identified six novel inhibitors with a hit rate of 29.41%, and four of them showed an IC50 value less than 3 μM. Moreover, we explored the interpretability of the central model in AIMEE, mapping the deep learning extracted features to domain knowledge of chemical properties. Based on this knowledge, a commercially available compound was selected and proven to be an activity-based probe of 3CLpro. This work highlights the great potential of combining deep learning models and biochemical experiments for intelligent iteration and expanding the boundaries of drug discovery.
Submitted 29 May, 2021;
originally announced May 2021.
-
Identify Hidden Spreaders of Pandemic over Contact Tracing Networks
Authors:
Shuhong Huang,
Jiachen Sun,
Ling Feng,
Jiarong Xie,
Dashun Wang,
Yanqing Hu
Abstract:
COVID-19 infection cases have surged globally, causing devastation to both society and the economy. A key factor contributing to the sustained spreading is the presence of a large number of asymptomatic or hidden spreaders, who mix among the susceptible population without being detected or quarantined. Here we propose an effective non-pharmacological intervention method for detecting the asymptomatic spreaders in contact-tracing networks, and validate it on the empirical COVID-19 spreading network in Singapore. We find that using pure physical spreading equations, the hidden spreaders of COVID-19 can be identified with remarkable accuracy. Specifically, based on the unique characteristics of COVID-19 spreading dynamics, we propose a computational framework capturing the transition probabilities among different infectious states in a network, and extend it to an efficient algorithm to identify asymptomatic individuals. Our simulation results indicate that a screening method using our prediction outperforms machine learning algorithms, e.g. graph neural networks, that are designed as baselines in this work, as well as random screening of an infected person's closest contacts, which was widely used by China in its early outbreak. Furthermore, our method provides high precision even with incomplete information about the contact-tracing networks. Our work can be of critical importance to the non-pharmacological interventions of COVID-19, especially with the increasing adoption of contact tracing measures using various new technologies. Beyond COVID-19, our framework can be useful for other epidemic diseases that also feature asymptomatic spreading.
Submitted 16 March, 2021;
originally announced March 2021.
-
General Mechanism of Evolution Shared by Proteins and Words
Authors:
Li-Min Wang,
Hsing-Yi Lai,
Sun-Ting Tsai,
Chen Siang Ng,
Shan-Jyun Wu,
Meng-Xue Tsai,
Yi-Ching Su,
Daw-Wei Wang,
Tzay-Ming Hong
Abstract:
Complex systems, such as life and languages, are governed by principles of evolution. The analogy and comparison between biology and linguistics provide a computational foundation for characterizing and analyzing protein sequences, human corpora, and their evolution. However, no general mathematical formula has been proposed so far to illuminate the origin of quantitative hallmarks shared by life and language. Here we show several new statistical relationships shared by proteins and words, which inspire us to establish a general mechanism of evolution with explicit formulations that can incorporate both old and new characteristics. We found natural selection can be quantified via the entropic formulation by the principle of least effort to determine the sequence variation that survives in evolution. Besides, the origin of power law behavior and how changes in the environment stimulate the emergence of new proteins and words can also be explained via the introduction of a function connection network. Our results demonstrate not only the correspondence between genetics and linguistics over their different hierarchies but also new fundamental physical properties for the evolution of complex adaptive systems. We anticipate our statistical tests can function as quantitative criteria to examine whether an evolution theory of sequence is consistent with the regularity of real data. In the meantime, their correspondence broadens the bridge to exchange existing knowledge, spurs new interpretations, and opens Pandora's box to release several potentially revolutionary challenges. For example, does linguistic arbitrariness conflict with the dogma that structure determines function?
Submitted 16 December, 2022; v1 submitted 28 December, 2020;
originally announced December 2020.
-
Cox-nnet v2.0: improved neural-network based survival prediction extended to large-scale EMR dataset
Authors:
Di Wang,
Kevin He,
Lana X Garmire
Abstract:
Cox-nnet is a neural-network based prognosis prediction method, originally applied to genomics data. Here we propose version 2 of Cox-nnet, with significant improvements in efficiency and interpretability, making it suitable for predicting prognosis based on large-scale electronic medical records (EMR) datasets. We also add permutation-based feature importance scores and the direction of feature coefficients. Applied to an EMR dataset of OPTN kidney transplantation, Cox-nnet v2.0 reduces the training time of Cox-nnet by up to 32-fold (n=10,000) and achieves better prediction accuracy than Cox-PH (p<0.05). Availability and implementation: Cox-nnet v2.0 is freely available to the public at https://github.com/lanagarmire/Cox-nnet-v2.0
Submitted 9 September, 2020;
originally announced September 2020.
-
Identification of Neuronal Polarity by Node-Based Machine Learning
Authors:
Chen-Zhi Su,
Kuan-Ting Chou,
Hsuan-Pei Huang,
Chung-Chuan Lo,
Daw-Wei Wang
Abstract:
Identifying the directions of signal flows in neural networks is one of the most important stages for understanding the intricate information dynamics of a living brain. Using a dataset of 213 projection neurons distributed in different regions of the Drosophila brain, we develop a powerful machine learning algorithm: node-based polarity identifier of neurons (NPIN). The proposed model is trained on nodal information only and includes both Soma Features (which contain spatial information from a given node to a soma) and Local Features (which contain morphological information of a given node). After including the spatial correlations between nodal polarities, our NPIN provided extremely high accuracy (>96.0%) for the classification of neuronal polarity, even for complex neurons with more than two dendrite/axon clusters. Finally, we further apply NPIN to classify the neuronal polarity of the blowfly, which has much less neuronal data available. Our results demonstrate that NPIN is a powerful tool to identify the neuronal polarity of insects and to map out the signal flows in the brain's neural networks.
Submitted 22 June, 2020;
originally announced June 2020.
-
Targeted Pandemic Containment Through Identifying Local Contact Network Bottlenecks
Authors:
Shenghao Yang,
Priyabrata Senapati,
Di Wang,
Chris T. Bauch,
Kimon Fountoulakis
Abstract:
Decision-making about pandemic mitigation often relies upon simulation modelling. Models of disease transmission through networks of contacts--between individuals or between population centres--are increasingly used for these purposes. Real-world contact networks are rich in structural features that influence infection transmission, such as tightly-knit local communities that are weakly connected to one another. In this paper, we propose a new flow-based edge-betweenness centrality method for detecting bottleneck edges that connect nodes in contact networks. In particular, we utilize convex optimization formulations based on the idea of diffusion with p-norm network flow. Using simulation models of COVID-19 transmission through real network data at both individual and county levels, we demonstrate that targeting bottleneck edges identified by the proposed method reduces the number of infected cases by up to 10% more than state-of-the-art edge-betweenness methods. Furthermore, the proposed method is orders of magnitude faster than existing methods.
Submitted 21 August, 2021; v1 submitted 12 June, 2020;
originally announced June 2020.
-
Opportunities for multiscale computational modelling of serotonergic drug effects in Alzheimer's disease
Authors:
Alok Joshi,
Da-Hui Wang,
Steven Watterson,
Paula L. McClean,
Chandan K. Behera,
Trevor Sharp,
KongFatt Wong-Lin
Abstract:
Alzheimer's disease (AD) is an age-specific neurodegenerative disease that compromises cognitive functioning and impacts the quality of life of an individual. Pathologically, AD is characterised by abnormal accumulation of beta-amyloid (Aβ) and hyperphosphorylated tau protein. Despite research advances over the last few decades, there is currently still no cure for AD. Although medications are available to control some behavioural symptoms and slow the disease's progression, most prescribed medications are based on cholinesterase inhibitors. Over the last decade, there has been increased attention towards novel drugs targeting alternative neurotransmitter pathways, particularly those targeting the serotonergic (5-HT) system. In this review, we focus on 5-HT receptor (5-HTR) mediated signalling and drugs that target these receptors. These pathways regulate key proteins and kinases such as GSK-3 that are associated with abnormal levels of Aβ and tau in AD. We then review computational studies related to 5-HT signalling pathways with the potential for providing deeper understanding of AD pathologies. In particular, we suggest that multiscale and multilevel modelling approaches could potentially provide new insights into AD mechanisms, and towards discovering novel 5-HTR based therapeutic targets.
Submitted 27 May, 2020; v1 submitted 5 May, 2020;
originally announced May 2020.
-
CovidSens: A Vision on Reliable Social Sensing for COVID-19
Authors:
Md Tahmid Rashid,
Dong Wang
Abstract:
With the spiraling pandemic of the Coronavirus Disease 2019 (COVID-19), it has become inherently important to disseminate accurate and timely information about the disease. Due to the ubiquity of Internet connectivity and smart devices, social sensing is emerging as a dynamic AI-driven sensing paradigm to extract real-time observations from online users. In this paper, we propose CovidSens, a vision of social sensing based risk alert systems to spontaneously obtain and analyze social data to infer COVID-19 propagation. CovidSens can actively help to keep the general public informed about the COVID-19 spread and identify risk-prone areas. The CovidSens concept is motivated by three observations: 1) people actively share their experience of COVID-19 via online social media, 2) official warning channels and news agencies are relatively slower than people reporting on social media, and 3) online users are frequently equipped with powerful mobile devices that can perform data processing and analytics. We envision unprecedented opportunities to leverage posts generated by ordinary people to build real-time sensing and analytic systems for gathering and circulating COVID-19 propagation data. Specifically, the vision of CovidSens attempts to answer the questions: How to distill reliable information on COVID-19 amid prevailing rumors and misinformation? How to inform the general public about the state of the spread in a timely and effective manner? How to leverage the computational power on edge devices to construct fully integrated edge-based social sensing platforms? In this vision paper, we discuss the roles of CovidSens and identify potential challenges in developing reliable social sensing based risk alert systems. We envision that approaches originating from multiple disciplines can be effective in addressing the challenges. Finally, we outline a few research directions for future work in CovidSens.
Submitted 23 May, 2020; v1 submitted 9 April, 2020;
originally announced April 2020.
-
PEtab -- interoperable specification of parameter estimation problems in systems biology
Authors:
Leonard Schmiester,
Yannik Schälte,
Frank T. Bergmann,
Tacio Camba,
Erika Dudkin,
Janine Egert,
Fabian Fröhlich,
Lara Fuhrmann,
Adrian L. Hauber,
Svenja Kemmer,
Polina Lakrisenko,
Carolin Loos,
Simon Merkt,
Wolfgang Müller,
Dilan Pathirana,
Elba Raimúndez,
Lukas Refisch,
Marcus Rosenblatt,
Paul L. Stapor,
Philipp Städter,
Dantong Wang,
Franz-Georg Wieland,
Julio R. Banga,
Jens Timmer,
Alejandro F. Villaverde
, et al. (4 additional authors not shown)
Abstract:
Reproducibility and reusability of the results of data-based modeling studies are essential. Yet, there has been -- so far -- no broadly supported format for the specification of parameter estimation problems in systems biology. Here, we introduce PEtab, a format which facilitates the specification of parameter estimation problems using Systems Biology Markup Language (SBML) models and a set of tab-separated value files describing the observation model and experimental data as well as parameters to be estimated. We already implemented PEtab support into eight well-established model simulation and parameter estimation toolboxes with hundreds of users in total. We provide a Python library for validation and modification of a PEtab problem and currently 20 example parameter estimation problems based on recent studies. Specifications of PEtab, the PEtab Python library, as well as links to examples, and all supporting software tools are available at https://github.com/PEtab-dev/PEtab, a snapshot is available at https://doi.org/10.5281/zenodo.3732958. All original content is available under permissive licenses.
Submitted 7 August, 2020; v1 submitted 2 April, 2020;
originally announced April 2020.
-
Frontoparietal Connectivity Neurofeedback Training for Promotion of Working Memory: An fNIRS Study in Healthy Male Participants
Authors:
Meiyun Xia,
Pengfei Xu,
Yuanbin Yang,
Wenyu Jiang,
Zehua Wang,
Xiaolei Gu,
Mingxi Yang,
Deyu Li,
Shuyu Li,
Guijun Dong,
Ling Wang,
Daifa Wang
Abstract:
Neurofeedback cognitive training is a promising tool used to promote cognitive functions effectively and efficiently. In this study, we investigated a novel functional near-infrared spectroscopy (fNIRS)-based frontoparietal functional connectivity (FC) neurofeedback training paradigm related to working memory, involving healthy adults. Compared with conventional cognitive training studies, we chose the frontoparietal network, a key brain region for cognitive function modulation, as neurofeedback, yielding a strong targeting effect. In the experiment, 10 participants (test group) received three cognitive training sessions of 15 min using fNIRS-based frontoparietal FC as neurofeedback, and another 10 participants served as the control group. Frontoparietal FC was significantly increased in the test group (p = 0.03), and the cognitive functions (memory and attention) were significantly promoted compared with the control group (accuracy of 3-back test: p = 0.0005, reaction time of 3-back test: p = 0.0009). After additional validations on long-term training effect and on different patient populations, the proposed method exhibited considerable potential to be developed as a fast, effective, and widespread training approach for cognitive function enhancement.
Submitted 2 June, 2021; v1 submitted 31 March, 2020;
originally announced March 2020.
-
Propagation analysis and prediction of the COVID-19
Authors:
Lixiang Li,
Zihang Yang,
Zhongkai Dang,
Cui Meng,
Jingze Huang,
Hao Tian Meng,
Deyu Wang,
Guanhua Chen,
Jiaxuan Zhang,
Haipeng Peng
Abstract:
Based on modeling of official data, this paper studies the transmission process of Coronavirus Disease 2019 (COVID-19). The error between the model and the official data curve is within 3%. The model also enables forward prediction and backward inference of the epidemic situation, and the resulting analysis can help the countries concerned to make decisions.
Submitted 15 March, 2020;
originally announced March 2020.
-
A Novel Decision Tree for Depression Recognition in Speech
Authors:
Zhenyu Liu,
Dongyu Wang,
Lan Zhang,
Bin Hu
Abstract:
Depression is a common mental disorder worldwide which causes a range of serious outcomes. The diagnosis of depression relies on patient-reported scales and psychiatrist interviews, which may lead to subjective bias. In recent years, more and more researchers have devoted themselves to depression recognition in speech, which may be an effective and objective indicator. This study proposes a new speech segment fusion method based on a decision tree to improve depression recognition accuracy and validates it on a sample of 52 subjects (23 depressed patients and 29 healthy controls). The recognition accuracies are 75.8% and 68.5% for males and females, respectively, with gender-dependent models. It can be concluded from the data that the proposed decision tree model can improve depression classification performance.
Submitted 22 February, 2020;
originally announced February 2020.
-
Measuring Impact of Climate Change on Tree Species: analysis of JSDM on FIA data
Authors:
Hyun Choi,
Ali Sadeghian,
Sergio Marconi,
Ethan White,
Daisy Zhe Wang
Abstract:
Among the first beings affected by changes in the climate are trees, one of our most vital resources. In this study, tree species interactions and the response to climate in different ecological environments are observed by applying a joint species distribution model to different ecological domains in the United States. Joint species distribution models are useful for learning inter-species relationships and species responses to the environment. The climate's impact on the tree species is measured through species abundance in an area. We compare the model's performance across all ecological domains and study the sensitivity of the climate variables. With the predicted abundances, future tree species populations can be predicted and the impact of climate change on tree populations can be measured.
Submitted 10 October, 2019;
originally announced October 2019.
-
Critical slowing down and attractive manifold: a mechanism for dynamic robustness in yeast cell-cycle process
Authors:
Yao Zhao,
Dedi Wang,
Zhiwen Zhang,
Ying Lu,
Xiaojing Yang,
Qi Ouyang,
Chao Tang,
Fangting Li
Abstract:
Biological processes that execute complex multiple functions, such as the cell cycle, must ensure the order of sequential events and maintain dynamic robustness against various fluctuations. Here, we examine the dynamic mechanism and the fundamental structure that achieve these properties in the cell-cycle process of the budding yeast Saccharomyces cerevisiae. We show that the budding yeast cell-cycle process behaves like an excitable system containing three well-coupled saddle-node bifurcations to execute DNA replication and mitosis events. The yeast cell-cycle regulatory network can be separated into a G1/S phase module, an early M module and a late M phase module, where the positive feedbacks in each module and the interactions among the modules play an important role. If the cell-cycle process operates near the critical points of the saddle-node bifurcations, there is a critical slowing down or ghost effect. This can provide the cell-cycle process with a sufficient duration for each event and an attractive manifold for the state checking of the completion of DNA replication and mitosis; moreover, the fluctuation in the early module/event is forbidden to transmit to the latter module/event. Our results suggest both a fundamental structure of the cell-cycle regulatory network and a hint for the evolution of eukaryotic cell-cycle processes, from the dynamic checking mechanism to the molecule checkpoint pathway.
Submitted 23 November, 2018;
originally announced November 2018.
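The critical slowing down (ghost effect) mentioned in the abstract above can be illustrated with the textbook saddle-node normal form dx/dt = r + x^2. This is only a sketch under that assumption, not the authors' yeast cell-cycle model; the function name `passage_time` and the window half-width `a` are hypothetical. For r > 0 the time to cross from x = -a to x = +a is (2/sqrt(r)) * arctan(a/sqrt(r)), which approaches pi/sqrt(r) as r approaches 0, so the system lingers longer and longer near the bifurcation point.

```python
import math

def passage_time(r, a=1.0):
    """Time for dx/dt = r + x**2 to travel from x = -a to x = +a (requires r > 0).

    Obtained by integrating dx / (r + x**2) over [-a, a].
    """
    s = math.sqrt(r)
    return 2.0 / s * math.atan(a / s)

# Passage time grows as the control parameter r approaches the critical value 0.
for r in (1e-1, 1e-2, 1e-3, 1e-4):
    print(f"r={r:g}  passage time={passage_time(r):.2f}  pi/sqrt(r)={math.pi / math.sqrt(r):.2f}")
```

In this toy picture, the long dwell time near the ghost of the vanished fixed point plays the role of the "sufficient duration for each event" that the abstract describes.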
-
Deep Reinforcement Learning of Cell Movement in the Early Stage of C. elegans Embryogenesis
Authors:
Zi Wang,
Dali Wang,
Chengcheng Li,
Yichi Xu,
Husheng Li,
Zhirong Bao
Abstract:
Cell movement in the early phase of C. elegans development is regulated by a highly complex process in which a set of rules and connections are formulated at distinct scales. Previous efforts have shown that agent-based, multi-scale modeling systems can integrate physical and biological rules and provide new avenues to study developmental systems. However, the application of these systems to model cell movement is still challenging and requires a comprehensive understanding of regulation networks at the right scales. Recent developments in deep learning and reinforcement learning provide an unprecedented opportunity to explore cell movement using 3D time-lapse images. We present a deep reinforcement learning approach within an agent-based modeling (ABM) system to characterize cell movement in C. elegans embryogenesis. Our modeling system captures the complexity of cell movement patterns in the embryo and overcomes the local optimization problem encountered by traditional rule-based ABMs that use greedy algorithms. We tested our model with two real developmental processes: the anterior movement of the Cpaaa cell via intercalation and the rearrangement of the left-right asymmetry. In the first case, model results showed that Cpaaa's intercalation is an active directional cell movement caused by the continuous effects from a longer distance, as opposed to a passive movement caused by neighbor cell movements. This is because the learning-based simulation found that a passive movement model could not lead Cpaaa to the predefined destination. In the second case, a leader-follower mechanism explained the collective cell movement pattern well. These results showed that our approach of introducing deep reinforcement learning into ABM can test regulatory mechanisms by exploring cell migration paths from a reverse-engineering perspective. This model opens new doors to explore large datasets generated by live imaging.
Submitted 2 March, 2018; v1 submitted 14 January, 2018;
originally announced January 2018.
-
MotifMark: Finding Regulatory Motifs in DNA Sequences
Authors:
Hamid Reza Hassanzadeh,
Pushkar Kolhe,
Charles L. Isbell,
May D. Wang
Abstract:
The interaction between proteins and DNA is a key driving force in a significant number of biological processes such as transcriptional regulation, repair, recombination, splicing, and DNA modification. The identification of DNA-binding sites and the specificity of target proteins in binding to these regions are two important steps in understanding the mechanisms of these biological activities. A number of high-throughput technologies have recently emerged that try to quantify the affinity between proteins and DNA motifs. Despite their success, these technologies have their own limitations and fall short in precise characterization of motifs, and as a result, require further downstream analysis to extract useful and interpretable information from a haystack of noisy and inaccurate data. Here we propose MotifMark, a new algorithm based on graph theory and machine learning, that can find binding sites on candidate probes and rank their specificity in regard to the underlying transcription factor. We developed a pipeline to analyze experimental data derived from compact universal protein binding microarrays and benchmarked it against two of the most accurate motif search methods. Our results indicate that MotifMark can be a viable alternative technique for prediction of motif from protein binding microarrays and possibly other related high-throughput techniques.
Submitted 4 May, 2017;
originally announced May 2017.
-
Modular knowledge systems accelerate human migration in asymmetric random environments
Authors:
Dong Wang,
Michael W. Deem
Abstract:
Migration is a key mechanism for expansion of communities. In spatially heterogeneous environments, rapidly gaining knowledge about the local environment is key to the evolutionary success of a migrating population. For historical human migration, environmental heterogeneity was naturally asymmetric in the north-south (NS) and east-west (EW) directions. We here consider the human migration process in the Americas, modeled as random, asymmetric, modularly correlated environments. Knowledge about the environments determines the fitness of each individual. We present a phase diagram for asymmetry of migration as a function of carrying capacity and fitness threshold. We find that the speed of migration is proportional to the inverse complement of the spatial environmental gradient, and in particular we find that north-south migration rates are lower than east-west migration rates when the environmental gradient is higher in the north-south direction. Communication of knowledge between individuals can help to spread beneficial knowledge within the population. The speed of migration increases when communication transmits pieces of knowledge that contribute in a modular way to the fitness of individuals. The results for the dependence of migration rate on asymmetry and modularity are consistent with existing archaeological observations. The results for asymmetry of genetic divergence are consistent with patterns of human gene flow.
Submitted 6 December, 2016;
originally announced December 2016.
-
Core-genome scaffold comparison reveals the prevalence that inversion events are associated with pairs of inverted repeats
Authors:
Dan Wang,
Shuaicheng Li,
Fei Guo,
Lusheng Wang
Abstract:
Motivation: Genome rearrangement plays an important role in evolutionary biology and has profound impacts on phenotype in organisms ranging from microbes to humans. The mechanisms for genome rearrangement events remain unclear. Many comparisons have been conducted among different species. To reveal the mechanisms for rearrangement events, comparison of different individuals/strains within the same species or genus (pan-genomes) is more helpful since they are much closer to each other. Results: We study the mechanism for inversion events via core-genome scaffold comparison of different strains within the same species. We focus on two kinds of bacteria, Pseudomonas aeruginosa and Escherichia coli, and investigate the inversion events among different strains of the same species. We find an interesting phenomenon: long (larger than 10,000 bp) inversion regions are flanked by a pair of Inverted Repeats (IRs) (with lengths ranging from 385 bp to 27,476 bp), which are often Insertion Sequences (ISs). This mechanism can also explain why breakpoint reuse occurs in inversion events. We study the prevalence of the phenomenon and find that it is a major mechanism for inversions. The other observation is that for different rearrangement events such as transposition and inverted block interchange, the two ends of the swapped regions are also associated with repeats so that after the rearrangement operations the two ends of the swapped regions remain unchanged. To our knowledge, this is the first time such a phenomenon has been reported for transposition events.
Submitted 8 August, 2016;
originally announced August 2016.
-
Deep Learning for Identifying Metastatic Breast Cancer
Authors:
Dayong Wang,
Aditya Khosla,
Rishab Gargeya,
Humayun Irshad,
Andrew H. Beck
Abstract:
The International Symposium on Biomedical Imaging (ISBI) held a grand challenge to evaluate computational systems for the automated detection of metastatic breast cancer in whole slide images of sentinel lymph node biopsies. Our team won both competitions in the grand challenge, obtaining an area under the receiver operating curve (AUC) of 0.925 for the task of whole slide image classification and a score of 0.7051 for the tumor localization task. A pathologist independently reviewed the same images, obtaining a whole slide image classification AUC of 0.966 and a tumor localization score of 0.733. Combining our deep learning system's predictions with the human pathologist's diagnoses increased the pathologist's AUC to 0.995, representing an approximately 85 percent reduction in human error rate. These results demonstrate the power of using deep learning to produce significant improvements in the accuracy of pathological diagnoses.
Submitted 18 June, 2016;
originally announced June 2016.
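The "approximately 85 percent reduction in human error rate" quoted in the abstract above can be checked with a short calculation. This sketch assumes that the error rate is taken as 1 - AUC, which the listing itself does not state; the function name `relative_error_reduction` is hypothetical.

```python
def relative_error_reduction(auc_before, auc_after):
    """Fractional drop in error, with error assumed to be 1 - AUC."""
    err_before = 1.0 - auc_before
    err_after = 1.0 - auc_after
    return (err_before - err_after) / err_before

# Pathologist alone (AUC 0.966) versus pathologist plus deep learning system (AUC 0.995).
reduction = relative_error_reduction(0.966, 0.995)
print(f"{reduction:.1%}")
```

Under that assumption, the error falls from 0.034 to 0.005, which is consistent with the roughly 85 percent figure in the abstract.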
-
Modularity Enhances the Rate of Evolution in a Rugged Fitness Landscape
Authors:
Jeong-Man Park,
Man Chen,
Dong Wang,
Michael W. Deem
Abstract:
Biological systems are modular, and this modularity affects the evolution of biological systems over time and in different environments. We here develop a theory for the dynamics of evolution in a rugged, modular fitness landscape. We show analytically how horizontal gene transfer couples to the modularity in the system and leads to more rapid rates of evolution at short times. The model, in general, analytically demonstrates a selective pressure for the prevalence of modularity in biology. We use this model to show how the evolution of the influenza virus is affected by the modularity of the proteins that are recognized by the human immune system. Approximately 25% of the observed rate of fitness increase of the virus could be ascribed to a modular viral landscape.
Submitted 19 January, 2015;
originally announced January 2015.
-
Methods for scoring the collective effect of SNPs: Minor alleles of common SNPs quantitatively affect traits/diseases and are under both positive and negative selection
Authors:
Dejian Yuan,
Zuobin Zhu,
Xiaohua Tan,
Jie Liang,
Ceng Zeng,
Jiegen Zhang,
Jun Chen,
Long Ma,
Ayca Dogan,
Gudrun Brockmann,
Oliver Goldmann,
Eva Medina,
Amanda D. Rice,
Richard W. Moyer,
Xian Man,
Ke Yi,
Yanke Li,
Qing Lu,
Yimin Huang,
Dapeng Wang,
Jun Yu,
Hui Guo,
Kun Xia,
Shi Huang
Abstract:
Most common SNPs are popularly assumed to be neutral. We here developed novel methods to examine, in animal models and humans, whether an extreme number of minor alleles (MAs) carried by an individual may represent extreme trait values and common diseases. We analyzed panels of genetic reference populations and identified the MAs in each panel and the MA content (MAC) that each strain carried. We also analyzed 21 published GWAS datasets of human diseases and identified the MAC of each case or control. MAC was nearly linearly linked to quantitative variations in numerous traits in model organisms, including life span, tumor susceptibility, learning and memory, sensitivity to alcohol and anti-psychotic drugs, and two correlated traits, poor reproductive fitness and strong immunity. Similarly, in Europeans or European Americans, enrichment of MAs of fast but not slow evolutionary rate was linked to autoimmune and numerous other diseases, including type 2 diabetes, Parkinson's disease, psychiatric disorders, alcohol and cocaine addictions, cancer, and shorter life span. Therefore, both high and low MAC correlated with extreme values in many traits, indicating stabilizing selection on most MAs. The methods here are broadly applicable and may help solve the missing heritability problem in complex traits and diseases.
Submitted 15 July, 2013; v1 submitted 12 September, 2012;
originally announced September 2012.
-
Species Diversity in Rock-Paper-Scissors Game Coupling with Levy Flight
Authors:
Dong Wang,
Qian Zhuang,
Jing Zhang,
Zengru Di
Abstract:
The rock-paper-scissors (RPS) game is a useful model for studying biodiversity in ecosystems. However, previous studies consider only the nearest-neighbor interaction among the species. In this paper, taking long-range migration into account, we investigate the effects of the interplay between the nearest-neighbor interaction and the long-range interaction of Levy flight, which obeys a power-law distance distribution with exponent h (-0.3<h<-0.1), in the spatial RPS game. Taking the probability of long-range Levy flight and the power exponent as parameters, the coexistence conditions of three species are found. The critical curves for stable coexistence of three species in the parameter space are presented. It is also found that the long-range interaction with Levy flight has interesting effects on the final spatiotemporal pattern of the system. The results reveal that the long-range interaction of Levy flight exhibits pronounced effects on the biodiversity of ecosystems.
Submitted 14 February, 2012;
originally announced February 2012.
-
N-tuple Zipf Analysis and Modeling for Language, Computer Program and DNA
Authors:
Xiaocong Gan,
Dahui Wang,
Zhangang Han
Abstract:
The n-tuple power law widely exists in language, computer program code, DNA and music. After a vast amount of Zipf analyses of the n-tuple power law from empirical data, we propose a model to explain the n-tuple power-law feature found in these information translational carriers. Our model is a preferential selection approach inspired by Simon's model, which explained the scaling law of single symbols in a sequence Zipf analysis. The kernel mechanism of our model is neat and simple. It can be described as a random copy-and-paste process: repeatedly select a random segment from the current sequence and attach it to the end. The simulation of our model shows that the n-tuple power law exists in model-generated data. Furthermore, two estimation equations, one for the Zipf exponent and one for the minimal n-tuple length at which the power law appears, both correspond well to empirical data. Our model can also reproduce the symmetry-breaking process of ATGC number differences in DNA data.
Submitted 4 August, 2009;
originally announced August 2009.
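The copy-and-paste process described in the abstract above could be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the seed string "ATGC", the segment-length cap `max_seg_len`, the uniform choice of segment length and start position, and the helper names are all assumptions made for the example.

```python
import random
from collections import Counter

def copy_paste_sequence(seed, target_len, max_seg_len, rng):
    """Grow a sequence by repeatedly copying a random segment onto its end."""
    seq = list(seed)
    while len(seq) < target_len:
        # Assumed rule: segment length and start position are drawn uniformly.
        seg_len = rng.randint(1, min(max_seg_len, len(seq)))
        start = rng.randint(0, len(seq) - seg_len)
        seq.extend(seq[start:start + seg_len])
    return seq[:target_len]

def ntuple_rank_frequency(seq, n):
    """Return n-tuple counts sorted from most to least frequent (Zipf ranking)."""
    counts = Counter(tuple(seq[i:i + n]) for i in range(len(seq) - n + 1))
    return sorted(counts.values(), reverse=True)

rng = random.Random(0)  # fixed seed so the run is reproducible
seq = copy_paste_sequence("ATGC", 5000, 8, rng)
freqs = ntuple_rank_frequency(seq, 2)
print(freqs[:5])
```

Plotting `freqs` against rank on log-log axes is the usual way to inspect whether the generated data follow a Zipf-like power law; whether this simplified variant reproduces the exponents reported by the authors is not something the sketch establishes.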
-
Discontinuities at the DNA supercoiling transition
Authors:
Bryan C. Daniels,
Scott Forth,
Maxim Y. Sheinin,
Michelle D. Wang,
James P. Sethna
Abstract:
While slowly turning the ends of a single molecule of DNA at constant applied force, a discontinuity was recently observed at the supercoiling transition, when a small plectoneme is suddenly formed. This can be understood as an abrupt transition into a state in which stretched and plectonemic DNA coexist. We argue that there should be discontinuities in both the extension and the torque at the transition, and provide experimental evidence for both. To predict the sizes of these discontinuities and how they change with the overall length of DNA, we organize a theory for the coexisting plectonemic state in terms of four length-independent parameters. We also test plectoneme theories, including our own elastic rod simulation, finding discrepancies with experiment that can be understood in terms of the four coexisting state parameters.
Submitted 21 July, 2009; v1 submitted 21 November, 2008;
originally announced November 2008.
-
A Pattern Discovery-Based Method for Detecting Multi-Locus Genetic Association
Authors:
Zhong Li,
Aris Floratos,
David Wang,
Andrea Califano
Abstract:
Methods to effectively detect multi-locus genetic association are becoming increasingly relevant in the genetic dissection of complex traits in humans. Current approaches typically consider a limited number of hypotheses, most of which are related to the effect of a single locus or of a relatively small number of neighboring loci on a chromosomal region. We have developed a novel method that is specifically designed to detect genetic association involving multiple disease-susceptibility loci, possibly on different chromosomes. Our approach relies on the efficient discovery of patterns comprising spatially unrestricted polymorphic markers and on the use of appropriate test statistics to evaluate pattern-trait association. Power calculations using multi-locus disease models demonstrate a significant gain of power by using this method in detecting multi-locus genetic association when compared to a standard single-marker analysis method. When analyzing a schizophrenia dataset, we confirmed a previously identified gene-gene interaction. In addition, a less conspicuous association involving different markers on the same two genes was also identified, implicating genetic heterogeneity.
Submitted 16 March, 2007;
originally announced March 2007.
-
Synchronous phase clustering in a network of neurons with spatially decaying excitatory coupling
Authors:
Yuqing Wang,
Z. D. Wang,
Y. -X. Li,
X. Pei
Abstract:
Synchronization is studied in a spatially distributed network of weakly coupled, excitatory neurons of Hodgkin-Huxley type. All neurons are coupled to each other synaptically with a fixed time delay and a coupling strength inversely proportional to the distance between two neurons. We found that a robust, noise-resistant phase clustering state occurred regardless of the initial phase distribution. This has not been shown in previous studies, where similar clustering states were found only when the coupling was inhibitory. The spatial distribution of neurons in each synchronous cluster is determined by the spatial distribution of the coupling strength. Phase-interaction properties of the model neurons in the network are used to explain why such a clustering state can be robust.
Submitted 18 September, 2001;
originally announced September 2001.
-
Coherence Resonance and Noise-Induced Synchronization in Globally Coupled Hodgkin-Huxley Neurons
Authors:
Yuqing Wang,
David T. W. Chik,
Z. D. Wang
Abstract:
The coherence resonance (CR) of globally coupled Hodgkin-Huxley neurons is studied. When the neurons are set in the subthreshold regime near the firing threshold, the additive noise induces limit cycles. The coherence of the system is optimized by the noise. A bell-shaped curve is found for the peak height of power spectra of the spike train, being significantly different from a monotonic behavior for the single neuron. The coupling of the network can enhance CR in two different ways. In particular, when the coupling is strong enough, the synchronization of the system is induced and optimized by the noise. This synchronization leads to a high and wide plateau in the local measure of coherence curve. The local-noise-induced limit cycle can evolve to a refined spatiotemporal order through the dynamical optimization among the autonomous oscillation of an individual neuron, the coupling of the network, and the local noise.
Submitted 21 November, 1999;
originally announced November 1999.