Showing 1–50 of 145 results for author: Li, S

  1. arXiv:2407.12053  [pdf, other]

    cs.LG cs.AI q-bio.QM

    Improving AlphaFlow for Efficient Protein Ensembles Generation

    Authors: Shaoning Li, Mingyu Li, Yusong Wang, Xinheng He, Nanning Zheng, Jian Zhang, Pheng-Ann Heng

    Abstract: Investigating conformational landscapes of proteins is a crucial way to understand their biological functions and properties. AlphaFlow stands out as a sequence-conditioned generative model that introduces flexibility into structure prediction models by fine-tuning AlphaFold under the flow-matching framework. Despite the advantages of efficient sampling afforded by flow-matching, AlphaFlow still r…

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: Accepted by ICML 2024 AI4Science workshop

  2. arXiv:2407.04232  [pdf]

    q-bio.QM physics.bio-ph q-bio.BM q-bio.SC

    A Unified Intracellular pH Landscape with SITE-pHorin: a Quantum-Entanglement-Enhanced pH Probe

    Authors: Shu-Ang Li, Xiao-Yan Meng, Su Zhang, Ying-Jie Zhang, Run-Zhou Yang, Dian-Dian Wang, Yang Yang, Pei-Pei Liu, Jian-Sheng Kang

    Abstract: An accurate map of intracellular organelle pH is crucial for comprehending cellular metabolism and organellar functions. However, a unified intracellular pH spectrum using a single probe is still lacking. Here, we developed a novel quantum entanglement-enhanced pH-sensitive probe called SITE-pHorin, which featured a wide pH-sensitive range and ratiometric quantitative measurement capabilities. Subseq…

    Submitted 4 July, 2024; originally announced July 2024.

    Comments: 64 pages, 7 figures, the supplemental material contains 13 supplemental figures and 4 supplemental tables

  3. arXiv:2407.00050  [pdf, other]

    q-bio.BM cs.AI cs.LG

    FoldToken2: Learning compact, invariant and generative protein structure language

    Authors: Zhangyang Gao, Cheng Tan, Stan Z. Li

    Abstract: The equivalent nature of 3D coordinates has posed long term challenges in protein structure representation learning, alignment, and generation. Can we create a compact and invariant language that equivalently represents protein structures? Towards this goal, we propose FoldToken2 to transfer equivariant structures into discrete tokens, while maintaining the recoverability of the original structure…

    Submitted 11 June, 2024; originally announced July 2024.

  4. arXiv:2406.11906  [pdf, other]

    q-bio.QM cs.AI

    NovoBench: Benchmarking Deep Learning-based De Novo Peptide Sequencing Methods in Proteomics

    Authors: Jingbo Zhou, Shaorong Chen, Jun Xia, Sizhe Liu, Tianze Ling, Wenjie Du, Yue Liu, Jianwei Yin, Stan Z. Li

    Abstract: Tandem mass spectrometry has played a pivotal role in advancing proteomics, enabling the high-throughput analysis of protein composition in biological tissues. Many deep learning methods have been developed for the de novo peptide sequencing task, i.e., predicting the peptide sequence for the observed mass spectrum. However, two key challenges seriously hinder the further advancement of this im…

    Submitted 16 June, 2024; originally announced June 2024.

  5. arXiv:2406.10840  [pdf, other]

    cs.LG cs.AI q-bio.BM

    CBGBench: Fill in the Blank of Protein-Molecule Complex Binding Graph

    Authors: Haitao Lin, Guojiang Zhao, Odin Zhang, Yufei Huang, Lirong Wu, Zicheng Liu, Siyuan Li, Cheng Tan, Zhifeng Gao, Stan Z. Li

    Abstract: Structure-based drug design (SBDD) aims to generate potential drugs that can bind to a target protein and is greatly expedited by the aid of AI techniques in generative models. However, a lack of systematic understanding persists due to the diverse settings, complex implementation, difficult reproducibility, and task singularity. Firstly, the absence of standardization can lead to unfair compariso…

    Submitted 16 June, 2024; originally announced June 2024.

    Comments: 9 pages main context

  6. arXiv:2406.01627  [pdf, other]

    q-bio.GN cs.LG

    GenBench: A Benchmarking Suite for Systematic Evaluation of Genomic Foundation Models

    Authors: Zicheng Liu, Jiahui Li, Siyuan Li, Zelin Zang, Cheng Tan, Yufei Huang, Yajing Bai, Stan Z. Li

    Abstract: The Genomic Foundation Model (GFM) paradigm is expected to facilitate the extraction of generalizable representations from massive genomic data, thereby enabling their application across a spectrum of downstream applications. Despite advancements, a lack of evaluation framework makes it difficult to ensure equitable assessment due to experimental settings, model intricacy, benchmark datasets, and…

    Submitted 5 June, 2024; v1 submitted 1 June, 2024; originally announced June 2024.

  7. arXiv:2405.18968  [pdf, other]

    cs.AI cs.LG q-bio.QM

    UniIF: Unified Molecule Inverse Folding

    Authors: Zhangyang Gao, Jue Wang, Cheng Tan, Lirong Wu, Yufei Huang, Siyuan Li, Zhirui Ye, Stan Z. Li

    Abstract: Molecule inverse folding has been a long-standing challenge in chemistry and biology, with the potential to revolutionize drug discovery and material science. Although specified models have been proposed for different small- or macro-molecules, few have attempted to unify the learning process, resulting in redundant efforts. Complementary to recent advancements in molecular structure prediction, su…

    Submitted 29 May, 2024; originally announced May 2024.

  8. arXiv:2405.14225  [pdf, other]

    q-bio.QM cs.CL cs.MM

    ReactXT: Understanding Molecular "Reaction-ship" via Reaction-Contextualized Molecule-Text Pretraining

    Authors: Zhiyuan Liu, Yaorui Shi, An Zhang, Sihang Li, Enzhi Zhang, Xiang Wang, Kenji Kawaguchi, Tat-Seng Chua

    Abstract: Molecule-text modeling, which aims to facilitate molecule-relevant tasks with a textual interface and textual knowledge, is an emerging research direction. Beyond single molecules, studying reaction-text modeling holds promise for helping the synthesis of new materials and drugs. However, previous works mostly neglect reaction-text modeling: they primarily focus on modeling individual molecule-tex…

    Submitted 23 May, 2024; originally announced May 2024.

    Comments: ACL 2024 Findings, 9 pages

  9. arXiv:2405.10812  [pdf, other]

    q-bio.GN cs.AI

    VQDNA: Unleashing the Power of Vector Quantization for Multi-Species Genomic Sequence Modeling

    Authors: Siyuan Li, Zedong Wang, Zicheng Liu, Di Wu, Cheng Tan, Jiangbin Zheng, Yufei Huang, Stan Z. Li

    Abstract: Similar to natural language models, pre-trained genome language models are proposed to capture the underlying intricacies within genomes with unsupervised sequence modeling. They have become essential tools for researchers and practitioners in biology. However, the hand-crafted tokenization policies used in these models may not encode the most discriminative patterns from the limited vocabulary of…

    Submitted 2 June, 2024; v1 submitted 13 May, 2024; originally announced May 2024.

    Comments: ICML 2024. Preprint V2 with 17 pages and 5 figures

  10. arXiv:2405.10348  [pdf, other]

    q-bio.QM cs.AI cs.LG

    Learning to Predict Mutation Effects of Protein-Protein Interactions by Microenvironment-aware Hierarchical Prompt Learning

    Authors: Lirong Wu, Yijun Tian, Haitao Lin, Yufei Huang, Siyuan Li, Nitesh V Chawla, Stan Z. Li

    Abstract: Protein-protein bindings play a key role in a variety of fundamental biological processes, and thus predicting the effects of amino acid mutations on protein-protein binding is crucial. To tackle the scarcity of annotated mutation data, pre-training with massive unlabeled data has emerged as a promising solution. However, this process faces a series of challenges: (1) complex higher-order dependen…

    Submitted 15 May, 2024; originally announced May 2024.

  11. arXiv:2405.06642  [pdf, other]

    q-bio.BM cs.AI cs.LG

    PPFlow: Target-aware Peptide Design with Torsional Flow Matching

    Authors: Haitao Lin, Odin Zhang, Huifeng Zhao, Dejun Jiang, Lirong Wu, Zicheng Liu, Yufei Huang, Stan Z. Li

    Abstract: Therapeutic peptides have proven to have great pharmaceutical value and potential in recent decades. However, methods of AI-assisted peptide drug discovery are not fully explored. To fill the gap, we propose a target-aware peptide design method called PPFlow, based on conditional flow matching on torus manifolds, to model the internal geometries of torsion angles for the peptide structure…

    Submitted 16 June, 2024; v1 submitted 5 March, 2024; originally announced May 2024.

    Comments: 18 pages

  12. arXiv:2405.00751  [pdf, other]

    q-bio.QM cs.AI cs.LG

    F$^3$low: Frame-to-Frame Coarse-grained Molecular Dynamics with SE(3) Guided Flow Matching

    Authors: Shaoning Li, Yusong Wang, Mingyu Li, Jian Zhang, Bin Shao, Nanning Zheng, Jian Tang

    Abstract: Molecular dynamics (MD) is a crucial technique for simulating biological systems, enabling the exploration of their dynamic nature and fostering an understanding of their functions and properties. To address exploration inefficiency, emerging enhanced sampling approaches like coarse-graining (CG) and generative models have been employed. In this work, we propose a Frame-to-Frame genera…

    Submitted 1 May, 2024; originally announced May 2024.

    Comments: Accepted by ICLR 2024 GEM workshop

  13. arXiv:2404.16866  [pdf, other]

    q-bio.QM cs.AI cs.LG

    Functional Protein Design with Local Domain Alignment

    Authors: Chaohao Yuan, Songyou Li, Geyan Ye, Yikun Zhang, Long-Kai Huang, Wenbing Huang, Wei Liu, Jianhua Yao, Yu Rong

    Abstract: The core challenge of de novo protein design lies in creating proteins with specific functions or properties, guided by certain conditions. Current models explore to generate protein using structural and evolutionary guidance, which only provide indirect conditions concerning functions and properties. However, textual annotations of proteins, especially the annotations for protein domains, which d…

    Submitted 27 May, 2024; v1 submitted 18 April, 2024; originally announced April 2024.

  14. arXiv:2404.05329  [pdf]

    q-bio.BM

    In silico bioactivity prediction of proteins interacting with graphene-based nanomaterials guides rational design of biosensor

    Authors: Jing Ye, Minzhi Fan, Xiaoyu Zhang, Shasha Lu, Mengyao Chai, Yunshan Zhang, Xiaoyu Zhao, Shuang Li, Diming Zhang

    Abstract: Graphene based nanomaterials have attracted significant attention for their potentials in biomedical and biotechnology applications in recent years, owing to the outstanding physical and chemical properties. However, the interaction mechanism and impact on biological activity of macro and micro biomolecules still require more concerns and further research in order to enhance their applicability in…

    Submitted 8 April, 2024; originally announced April 2024.

  15. arXiv:2403.09673  [pdf, other]

    q-bio.BM cs.AI cs.LG

    FoldToken: Learning Protein Language via Vector Quantization and Beyond

    Authors: Zhangyang Gao, Cheng Tan, Jue Wang, Yufei Huang, Lirong Wu, Stan Z. Li

    Abstract: Is there a foreign language describing protein sequences and structures simultaneously? Protein structures, represented by continuous 3D points, have long posed a challenge due to the contrasting modeling paradigms of discrete sequences. We introduce FoldTokenizer to represent protein sequence-structure as discrete symbols. This innovative approach involves projecting residue types and st…

    Submitted 19 March, 2024; v1 submitted 4 February, 2024; originally announced March 2024.

  16. arXiv:2403.07721  [pdf, other]

    cs.HC eess.SP q-bio.NC

    Visual Decoding and Reconstruction via EEG Embeddings with Guided Diffusion

    Authors: Dongyang Li, Chen Wei, Shiying Li, Jiachen Zou, Quanying Liu

    Abstract: How to decode human vision through neural signals has attracted a long-standing interest in neuroscience and machine learning. Modern contrastive learning and generative models improved the performance of fMRI-based visual decoding and reconstruction. However, the high cost and low temporal resolution of fMRI limit their applications in brain-computer interfaces (BCIs), prompting a high need for E…

    Submitted 4 April, 2024; v1 submitted 12 March, 2024; originally announced March 2024.

  17. arXiv:2403.07013  [pdf, other]

    q-bio.QM cs.LG q-bio.BM

    AdaNovo: Adaptive \emph{De Novo} Peptide Sequencing with Conditional Mutual Information

    Authors: Jun Xia, Shaorong Chen, Jingbo Zhou, Tianze Ling, Wenjie Du, Sizhe Liu, Stan Z. Li

    Abstract: Tandem mass spectrometry has played a pivotal role in advancing proteomics, enabling the analysis of protein composition in biological samples. Despite the development of various deep learning methods for identifying amino acid sequences (peptides) responsible for observed spectra, challenges persist in de novo peptide sequencing. Firstly, prior methods struggle to identify amino acids with…

    Submitted 15 March, 2024; v1 submitted 9 March, 2024; originally announced March 2024.

  18. arXiv:2403.05314  [pdf, other]

    q-bio.BM

    Advances of Deep Learning in Protein Science: A Comprehensive Survey

    Authors: Bozhen Hu, Cheng Tan, Lirong Wu, Jiangbin Zheng, Jun Xia, Zhangyang Gao, Zicheng Liu, Fandi Wu, Guijun Zhang, Stan Z. Li

    Abstract: Protein representation learning plays a crucial role in understanding the structure and function of proteins, which are essential biomolecules involved in various biological processes. In recent years, deep learning has emerged as a powerful tool for protein modeling due to its ability to learn complex patterns and representations from large-scale protein data. This comprehensive survey aims to pr…

    Submitted 8 March, 2024; originally announced March 2024.

  19. arXiv:2403.00875  [pdf, other]

    q-bio.QM cs.AI cs.LG q-bio.BM

    Enhancing Protein Predictive Models via Proteins Data Augmentation: A Benchmark and New Directions

    Authors: Rui Sun, Lirong Wu, Haitao Lin, Yufei Huang, Stan Z. Li

    Abstract: Augmentation is an effective alternative to utilize the small amount of labeled protein data. However, most of the existing work focuses on designing new architectures or pre-training tasks, and relatively little work has studied data augmentation for proteins. This paper extends data augmentation techniques previously used for images and texts to proteins and then benchmarks these techniques on…

    Submitted 1 March, 2024; originally announced March 2024.

  20. arXiv:2402.16901  [pdf, other]

    q-bio.GN cs.AI cs.LG

    FGBERT: Function-Driven Pre-trained Gene Language Model for Metagenomics

    Authors: ChenRui Duan, Zelin Zang, Yongjie Xu, Hang He, Zihan Liu, Zijia Song, Ju-Sheng Zheng, Stan Z. Li

    Abstract: Metagenomic data, comprising mixed multi-species genomes, are prevalent in diverse environments like oceans and soils, significantly impacting human health and ecological functions. However, current research relies on K-mer representations, limiting the capture of structurally relevant gene contexts. To address these limitations and further our understanding of complex relationships between metage…

    Submitted 24 February, 2024; originally announced February 2024.

  21. arXiv:2402.14391  [pdf, other]

    cs.LG q-bio.BM

    MAPE-PPI: Towards Effective and Efficient Protein-Protein Interaction Prediction via Microenvironment-Aware Protein Embedding

    Authors: Lirong Wu, Yijun Tian, Yufei Huang, Siyuan Li, Haitao Lin, Nitesh V Chawla, Stan Z. Li

    Abstract: Protein-Protein Interactions (PPIs) are fundamental in various biological processes and play a key role in life activities. The growing demand and cost of experimental PPI assays require computational methods for efficient PPI prediction. While existing methods rely heavily on protein sequence for PPI prediction, it is the protein structure that is the key to determine the interactions. To take bo…

    Submitted 22 February, 2024; originally announced February 2024.

  22. arXiv:2402.11459  [pdf, other]

    q-bio.BM cs.AI cs.LG physics.chem-ph

    Re-Dock: Towards Flexible and Realistic Molecular Docking with Diffusion Bridge

    Authors: Yufei Huang, Odin Zhang, Lirong Wu, Cheng Tan, Haitao Lin, Zhangyang Gao, Siyuan Li, Stan. Z. Li

    Abstract: Accurate prediction of protein-ligand binding structures, a task known as molecular docking, is crucial for drug design but remains challenging. While deep learning has shown promise, existing methods often depend on holo-protein structures (docked, and not accessible in realistic tasks) or neglect pocket sidechain conformations, leading to limited practical utility and unrealistic conformation pre…

    Submitted 21 February, 2024; v1 submitted 18 February, 2024; originally announced February 2024.

  23. arXiv:2402.09416  [pdf, other]

    q-bio.BM cs.LG

    Deep Manifold Transformation for Protein Representation Learning

    Authors: Bozhen Hu, Zelin Zang, Cheng Tan, Stan Z. Li

    Abstract: Protein representation learning is critical in various tasks in biology, such as drug design and protein structure or function prediction, which has primarily benefited from protein language models and graph neural networks. These models can capture intrinsic patterns from protein sequences and structures through masking and task-related losses. However, the learned protein representations are usu…

    Submitted 12 January, 2024; originally announced February 2024.

    Comments: This work has been accepted by ICASSP 2024

  24. arXiv:2402.08198  [pdf, other]

    q-bio.BM cs.AI cs.LG

    PSC-CPI: Multi-Scale Protein Sequence-Structure Contrasting for Efficient and Generalizable Compound-Protein Interaction Prediction

    Authors: Lirong Wu, Yufei Huang, Cheng Tan, Zhangyang Gao, Bozhen Hu, Haitao Lin, Zicheng Liu, Stan Z. Li

    Abstract: Compound-Protein Interaction (CPI) prediction aims to predict the pattern and strength of compound-protein interactions for rational drug discovery. Existing deep learning-based methods utilize only the single modality of protein sequences or structures and lack the co-modeling of the joint distribution of the two modalities, which may lead to significant performance drops in complex real-world sc…

    Submitted 12 February, 2024; originally announced February 2024.

  25. arXiv:2402.03781  [pdf, other]

    q-bio.QM cs.AI cs.LG

    MolTC: Towards Molecular Relational Modeling In Language Models

    Authors: Junfeng Fang, Shuai Zhang, Chang Wu, Zhengyi Yang, Zhiyuan Liu, Sihang Li, Kun Wang, Wenjie Du, Xiang Wang

    Abstract: Molecular Relational Learning (MRL), aiming to understand interactions between molecular pairs, plays a pivotal role in advancing biochemical research. Recently, the adoption of large language models (LLMs), known for their vast knowledge repositories and advanced logical inference capabilities, has emerged as a promising way for efficient and effective MRL. Despite their potential, these methods…

    Submitted 10 June, 2024; v1 submitted 6 February, 2024; originally announced February 2024.

    Comments: ACL 2024

  26. arXiv:2401.13923  [pdf, other]

    cs.LG cs.IR q-bio.BM

    Towards 3D Molecule-Text Interpretation in Language Models

    Authors: Sihang Li, Zhiyuan Liu, Yanchen Luo, Xiang Wang, Xiangnan He, Kenji Kawaguchi, Tat-Seng Chua, Qi Tian

    Abstract: Language Models (LMs) have greatly influenced diverse domains. However, their inherent limitation in comprehending 3D molecular structures has considerably constrained their potential in the biomolecular domain. To bridge this gap, we focus on 3D molecule-text interpretation, and propose 3D-MoLM: 3D-Molecular Language Modeling. Specifically, 3D-MoLM enables an LM to interpret and analyze 3D molecu…

    Submitted 17 March, 2024; v1 submitted 24 January, 2024; originally announced January 2024.

  27. arXiv:2401.06199  [pdf, other]

    q-bio.QM cs.AI cs.LG

    xTrimoPGLM: Unified 100B-Scale Pre-trained Transformer for Deciphering the Language of Protein

    Authors: Bo Chen, Xingyi Cheng, Pan Li, Yangli-ao Geng, Jing Gong, Shen Li, Zhilei Bei, Xu Tan, Boyan Wang, Xin Zeng, Chiming Liu, Aohan Zeng, Yuxiao Dong, Jie Tang, Le Song

    Abstract: Protein language models have shown remarkable success in learning biological information from protein sequences. However, most existing models are limited by either autoencoding or autoregressive pre-training objectives, which makes them struggle to handle protein understanding and generation tasks concurrently. We propose a unified protein language model, xTrimoPGLM, to address these two types of…

    Submitted 11 January, 2024; originally announced January 2024.

  28. arXiv:2401.02713  [pdf, other]

    cs.LG cs.AI q-bio.BM

    Graph-level Protein Representation Learning by Structure Knowledge Refinement

    Authors: Ge Wang, Zelin Zang, Jiangbin Zheng, Jun Xia, Stan Z. Li

    Abstract: This paper focuses on learning representation on the whole graph level in an unsupervised manner. Learning graph-level representation plays an important role in a variety of real-world issues such as molecule property prediction, protein structure feature extraction, and social network analysis. The mainstream method is utilizing contrastive learning to facilitate graph feature extraction, known a…

    Submitted 5 January, 2024; originally announced January 2024.

  29. arXiv:2312.14220  [pdf, other]

    q-bio.GN cs.AI cs.LG

    Single-Cell RNA-seq Synthesis with Latent Diffusion Model

    Authors: Yixuan Wang, Shuangyin Li, Shimin DI, Lei Chen

    Abstract: The single-cell RNA sequencing (scRNA-seq) technology enables researchers to study complex biological systems and diseases with high resolution. The central challenge is synthesizing enough scRNA-seq samples; insufficient samples can impede downstream analysis and reproducibility. While various methods have been attempted in past research, the resulting scRNA-seq samples were often of poor quality…

    Submitted 21 December, 2023; originally announced December 2023.

    Comments: 13 pages, 5 figures

  30. arXiv:2312.06082  [pdf, other]

    cs.AI q-bio.QM

    XAI meets Biology: A Comprehensive Review of Explainable AI in Bioinformatics Applications

    Authors: Zhongliang Zhou, Mengxuan Hu, Mariah Salcedo, Nathan Gravel, Wayland Yeung, Aarya Venkat, Dongliang Guo, Jielu Zhang, Natarajan Kannan, Sheng Li

    Abstract: Artificial intelligence (AI), particularly machine learning and deep learning models, has significantly impacted bioinformatics research by offering powerful tools for analyzing complex biological data. However, the lack of interpretability and transparency of these models presents challenges in leveraging these models for deeper biological insights and for generating testable hypotheses. Explaina…

    Submitted 10 December, 2023; originally announced December 2023.

    Comments: 19 pages, 9 figures

  31. arXiv:2312.04019  [pdf, other]

    q-bio.BM cs.AI

    Efficiently Predicting Protein Stability Changes Upon Single-point Mutation with Large Language Models

    Authors: Yijie Zhang, Zhangyang Gao, Cheng Tan, Stan Z. Li

    Abstract: Predicting protein stability changes induced by single-point mutations has been a persistent challenge over the years, attracting immense interest from numerous researchers. The ability to precisely predict protein thermostability is pivotal for various subfields and applications in biochemistry, including drug development, protein evolution analysis, and enzyme synthesis. Despite the proposition…

    Submitted 6 December, 2023; originally announced December 2023.

  32. arXiv:2311.16126  [pdf, other]

    q-bio.BM cs.CE cs.LG

    A Hierarchical Training Paradigm for Antibody Structure-sequence Co-design

    Authors: Fang Wu, Stan Z. Li

    Abstract: Therapeutic antibodies are an essential and rapidly expanding drug modality. The binding specificity between antibodies and antigens is decided by complementarity-determining regions (CDRs) at the tips of these Y-shaped proteins. In this paper, we propose a hierarchical training paradigm (HTP) for the antibody sequence-structure co-design. HTP consists of four levels of training stages, each corre…

    Submitted 29 October, 2023; originally announced November 2023.

  33. arXiv:2311.11004  [pdf, other]

    q-bio.QM

    A Foundation Model for Cell Segmentation

    Authors: Uriah Israel, Markus Marks, Rohit Dilip, Qilin Li, Morgan Schwartz, Elora Pradhan, Edward Pao, Shenyi Li, Alexander Pearson-Goulart, Pietro Perona, Georgia Gkioxari, Ross Barnowski, Yisong Yue, David Van Valen

    Abstract: Cells are the fundamental unit of biological organization, and identifying them in imaging data - cell segmentation - is a critical task for various cellular imaging experiments. While deep learning methods have led to substantial progress on this problem, models that have seen wide use are specialist models that work well for specific domains. Methods that have learned the general notion of "what…

    Submitted 18 November, 2023; originally announced November 2023.

  34. arXiv:2311.05106  [pdf, other]

    cs.NE cs.AI q-bio.NC

    A differentiable brain simulator bridging brain simulation and brain-inspired computing

    Authors: Chaoming Wang, Tianqiu Zhang, Sichao He, Hongyaoxing Gu, Shangyang Li, Si Wu

    Abstract: Brain simulation builds dynamical models to mimic the structure and functions of the brain, while brain-inspired computing (BIC) develops intelligent systems by learning from the structure and functions of the brain. The two fields are intertwined and should share a common programming framework to facilitate each other's development. However, none of the existing software in the fields can achieve…

    Submitted 22 February, 2024; v1 submitted 8 November, 2023; originally announced November 2023.

    Comments: 23 pages, 11 figures, ICLR 2024

  35. arXiv:2310.11466  [pdf, other]

    cs.LG cs.AI q-bio.QM

    Protein 3D Graph Structure Learning for Robust Structure-based Protein Property Prediction

    Authors: Yufei Huang, Siyuan Li, Jin Su, Lirong Wu, Odin Zhang, Haitao Lin, Jingqi Qi, Zihan Liu, Zhangyang Gao, Yuyang Liu, Jiangbin Zheng, Stan. ZQ. Li

    Abstract: Protein structure-based property prediction has emerged as a promising approach for various biological tasks, such as protein function prediction and sub-cellular location estimation. The existing methods highly rely on experimental protein structure data and fail in scenarios where these data are unavailable. Predicted protein structures from AI tools (e.g., AlphaFold2) were utilized as alternati…

    Submitted 19 October, 2023; v1 submitted 14 October, 2023; originally announced October 2023.

  36. arXiv:2310.07253  [pdf]

    cs.LG cs.AI q-bio.QM

    ADMEOOD: Out-of-Distribution Benchmark for Drug Property Prediction

    Authors: Shuoying Wei, Xinlong Wen, Lida Zhu, Songquan Li, Rongbo Zhu

    Abstract: Obtaining accurate and valid information for drug molecules is a crucial and challenging task. However, chemical knowledge and information have been accumulated over the past 100 years from various regions, laboratories, and experimental purposes. Little has been explored in terms of the out-of-distribution (OOD) problem with noise and inconsistency, which may lead to weak robustness and unsatisfi…

    Submitted 11 October, 2023; originally announced October 2023.

  37. arXiv:2309.07178  [pdf]

    q-bio.QM cs.AI cs.LG eess.SP

    CloudBrain-NMR: An Intelligent Cloud Computing Platform for NMR Spectroscopy Processing, Reconstruction and Analysis

    Authors: Di Guo, Sijin Li, Jun Liu, Zhangren Tu, Tianyu Qiu, Jingjing Xu, Liubin Feng, Donghai Lin, Qing Hong, Meijin Lin, Yanqin Lin, Xiaobo Qu

    Abstract: Nuclear Magnetic Resonance (NMR) spectroscopy has served as a powerful analytical tool for studying molecular structure and dynamics in chemistry and biology. However, the processing of raw data acquired from NMR spectrometers and subsequent quantitative analysis involves various specialized tools, which necessitates comprehensive knowledge in programming and NMR. Particularly, the emerging deep l…

    Submitted 12 September, 2023; originally announced September 2023.

    Comments: 11 pages, 13 figures

  38. arXiv:2308.06967  [pdf]

    q-bio.TO

    Intestinal Microecology in Pediatric Surgery-Related Gastrointestinal Diseases Current Insights and Future Perspectives

    Authors: Yingchao Li, Yuqing Wu, Suolin Li, Lin Liu, Xiaoyi Zhang, Jiaxun Lv, Qinqin Li

    Abstract: Intestinal microecology is established from birth and is constantly changing until homeostasis is reached. Intestinal microecology is involved in the immune inflammatory response of the intestine and regulates the intestinal barrier function. The imbalance of intestinal microecology is closely related to the occurrence and development of digestive system diseases. In some gastrointestinal diseases…

    Submitted 14 August, 2023; originally announced August 2023.

  39. arXiv:2307.09580  [pdf, other]

    q-bio.BM cs.DS q-bio.GN

    LinearSankoff: Linear-time Simultaneous Folding and Alignment of RNA Homologs

    Authors: Sizhen Li, Ning Dai, He Zhang, Apoorv Malik, David H. Mathews, Liang Huang

    Abstract: The classical Sankoff algorithm for the simultaneous folding and alignment of homologous RNA sequences is highly influential, but it suffers from two major limitations in efficiency and modeling power. First, it takes $O(n^6)$ for two sequences where n is the average sequence length. Most implementations and variations reduce the runtime to $O(n^3)$ by restricting the alignment search space, but t…

    Submitted 18 July, 2023; originally announced July 2023.

  40. arXiv:2307.09169  [pdf, ps, other]

    q-bio.BM cs.LG

    Efficient Prediction of Peptide Self-assembly through Sequential and Graphical Encoding

    Authors: Zihan Liu, Jiaqi Wang, Yun Luo, Shuang Zhao, Wenbin Li, Stan Z. Li

    Abstract: In recent years, there has been an explosion of research on the application of deep learning to the prediction of various peptide properties, due to the significant development and market potential of peptides. Molecular dynamics has enabled the efficient collection of large peptide datasets, providing reliable training data for deep learning. However, the lack of systematic analysis of the peptid…

    Submitted 16 July, 2023; originally announced July 2023.

  41. arXiv:2306.13769  [pdf, other]

    q-bio.BM cs.LG

    Functional-Group-Based Diffusion for Pocket-Specific Molecule Generation and Elaboration

    Authors: Haitao Lin, Yufei Huang, Odin Zhang, Lirong Wu, Siyuan Li, Zhiyuan Chen, Stan Z. Li

    Abstract: In recent years, AI-assisted drug design methods have been proposed to generate molecules given the pockets' structures of target proteins. Most of them are atom-level-based methods, which consider atoms as basic components and generate atom positions and types. In this way, however, it is hard to generate realistic fragments with complicated structures. To solve this, we propose D3FG, a functiona…

    Submitted 18 March, 2024; v1 submitted 30 May, 2023; originally announced June 2023.

    Comments: 9 pages

  42. arXiv:2306.07505  [pdf]

    q-bio.TO eess.IV

    Deep learning radiomics for assessment of gastroesophageal varices in people with compensated advanced chronic liver disease

    Authors: Lan Wang, Ruiling He, Lili Zhao, Jia Wang, Zhengzi Geng, Tao Ren, Guo Zhang, Peng Zhang, Kaiqiang Tang, Chaofei Gao, Fei Chen, Liting Zhang, Yonghe Zhou, Xin Li, Fanbin He, Hui Huan, Wenjuan Wang, Yunxiao Liang, Juan Tang, Fang Ai, Tingyu Wang, Liyun Zheng, Zhongwei Zhao, Jiansong Ji, Wei Liu, et al. (22 additional authors not shown)

    Abstract: Objective: Bleeding from gastroesophageal varices (GEV) is a medical emergency associated with high mortality. We aim to construct an artificial intelligence-based model of two-dimensional shear wave elastography (2D-SWE) of the liver and spleen to precisely assess the risk of GEV and high-risk gastroesophageal varices (HRV). Design: A prospective multicenter study was conducted in patients with…

    Submitted 12 June, 2023; originally announced June 2023.

  43. arXiv:2305.15153  [pdf, other]

    q-bio.BM cs.AI cs.LG

    MotifRetro: Exploring the Combinability-Consistency Trade-offs in retrosynthesis via Dynamic Motif Editing

    Authors: Zhangyang Gao, Xingran Chen, Cheng Tan, Stan Z. Li

    Abstract: Is there a unified framework for graph-based retrosynthesis prediction? Through analysis of full-, semi-, and non-template retrosynthesis methods, we discovered that they strive to strike an optimal balance between combinability and consistency: Should atoms be combined as motifs to simplify the molecular editing process, or should motifs be broken down into atoms to reduce the vocabulary…

    Submitted 20 May, 2023; originally announced May 2023.

  44. arXiv:2305.15151  [pdf, other]

    q-bio.BM cs.AI cs.LG

    Knowledge-Design: Pushing the Limit of Protein Design via Knowledge Refinement

    Authors: Zhangyang Gao, Cheng Tan, Stan Z. Li

    Abstract: Recent studies have shown competitive performance in protein design that aims to find the amino acid sequence folding into the desired structure. However, most of them disregard the importance of predictive confidence, fail to cover the vast protein space, and do not incorporate common protein knowledge. After witnessing the great success of pretrained models on diverse protein-related tasks and t…

    Submitted 29 May, 2023; v1 submitted 20 May, 2023; originally announced May 2023.

  45. arXiv:2305.12471  [pdf, other]

    q-bio.NC

    Mapping Biological Neuron Dynamics into an Interpretable Two-layer Artificial Neural Network

    Authors: Jingyang Ma, Songting Li, Douglas Zhou

    Abstract: Dendrites are crucial structures for computation of an individual neuron. It has been shown that the dynamics of a biological neuron with dendrites can be approximated by artificial neural networks (ANN) with deep structure. However, it remains unclear whether a neuron can be further captured by a simple, biologically plausible ANN. In this work, we develop a two-layer ANN, named as dendritic bili…

    Submitted 21 May, 2023; originally announced May 2023.

  46. arXiv:2305.09480  [pdf, other]

    q-bio.BM cs.AI cs.LG

    Cross-Gate MLP with Protein Complex Invariant Embedding is A One-Shot Antibody Designer

    Authors: Cheng Tan, Zhangyang Gao, Lirong Wu, Jun Xia, Jiangbin Zheng, Xihong Yang, Yue Liu, Bozhen Hu, Stan Z. Li

    Abstract: Antibodies are crucial proteins produced by the immune system in response to foreign substances or antigens. The specificity of an antibody is determined by its complementarity-determining regions (CDRs), which are located in the variable domains of the antibody chains and form the antigen-binding site. Previous studies have utilized complex techniques to generate CDRs, but they suffer from inadeq…

    Submitted 10 January, 2024; v1 submitted 21 April, 2023; originally announced May 2023.

    Comments: Accepted by AAAI 2024

  47. arXiv:2305.04931  [pdf]

    q-bio.QM

    Network pharmacology on the mechanism of Yi Qi Tong Qiao Pill inhibiting allergic rhinitis

    Authors: Boyang Wang, DingFan Zhang, Tingyu Zhang, Chayanis Sutcharitchan, Jianlin Hua, Dongfang Hua, Bo Zhang, Shao Li

    Abstract: Objective: The purpose of this study is to reveal the mechanism of action of Yi Qi Tong Qiao Pill (YQTQP) in the treatment of allergic rhinitis (AR), as well as to establish a paradigm for research on traditional Chinese medicine (TCM) from a systematic perspective. Methods: Based on the data collected from TCM-related and disease-related databases, target profiles of compounds in YQTQP were calc…

    Submitted 21 May, 2023; v1 submitted 6 May, 2023; originally announced May 2023.

    Comments: 25 pages, 6 figures

    MSC Class: None

  48. arXiv:2303.11783  [pdf, other]

    q-bio.BM cs.AI cs.LG

    Lightweight Contrastive Protein Structure-Sequence Transformation

    Authors: Jiangbin Zheng, Ge Wang, Yufei Huang, Bozhen Hu, Siyuan Li, Cheng Tan, Xinwen Fan, Stan Z. Li

    Abstract: Pretrained protein structure models without labels are crucial foundations for the majority of protein downstream applications. The conventional structure pretraining methods follow the mature natural language pretraining methods such as denoised reconstruction and masked language modeling but usually destroy the real representation of spatial structures. The other common pretraining methods might…

    Submitted 19 March, 2023; originally announced March 2023.

  49. arXiv:2302.10888  [pdf, other]

    cs.LG cs.AI q-bio.BM

    Data-Efficient Protein 3D Geometric Pretraining via Refinement of Diffused Protein Structure Decoy

    Authors: Yufei Huang, Lirong Wu, Haitao Lin, Jiangbin Zheng, Ge Wang, Stan Z. Li

    Abstract: Learning meaningful protein representation is important for a variety of biological downstream tasks such as structure-based drug design. Having witnessed the success of protein sequence pretraining, pretraining for structural data which is more informative has become a promising research topic. However, there are three major challenges facing protein structure pretraining: insufficient sample div…

    Submitted 5 February, 2023; originally announced February 2023.

  50. arXiv:2302.07120  [pdf, other]

    cs.AI cs.LG q-bio.BM

    PrefixMol: Target- and Chemistry-aware Molecule Design via Prefix Embedding

    Authors: Zhangyang Gao, Yuqi Hu, Cheng Tan, Stan Z. Li

    Abstract: Is there a unified model for generating molecules considering different conditions, such as binding pockets and chemical properties? Although target-aware generative models have made significant advances in drug design, they do not consider chemistry conditions and cannot guarantee the desired chemical properties. Unfortunately, merging the target-aware and chemical-aware models into a unified mod…

    Submitted 14 February, 2023; originally announced February 2023.