Skip to main content

Showing 1–50 of 882 results for author: Feng, Y

  1. arXiv:2407.10290  [pdf, other

    cs.AR

    Switch-Less Dragonfly on Wafers: A Scalable Interconnection Architecture based on Wafer-Scale Integration

    Authors: Yinxiao Feng, Kaisheng Ma

    Abstract: Existing high-performance computing (HPC) interconnection architectures are based on high-radix switches, which limits the injection/local performance and introduces latency/energy/cost overhead. The new wafer-scale packaging and high-speed wireline technologies provide high-density, low-latency, and high-bandwidth connectivity, thus promising to support direct-connected high-radix interconnection… ▽ More

    Submitted 14 July, 2024; originally announced July 2024.

  2. arXiv:2407.09920  [pdf, other

    cs.CV

    MutDet: Mutually Optimizing Pre-training for Remote Sensing Object Detection

    Authors: Ziyue Huang, Yongchao Feng, Qingjie Liu, Yunhong Wang

    Abstract: Detection pre-training methods for the DETR series detector have been extensively studied in natural scenes, e.g., DETReg. However, the detection pre-training remains unexplored in remote sensing scenes. In existing pre-training methods, alignment between object embeddings extracted from a pre-trained backbone and detector features is significant. However, due to differences in feature extraction… ▽ More

    Submitted 13 July, 2024; originally announced July 2024.

    Comments: 14 pages, 4 figures

  3. arXiv:2407.08290  [pdf, other

    cs.CV

    Gap Completion in Point Cloud Scene occluded by Vehicles using SGC-Net

    Authors: Yu Feng, Yiming Xu, Yan Xia, Claus Brenner, Monika Sester

    Abstract: Recent advances in mobile mapping systems have greatly enhanced the efficiency and convenience of acquiring urban 3D data. These systems utilize LiDAR sensors mounted on vehicles to capture vast cityscapes. However, a significant challenge arises due to occlusions caused by roadside parked vehicles, leading to the loss of scene information, particularly on the roads, sidewalks, curbs, and the lowe… ▽ More

    Submitted 11 July, 2024; originally announced July 2024.

  4. arXiv:2407.07400  [pdf

    cond-mat.mtrl-sci cs.HC physics.bio-ph

    Invisible sweat sensor: ultrathin membrane mimics skin for stress monitoring

    Authors: Yuchen Feng, Andreas Kenny Oktavius, Reno Adley Prawoto, Hing Ni Ko, Qiao Gu, Ping Gao

    Abstract: Epidermal skin sensors have emerged as a promising approach for continuous and noninvasive monitoring of vital health signals, but to maximize their performance, these sensors must integrate seamlessly with the skin, minimizing impedance while maintaining the skin's natural protective and regulatory functions.In this study, we introduce an imperceptible sweat sensor that achieves this seamless ski… ▽ More

    Submitted 10 July, 2024; originally announced July 2024.

  5. arXiv:2407.06348  [pdf, other

    cs.CR cs.PL

    FORAY: Towards Effective Attack Synthesis against Deep Logical Vulnerabilities in DeFi Protocols

    Authors: Hongbo Wen, Hanzhi Liu, Jiaxin Song, Yanju Chen, Wenbo Guo, Yu Feng

    Abstract: Blockchain adoption has surged with the rise of Decentralized Finance (DeFi) applications. However, the significant value of digital assets managed by DeFi protocols makes them prime targets for attacks. Current smart contract vulnerability detection tools struggle with DeFi protocols due to deep logical bugs arising from complex financial interactions between multiple smart contracts. These tools… ▽ More

    Submitted 8 July, 2024; originally announced July 2024.

  6. arXiv:2407.05283  [pdf, other

    cs.CV

    SCIPaD: Incorporating Spatial Clues into Unsupervised Pose-Depth Joint Learning

    Authors: Yi Feng, Zizhan Guo, Qijun Chen, Rui Fan

    Abstract: Unsupervised monocular depth estimation frameworks have shown promising performance in autonomous driving. However, existing solutions primarily rely on a simple convolutional neural network for ego-motion recovery, which struggles to estimate precise camera poses in dynamic, complicated real-world scenarios. These inaccurately estimated camera poses can inevitably deteriorate the photometric reco… ▽ More

    Submitted 7 July, 2024; originally announced July 2024.

    Comments: Accepted by IEEE Transactions on Intelligent Vehicles. Code is available at https://mias.group/SCIPaD

  7. arXiv:2407.03994  [pdf, other

    cs.CL cs.AI

    Unlocking the Potential of Model Merging for Low-Resource Languages

    Authors: Mingxu Tao, Chen Zhang, Quzhe Huang, Tianyao Ma, Songfang Huang, Dongyan Zhao, Yansong Feng

    Abstract: Adapting large language models (LLMs) to new languages typically involves continual pre-training (CT) followed by supervised fine-tuning (SFT). However, this CT-then-SFT approach struggles with limited data in the context of low-resource languages, failing to balance language modeling and task-solving capabilities. We thus propose model merging as an alternative for low-resource languages, combini… ▽ More

    Submitted 9 July, 2024; v1 submitted 4 July, 2024; originally announced July 2024.

  8. arXiv:2407.03993  [pdf, other

    cs.CL

    A Survey on Natural Language Counterfactual Generation

    Authors: Yongjie Wang, Xiaoqi Qiu, Yu Yue, Xu Guo, Zhiwei Zeng, Yuhong Feng, Zhiqi Shen

    Abstract: Natural Language Counterfactual generation aims to minimally modify a given text such that the modified text will be classified into a different class. The generated counterfactuals provide insight into the reasoning behind a model's predictions by highlighting which words significantly influence the outcomes. Additionally, they can be used to detect model fairness issues or augment the training d… ▽ More

    Submitted 4 July, 2024; originally announced July 2024.

    Comments: A survey paper

    MSC Class: 68T50 ACM Class: I.2.7

  9. arXiv:2407.02043  [pdf, other

    cs.CL

    Concise and Precise Context Compression for Tool-Using Language Models

    Authors: Yang Xu, Yunlong Feng, Honglin Mu, Yutai Hou, Yitong Li, Xinghao Wang, Wanjun Zhong, Zhongyang Li, Dandan Tu, Qingfu Zhu, Min Zhang, Wanxiang Che

    Abstract: Through reading the documentation in the context, tool-using language models can dynamically extend their capability using external tools. The cost is that we have to input lengthy documentation every time the model needs to use the tool, occupying the input window as well as slowing down the decoding process. Given the progress in general-purpose compression, soft context compression is a suita… ▽ More

    Submitted 2 July, 2024; originally announced July 2024.

  10. arXiv:2407.01029  [pdf, other

    cs.CV

    EndoSparse: Real-Time Sparse View Synthesis of Endoscopic Scenes using Gaussian Splatting

    Authors: Chenxin Li, Brandon Y. Feng, Yifan Liu, Hengyu Liu, Cheng Wang, Weihao Yu, Yixuan Yuan

    Abstract: 3D reconstruction of biological tissues from a collection of endoscopic images is a key to unlock various important downstream surgical applications with 3D capabilities. Existing methods employ various advanced neural rendering techniques for photorealistic view synthesis, but they often struggle to recover accurate 3D representations when only sparse observations are available, which is usually… ▽ More

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: Accpeted by MICCAI2024

  11. arXiv:2407.00435  [pdf, other

    cs.GR

    RTGS: Enabling Real-Time Gaussian Splatting on Mobile Devices Using Efficiency-Guided Pruning and Foveated Rendering

    Authors: Weikai Lin, Yu Feng, Yuhao Zhu

    Abstract: Point-Based Neural Rendering (PBNR), i.e., the 3D Gaussian Splatting-family algorithms, emerges as a promising class of rendering techniques, which are permeating all aspects of society, driven by a growing demand for real-time, photorealistic rendering in AR/VR and digital twins. Achieving real-time PBNR on mobile devices is challenging. This paper proposes RTGS, a PBNR system that for the firs… ▽ More

    Submitted 2 July, 2024; v1 submitted 29 June, 2024; originally announced July 2024.

    Comments: 9 pages

    MSC Class: I.3; I.2

  12. arXiv:2406.18547  [pdf

    eess.IV cs.CV

    Enhancing Medical Imaging with GANs Synthesizing Realistic Images from Limited Data

    Authors: Yinqiu Feng, Bo Zhang, Lingxi Xiao, Yutian Yang, Tana Gegen, Zexi Chen

    Abstract: In this research, we introduce an innovative method for synthesizing medical images using generative adversarial networks (GANs). Our proposed GANs method demonstrates the capability to produce realistic synthetic images even when trained on a limited quantity of real medical image data, showcasing commendable generalization prowess. To achieve this, we devised a generator and discriminator networ… ▽ More

    Submitted 22 May, 2024; originally announced June 2024.

  13. arXiv:2406.18518  [pdf, other

    cs.CL cs.AI cs.LG cs.SE

    APIGen: Automated Pipeline for Generating Verifiable and Diverse Function-Calling Datasets

    Authors: Zuxin Liu, Thai Hoang, Jianguo Zhang, Ming Zhu, Tian Lan, Shirley Kokane, Juntao Tan, Weiran Yao, Zhiwei Liu, Yihao Feng, Rithesh Murthy, Liangwei Yang, Silvio Savarese, Juan Carlos Niebles, Huan Wang, Shelby Heinecke, Caiming Xiong

    Abstract: The advancement of function-calling agent models requires diverse, reliable, and high-quality datasets. This paper presents APIGen, an automated data generation pipeline designed to synthesize verifiable high-quality datasets for function-calling applications. We leverage APIGen and collect 3,673 executable APIs across 21 different categories to generate diverse function-calling datasets in a scal… ▽ More

    Submitted 26 June, 2024; originally announced June 2024.

  14. arXiv:2406.17233  [pdf, other

    cs.SE cs.CL

    Self-Constructed Context Decompilation with Fined-grained Alignment Enhancement

    Authors: Yunlong Feng, Yang Xu, Dechuan Teng, Honglin Mu, Xiao Xu, Libo Qin, Wanxiang Che, Qingfu Zhu

    Abstract: Decompilation transforms compiled code back into a high-level programming language for analysis when source code is unavailable. Previous work has primarily focused on enhancing decompilation performance by increasing the scale of model parameters or training data for pre-training. Based on the characteristics of the decompilation task, we propose two methods: (1) Without fine-tuning, the Self-Con… ▽ More

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: Under Review

  15. arXiv:2406.16982  [pdf

    cs.LG cs.AI

    Research on Disease Prediction Model Construction Based on Computer AI deep Learning Technology

    Authors: Yang Lin, Muqing Li, Ziyi Zhu, Yinqiu Feng, Lingxi Xiao, Zexi Chen

    Abstract: The prediction of disease risk factors can screen vulnerable groups for effective prevention and treatment, so as to reduce their morbidity and mortality. Machine learning has a great demand for high-quality labeling information, and labeling noise in medical big data poses a great challenge to efficient disease risk warning methods. Therefore, this project intends to study the robust learning alg… ▽ More

    Submitted 23 June, 2024; originally announced June 2024.

  16. arXiv:2406.16981  [pdf

    eess.IV cs.AI cs.LG eess.SP

    Research on Feature Extraction Data Processing System For MRI of Brain Diseases Based on Computer Deep Learning

    Authors: Lingxi Xiao, Jinxin Hu, Yutian Yang, Yinqiu Feng, Zichao Li, Zexi Chen

    Abstract: Most of the existing wavelet image processing techniques are carried out in the form of single-scale reconstruction and multiple iterations. However, processing high-quality fMRI data presents problems such as mixed noise and excessive computation time. This project proposes the use of matrix operations by combining mixed noise elimination methods with wavelet analysis to replace traditional itera… ▽ More

    Submitted 23 June, 2024; originally announced June 2024.

  17. arXiv:2406.15778  [pdf, other

    cs.CV cs.AI

    ObjectNLQ @ Ego4D Episodic Memory Challenge 2024

    Authors: Yisen Feng, Haoyu Zhang, Yuquan Xie, Zaijing Li, Meng Liu, Liqiang Nie

    Abstract: In this report, we present our approach for the Natural Language Query track and Goal Step track of the Ego4D Episodic Memory Benchmark at CVPR 2024. Both challenges require the localization of actions within long video sequences using textual queries. To enhance localization accuracy, our method not only processes the temporal information of videos but also identifies fine-grained objects spatial… ▽ More

    Submitted 22 June, 2024; originally announced June 2024.

    Comments: The solution for the Natural Language Query track and Goal Step track at CVPR EgoVis Workshop 2024

  18. arXiv:2406.15771  [pdf, other

    cs.CV cs.AI

    HCQA @ Ego4D EgoSchema Challenge 2024

    Authors: Haoyu Zhang, Yuquan Xie, Yisen Feng, Zaijing Li, Meng Liu, Liqiang Nie

    Abstract: In this report, we present our champion solution for Ego4D EgoSchema Challenge in CVPR 2024. To deeply integrate the powerful egocentric captioning model and question reasoning model, we propose a novel Hierarchical Comprehension scheme for egocentric video Question Answering, named HCQA. It consists of three stages: Fine-grained Caption Generation, Context-driven Summarization, and Inference-guid… ▽ More

    Submitted 22 June, 2024; originally announced June 2024.

    Comments: The champion solution for Ego4D EgoSchema Challenge in CVPR EgoVis Workshop 2024

  19. arXiv:2406.15695  [pdf, other

    cs.CL

    SS-Bench: A Benchmark for Social Story Generation and Evaluation

    Authors: Yi Feng, Mingyang Song, Jiaqi Wang, Mao Zheng, Liping Jing, Jian Yu

    Abstract: Children with Autism Spectrum Disorder (ASD) often misunderstand social situations and struggle to participate in daily routines. Psychology experts write Social Stories under strict constraints of structural clarity, descriptive orientation, and situational safety to enhance their abilities in these regimes. However, Social Stories are costly in creation and often limited in diversity and timelin… ▽ More

    Submitted 21 June, 2024; originally announced June 2024.

  20. arXiv:2406.14503  [pdf, other

    cs.CL

    Overview of the CAIL 2023 Argument Mining Track

    Authors: Jingcong Liang, Junlong Wang, Xinyu Zhai, Yungui Zhuang, Yiyang Zheng, Xin Xu, Xiandong Ran, Xiaozheng Dong, Honghui Rong, Yanlun Liu, Hao Chen, Yuhan Wei, Donghai Li, Jiajie Peng, Xuanjing Huang, Chongde Shi, Yansong Feng, Yun Song, Zhongyu Wei

    Abstract: We give a detailed overview of the CAIL 2023 Argument Mining Track, one of the Chinese AI and Law Challenge (CAIL) 2023 tracks. The main goal of the track is to identify and extract interacting argument pairs in trial dialogs. It mainly uses summarized judgment documents but can also refer to trial recordings. The track consists of two stages, and we introduce the tasks designed for each stage; we… ▽ More

    Submitted 20 June, 2024; originally announced June 2024.

  21. arXiv:2406.13317  [pdf, other

    cs.CV

    M4Fog: A Global Multi-Regional, Multi-Modal, and Multi-Stage Dataset for Marine Fog Detection and Forecasting to Bridge Ocean and Atmosphere

    Authors: Mengqiu Xu, Ming Wu, Kaixin Chen, Yixiang Huang, Mingrui Xu, Yujia Yang, Yiqing Feng, Yiying Guo, Bin Huang, Dongliang Chang, Zhenwei Shi, Chuang Zhang, Zhanyu Ma, Jun Guo

    Abstract: Marine fog poses a significant hazard to global shipping, necessitating effective detection and forecasting to reduce economic losses. In recent years, several machine learning (ML) methods have demonstrated superior detection accuracy compared to traditional meteorological methods. However, most of these works are developed on proprietary datasets, and the few publicly accessible datasets are oft… ▽ More

    Submitted 19 June, 2024; originally announced June 2024.

  22. arXiv:2406.12835  [pdf, other

    cs.LG cs.AI cs.IR cs.SI

    Influence Maximization via Graph Neural Bandits

    Authors: Yuting Feng, Vincent Y. F. Tan, Bogdan Cautis

    Abstract: We consider a ubiquitous scenario in the study of Influence Maximization (IM), in which there is limited knowledge about the topology of the diffusion network. We set the IM problem in a multi-round diffusion campaign, aiming to maximize the number of distinct users that are influenced. Leveraging the capability of bandit algorithms to effectively balance the objectives of exploration and exploita… ▽ More

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: To appear at the 2024 ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD)

  23. arXiv:2406.11238  [pdf, other

    cs.CL

    What Kinds of Tokens Benefit from Distant Text? An Analysis on Long Context Language Modeling

    Authors: Yutong Hu, Quzhe Huang, Kangcheng Luo, Yansong Feng

    Abstract: As the context length that large language models can handle continues to increase, these models demonstrate an enhanced ability to utilize distant information for tasks such as language modeling. This capability contrasts with human reading and writing habits, where it is uncommon to remember and use particularly distant information, except in cases of foreshadowing. In this paper, we aim to explo… ▽ More

    Submitted 17 June, 2024; originally announced June 2024.

  24. arXiv:2406.10910  [pdf, ps, other

    cs.IT eess.SP

    Fast Fractional Programming for Multi-Cell Integrated Sensing and Communications

    Authors: Yannan Chen, Yi Feng, Xiaoyang Li, Licheng Zhao, Kaiming Shen

    Abstract: This paper concerns the coordinate multi-cell beamforming design for integrated sensing and communications (ISAC). In particular, we assume that each base station (BS) has massive antennas. The optimization objective is to maximize a weighted sum of the data rates (for communications) and the Fisher information (for sensing). We first show that the conventional beamforming method for the multiple-… ▽ More

    Submitted 16 June, 2024; originally announced June 2024.

  25. arXiv:2406.10605  [pdf, other

    cs.LG cs.GT

    Last-iterate Convergence Separation between Extra-gradient and Optimism in Constrained Periodic Games

    Authors: Yi Feng, Ping Li, Ioannis Panageas, Xiao Wang

    Abstract: Last-iterate behaviors of learning algorithms in repeated two-player zero-sum games have been extensively studied due to their wide applications in machine learning and related tasks. Typical algorithms that exhibit the last-iterate convergence property include optimistic and extra-gradient methods. However, most existing results establish these properties under the assumption that the game is tim… ▽ More

    Submitted 15 June, 2024; originally announced June 2024.

    Comments: Accepted for UAI 2024

  26. arXiv:2406.10603  [pdf, other

    cs.GT

    Prediction Accuracy of Learning in Games : Follow-the-Regularized-Leader meets Heisenberg

    Authors: Yi Feng, Georgios Piliouras, Xiao Wang

    Abstract: We investigate the accuracy of prediction in deterministic learning dynamics of zero-sum games with random initializations, specifically focusing on observer uncertainty and its relationship to the evolution of covariances. Zero-sum games are a prominent field of interest in machine learning due to their various applications. Concurrently, the accuracy of prediction in dynamical systems from mecha… ▽ More

    Submitted 15 June, 2024; originally announced June 2024.

    Comments: Accepted for ICML 2024

  27. arXiv:2406.10447  [pdf, other

    cs.CV

    The BabyView dataset: High-resolution egocentric videos of infants' and young children's everyday experiences

    Authors: Bria Long, Violet Xiang, Stefan Stojanov, Robert Z. Sparks, Zi Yin, Grace E. Keene, Alvin W. M. Tan, Steven Y. Feng, Chengxu Zhuang, Virginia A. Marchman, Daniel L. K. Yamins, Michael C. Frank

    Abstract: Human children far exceed modern machine learning algorithms in their sample efficiency, achieving high performance in key domains with much less data than current models. This ''data gap'' is a key challenge both for building intelligent artificial systems and for understanding human development. Egocentric video capturing children's experience -- their ''training data'' -- is a key ingredient fo… ▽ More

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: 9 pages, 2 figures, 4 tables and SI. Submitted to NeurIPS Datasets and Benchmarks

  28. arXiv:2406.09834  [pdf, other

    cs.SE

    How and Why LLMs Use Deprecated APIs in Code Completion? An Empirical Study

    Authors: Chong Wang, Kaifeng Huang, Jian Zhang, Yebo Feng, Lyuye Zhang, Yang Liu, Xin Peng

    Abstract: Large language models (LLMs), pre-trained or fine-tuned on large code corpora, have shown effectiveness in generating code completions. However, in LLM-based code completion, LLMs may struggle to use correct and up-to-date Application Programming Interfaces (APIs) due to the rapid and continuous evolution of libraries. While existing studies have highlighted issues with predicting incorrect APIs,… ▽ More

    Submitted 3 July, 2024; v1 submitted 14 June, 2024; originally announced June 2024.

  29. arXiv:2406.08897  [pdf, other

    cs.LG

    Motif-driven Subgraph Structure Learning for Graph Classification

    Authors: Zhiyao Zhou, Sheng Zhou, Bochao Mao, Jiawei Chen, Qingyun Sun, Yan Feng, Chun Chen, Can Wang

    Abstract: To mitigate the suboptimal nature of graph structure, Graph Structure Learning (GSL) has emerged as a promising approach to improve graph structure and boost performance in downstream tasks. Despite the proposal of numerous GSL methods, the progresses in this field mostly concentrated on node-level tasks, while graph-level tasks (e.g., graph classification) remain largely unexplored. Notably, appl… ▽ More

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: 16 pages, 8 figures

  30. arXiv:2406.07913  [pdf, other

    cs.CL cs.IR

    DeTriever: Decoder-representation-based Retriever for Improving NL2SQL In-Context Learning

    Authors: Yuxi Feng, Raymond Li, Zhenan Fan, Giuseppe Carenini, Mohammadreza Pourreza, Weiwei Zhang, Yong Zhang

    Abstract: While in-context Learning (ICL) has proven to be an effective technique to improve the performance of Large Language Models (LLMs) in a variety of complex tasks, notably in translating natural language questions into Structured Query Language (NL2SQL), the question of how to select the most beneficial demonstration examples remains an open research problem. While prior works often adapted off-the-… ▽ More

    Submitted 12 June, 2024; originally announced June 2024.

  31. arXiv:2406.07547  [pdf, other

    cs.CV

    Zero-shot Image Editing with Reference Imitation

    Authors: Xi Chen, Yutong Feng, Mengting Chen, Yiyang Wang, Shilong Zhang, Yu Liu, Yujun Shen, Hengshuang Zhao

    Abstract: Image editing serves as a practical yet challenging task considering the diverse demands from users, where one of the hardest parts is to precisely describe how the edited image should look like. In this work, we present a new form of editing, termed imitative editing, to help users exercise their creativity more conveniently. Concretely, to edit an image region of interest, users are free to dire… ▽ More

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: https://xavierchen34.github.io/MimicBrush-Page

  32. arXiv:2406.07515  [pdf, other

    cs.LG cs.AI stat.ML

    Beyond Model Collapse: Scaling Up with Synthesized Data Requires Reinforcement

    Authors: Yunzhen Feng, Elvis Dohmatob, Pu Yang, Francois Charton, Julia Kempe

    Abstract: Synthesized data from generative models is increasingly considered as an alternative to human-annotated data for fine-tuning Large Language Models. This raises concerns about model collapse: a drop in performance of models fine-tuned on generated data. Considering that it is easier for both humans and machines to tell between good and bad examples than to generate high-quality samples, we investig… ▽ More

    Submitted 11 June, 2024; originally announced June 2024.

  33. arXiv:2406.07330  [pdf, other

    cs.CL cs.AI cs.SD eess.AS

    CTC-based Non-autoregressive Textless Speech-to-Speech Translation

    Authors: Qingkai Fang, Zhengrui Ma, Yan Zhou, Min Zhang, Yang Feng

    Abstract: Direct speech-to-speech translation (S2ST) has achieved impressive translation quality, but it often faces the challenge of slow decoding due to the considerable length of speech sequences. Recently, some research has turned to non-autoregressive (NAR) models to expedite decoding, yet the translation quality typically lags behind autoregressive (AR) models significantly. In this paper, we investig… ▽ More

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: ACL 2024 Findings

    ACM Class: I.2.7

  34. arXiv:2406.07289  [pdf, other

    cs.CL cs.AI cs.SD eess.AS

    Can We Achieve High-quality Direct Speech-to-Speech Translation without Parallel Speech Data?

    Authors: Qingkai Fang, Shaolei Zhang, Zhengrui Ma, Min Zhang, Yang Feng

    Abstract: Recently proposed two-pass direct speech-to-speech translation (S2ST) models decompose the task into speech-to-text translation (S2TT) and text-to-speech (TTS) within an end-to-end model, yielding promising results. However, the training of these models still relies on parallel speech data, which is extremely challenging to collect. In contrast, S2TT and TTS have accumulated a large amount of data… ▽ More

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: ACL 2024 main conference. Project Page: https://ictnlp.github.io/ComSpeech-Site/

    ACM Class: I.2.7

  35. arXiv:2406.06937  [pdf, other

    cs.CL cs.AI cs.SD eess.AS

    A Non-autoregressive Generation Framework for End-to-End Simultaneous Speech-to-Any Translation

    Authors: Zhengrui Ma, Qingkai Fang, Shaolei Zhang, Shoutao Guo, Yang Feng, Min Zhang

    Abstract: Simultaneous translation models play a crucial role in facilitating communication. However, existing research primarily focuses on text-to-text or speech-to-text models, necessitating additional cascade components to achieve speech-to-speech translation. These pipeline methods suffer from error propagation and accumulate delays in each cascade component, resulting in reduced synchronization betwee… ▽ More

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: ACL 2024; Codes and demos are at https://github.com/ictnlp/NAST-S2x

  36. arXiv:2406.06910  [pdf, other

    cs.CL

    Agent-SiMT: Agent-assisted Simultaneous Machine Translation with Large Language Models

    Authors: Shoutao Guo, Shaolei Zhang, Zhengrui Ma, Min Zhang, Yang Feng

    Abstract: Simultaneous Machine Translation (SiMT) generates target translations while reading the source sentence. It relies on a policy to determine the optimal timing for reading sentences and generating translations. Existing SiMT methods generally adopt the traditional Transformer architecture, which concurrently determines the policy and generates translations. While they excel at determining policies,… ▽ More

    Submitted 12 June, 2024; v1 submitted 10 June, 2024; originally announced June 2024.

    Comments: 18 pages, 8 figures, 7 tables. v2 of arXiv:2402.13036

  37. arXiv:2406.06852  [pdf, other

    cs.CR cs.AI cs.CL

    A Survey of Backdoor Attacks and Defenses on Large Language Models: Implications for Security Measures

    Authors: Shuai Zhao, Meihuizi Jia, Zhongliang Guo, Leilei Gan, Jie Fu, Yichao Feng, Fengjun Pan, Luu Anh Tuan

    Abstract: The large language models (LLMs), which bridge the gap between human language understanding and complex problem-solving, achieve state-of-the-art performance on several NLP tasks, particularly in few-shot and zero-shot settings. Despite the demonstrable efficacy of LMMs, due to constraints on computational resources, users have to engage with open-source language models or outsource the entire tra… ▽ More

    Submitted 13 June, 2024; v1 submitted 10 June, 2024; originally announced June 2024.

  38. arXiv:2406.06808  [pdf, ps, other

    cs.DS cs.LG

    Fast White-Box Adversarial Streaming Without a Random Oracle

    Authors: Ying Feng, Aayush Jain, David P. Woodruff

    Abstract: Recently, the question of adversarially robust streaming, where the stream is allowed to depend on the randomness of the streaming algorithm, has gained a lot of attention. In this work, we consider a strong white-box adversarial model (Ajtai et al. PODS 2022), in which the adversary has access to all past random coins and the parameters used by the streaming algorithm. We focus on the sparse reco… ▽ More

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: ICML 2024

  39. arXiv:2406.06633  [pdf, other

    cs.LG

    PairCFR: Enhancing Model Training on Paired Counterfactually Augmented Data through Contrastive Learning

    Authors: Xiaoqi Qiu, Yongjie Wang, Xu Guo, Zhiwei Zeng, Yue Yu, Yuhong Feng, Chunyan Miao

    Abstract: Counterfactually Augmented Data (CAD) involves creating new data samples by applying minimal yet sufficient modifications to flip the label of existing data samples to other classes. Training with CAD enhances model robustness against spurious features that happen to correlate with labels by spreading the casual relationships across different classes. Yet, recent research reveals that training wit… ▽ More

    Submitted 9 June, 2024; originally announced June 2024.

    Comments: Accepted by ACL 2024 main conference

    MSC Class: 68T50 ACM Class: I.2; I.2.7

  40. arXiv:2406.04669  [pdf, other

    cs.CL

    DiNeR: a Large Realistic Dataset for Evaluating Compositional Generalization

    Authors: Chengang Hu, Xiao Liu, Yansong Feng

    Abstract: Most of the existing compositional generalization datasets are synthetically-generated, resulting in a lack of natural language variation. While there have been recent attempts to introduce non-synthetic datasets for compositional generalization, they suffer from either limited data scale or a lack of diversity in the forms of combinations. To better investigate compositional generalization with m… ▽ More

    Submitted 7 June, 2024; originally announced June 2024.

    Comments: EMNLP 2023 long paper

  41. arXiv:2406.03878  [pdf, other

    cs.CL

    Decoder-only Streaming Transformer for Simultaneous Translation

    Authors: Shoutao Guo, Shaolei Zhang, Yang Feng

    Abstract: Simultaneous Machine Translation (SiMT) generates translation while reading source tokens, essentially producing the target prefix based on the source prefix. To achieve good performance, it leverages the relationship between source and target prefixes to exact a policy to guide the generation of translations. Although existing SiMT methods primarily focus on the Encoder-Decoder architecture, we e… ▽ More

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: Accepted to ACL 2024. 14 pages, 10 Tables, 5 Figures

  42. arXiv:2406.03049  [pdf, other

    cs.CL cs.AI cs.SD eess.AS

    StreamSpeech: Simultaneous Speech-to-Speech Translation with Multi-task Learning

    Authors: Shaolei Zhang, Qingkai Fang, Shoutao Guo, Zhengrui Ma, Min Zhang, Yang Feng

    Abstract: Simultaneous speech-to-speech translation (Simul-S2ST, a.k.a streaming speech translation) outputs target speech while receiving streaming speech inputs, which is critical for real-time communication. Beyond accomplishing translation between speech, Simul-S2ST requires a policy to control the model to generate corresponding target speech at the opportune moment within speech inputs, thereby posing… ▽ More

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: Accepted to ACL 2024 main conference, Project Page: https://ictnlp.github.io/StreamSpeech-site/

  43. arXiv:2406.02056  [pdf, other

    cs.LG cs.NE

    CAP: A Context-Aware Neural Predictor for NAS

    Authors: Han Ji, Yuqi Feng, Yanan Sun

    Abstract: Neural predictors are effective in boosting the time-consuming performance evaluation stage in neural architecture search (NAS), owing to their direct estimation of unseen architectures. Despite the effectiveness, training a powerful neural predictor with fewer annotated architectures remains a huge challenge. In this paper, we propose a context-aware neural predictor (CAP) which only needs a few… ▽ More

    Submitted 4 June, 2024; originally announced June 2024.

    Comments: Accepted by IJCAI24

  44. arXiv:2406.00584  [pdf, other

    cs.DB cs.AI

    A Blueprint Architecture of Compound AI Systems for Enterprise

    Authors: Eser Kandogan, Sajjadur Rahman, Nikita Bhutani, Dan Zhang, Rafael Li Chen, Kushan Mitra, Sairam Gurajada, Pouya Pezeshkpour, Hayate Iso, Yanlin Feng, Hannah Kim, Chen Shen, Jin Wang, Estevam Hruschka

    Abstract: Large Language Models (LLMs) have showcased remarkable capabilities surpassing conventional NLP challenges, creating opportunities for use in production use cases. Towards this goal, there is a notable shift to building compound AI systems, wherein LLMs are integrated into an expansive software infrastructure with many components like models, retrievers, databases and tools. In this paper, we intr… ▽ More

    Submitted 1 June, 2024; originally announced June 2024.

    Comments: Compound AI Systems Workshop at the Data+AI Summit 2024

  45. CMDBench: A Benchmark for Coarse-to-fine Multimodal Data Discovery in Compound AI Systems

    Authors: Yanlin Feng, Sajjadur Rahman, Aaron Feng, Vincent Chen, Eser Kandogan

    Abstract: Compound AI systems (CASs) that employ LLMs as agents to accomplish knowledge-intensive tasks via interactions with tools and data retrievers have garnered significant interest within database and AI communities. While these systems have the potential to supplement typical analysis workflows of data analysts in enterprise data platforms, unfortunately, CASs are subject to the same data discovery c… ▽ More

    Submitted 1 June, 2024; originally announced June 2024.

    Comments: Governance, Understanding and Integration of Data for Effective and Responsible AI (GUIDE-AI '24), June 14, 2024, Santiago, AA, Chile

  46. arXiv:2406.00016  [pdf

    cs.CL

    Exploration of Attention Mechanism-Enhanced Deep Learning Models in the Mining of Medical Textual Data

    Authors: Lingxi Xiao, Muqing Li, Yinqiu Feng, Meiqi Wang, Ziyi Zhu, Zexi Chen

    Abstract: The research explores the utilization of a deep learning model employing an attention mechanism in medical text mining. It targets the challenge of analyzing unstructured text information within medical data. This research seeks to enhance the model's capability to identify essential medical information by incorporating deep learning and attention mechanisms. This paper reviews the basic principle… ▽ More

    Submitted 22 May, 2024; originally announced June 2024.

    Comments: arXiv admin note: text overlap with arXiv:2405.11704 by other authors

  47. arXiv:2405.20334  [pdf, other

    cs.CV cs.GR

    VividDream: Generating 3D Scene with Ambient Dynamics

    Authors: Yao-Chih Lee, Yi-Ting Chen, Andrew Wang, Ting-Hsuan Liao, Brandon Y. Feng, Jia-Bin Huang

    Abstract: We introduce VividDream, a method for generating explorable 4D scenes with ambient dynamics from a single input image or text prompt. VividDream first expands an input image into a static 3D point cloud through iterative inpainting and geometry merging. An ensemble of animated videos is then generated using video diffusion models with quality refinement techniques and conditioned on renderings of… ▽ More

    Submitted 30 May, 2024; originally announced May 2024.

    Comments: Project page: https://vivid-dream-4d.github.io

  48. arXiv:2405.19290  [pdf, other

    cs.CL

    Integrating Multi-scale Contextualized Information for Byte-based Neural Machine Translation

    Authors: Langlin Huang, Yang Feng

    Abstract: Subword tokenization is a common method for vocabulary building in Neural Machine Translation (NMT) models. However, increasingly complex tasks have revealed its disadvantages. First, a vocabulary cannot be modified once it is learned, making it hard to adapt to new words. Second, in multilingual translation, the imbalance in data volumes across different languages spreads to the vocabulary, exace… ▽ More

    Submitted 7 June, 2024; v1 submitted 29 May, 2024; originally announced May 2024.

    Comments: Accepted by ACL2024 Findings

  49. arXiv:2405.17660  [pdf, other

    cs.CV

    LoReTrack: Efficient and Accurate Low-Resolution Transformer Tracking

    Authors: Shaohua Dong, Yunhe Feng, Qing Yang, Yuewei Lin, Heng Fan

    Abstract: High-performance Transformer trackers have shown excellent results, yet they often bear a heavy computational load. Observing that a smaller input can immediately and conveniently reduce computations without changing the model, an easy solution is to adopt the low-resolution input for efficient Transformer tracking. Albeit faster, this hurts tracking accuracy much due to information loss in low re… ▽ More

    Submitted 27 May, 2024; originally announced May 2024.

  50. arXiv:2405.16960  [pdf, other

    cs.CV cs.RO

    DCPI-Depth: Explicitly Infusing Dense Correspondence Prior to Unsupervised Monocular Depth Estimation

    Authors: Mengtan Zhang, Yi Feng, Qijun Chen, Rui Fan

    Abstract: There has been a recent surge of interest in learning to perceive depth from monocular videos in an unsupervised fashion. A key challenge in this field is achieving robust and accurate depth estimation in challenging scenarios, particularly in regions with weak textures or where dynamic objects are present. This study makes three major contributions by delving deeply into dense correspondence prio… ▽ More

    Submitted 27 May, 2024; originally announced May 2024.

    Comments: 13 pages, 7 figures