Showing 1–50 of 458 results for author: Ren, X

  1. arXiv:2407.12294 [pdf, other]

    cs.CV

    VEON: Vocabulary-Enhanced Occupancy Prediction

    Authors: Jilai Zheng, Pin Tang, Zhongdao Wang, Guoqing Wang, Xiangxuan Ren, Bailan Feng, Chao Ma

    Abstract: Perceiving the world as 3D occupancy helps embodied agents avoid collisions with any type of obstacle. While open-vocabulary image understanding has prospered recently, how to bind the predicted 3D occupancy grids with open-world semantics remains under-explored due to limited open-world annotations. Hence, instead of building our model from scratch, we try to blend 2D foundation model…

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: Accepted to ECCV2024

  2. arXiv:2407.10671 [pdf, other]

    cs.CL cs.AI

    Qwen2 Technical Report

    Authors: An Yang, Baosong Yang, Binyuan Hui, Bo Zheng, Bowen Yu, Chang Zhou, Chengpeng Li, Chengyuan Li, Dayiheng Liu, Fei Huang, Guanting Dong, Haoran Wei, Huan Lin, Jialong Tang, Jialin Wang, Jian Yang, Jianhong Tu, Jianwei Zhang, Jianxin Ma, Jianxin Yang, Jin Xu, Jingren Zhou, Jinze Bai, Jinzheng He, Junyang Lin, et al. (37 additional authors not shown)

    Abstract: This report introduces the Qwen2 series, the latest addition to our large language models and large multimodal models. We release a comprehensive suite of foundational and instruction-tuned language models, encompassing a parameter range from 0.5 to 72 billion, featuring dense models and a Mixture-of-Experts model. Qwen2 surpasses most prior open-weight models, including its predecessor Qwen1.5, a…

    Submitted 17 July, 2024; v1 submitted 15 July, 2024; originally announced July 2024.

    Comments: 25 pages, 1 figure

  3. arXiv:2407.07950 [pdf, other]

    cs.CL cs.AI cs.HC

    Rel-A.I.: An Interaction-Centered Approach To Measuring Human-LM Reliance

    Authors: Kaitlyn Zhou, Jena D. Hwang, Xiang Ren, Nouha Dziri, Dan Jurafsky, Maarten Sap

    Abstract: The reconfiguration of human-LM interactions from simple sentence completions to complex, multi-domain, humanlike engagements necessitates new methodologies to understand how humans choose to rely on LMs. In our work, we contend that reliance is influenced by numerous factors within the interactional context of a generation, a departure from prior work that used verbalized confidence (e.g., "I'm c…

    Submitted 10 July, 2024; originally announced July 2024.

    Comments: Preprint

  4. arXiv:2407.07672 [pdf, other]

    cs.HC

    StoryDiffusion: How to Support UX Storyboarding With Generative-AI

    Authors: Zhaohui Liang, Xiaoyu Zhang, Kevin Ma, Zhao Liu, Xipei Ren, Kosa Goucher-Lambert, Can Liu

    Abstract: Storyboarding is an established method for designing user experiences. Generative AI can support this process by helping designers quickly create visual narratives. However, existing tools only focus on accurate text-to-image generation. Currently, it is not clear how to effectively support the entire creative process of storyboarding and how to develop AI-powered tools to support designers' indiv…

    Submitted 10 July, 2024; originally announced July 2024.

  5. arXiv:2407.05232 [pdf, other]

    cs.LG

    PAPM: A Physics-aware Proxy Model for Process Systems

    Authors: Pengwei Liu, Zhongkai Hao, Xingyu Ren, Hangjie Yuan, Jiayang Ren, Dong Ni

    Abstract: In the context of proxy modeling for process systems, traditional data-driven deep learning approaches frequently encounter significant challenges, such as substantial training costs induced by large amounts of data, and limited generalization capabilities. As a promising alternative, physics-aware models incorporate partial physics knowledge to ameliorate these challenges. Although demonstrating…

    Submitted 6 July, 2024; originally announced July 2024.

    Comments: ICML 2024

  6. arXiv:2407.01781 [pdf, other]

    cs.CV cs.GR cs.LG

    fVDB: A Deep-Learning Framework for Sparse, Large-Scale, and High-Performance Spatial Intelligence

    Authors: Francis Williams, Jiahui Huang, Jonathan Swartz, Gergely Klár, Vijay Thakkar, Matthew Cong, Xuanchi Ren, Ruilong Li, Clement Fuji-Tsang, Sanja Fidler, Eftychios Sifakis, Ken Museth

    Abstract: We present fVDB, a novel GPU-optimized framework for deep learning on large-scale 3D data. fVDB provides a complete set of differentiable primitives to build deep learning architectures for common tasks in 3D learning such as convolution, pooling, attention, ray-tracing, meshing, etc. fVDB simultaneously provides a much larger feature set (primitives and operators) than established frameworks wi…

    Submitted 1 July, 2024; originally announced July 2024.

  7. arXiv:2406.19008 [pdf, other]

    cs.DS

    VertiMRF: Differentially Private Vertical Federated Data Synthesis

    Authors: Fangyuan Zhao, Zitao Li, Xuebin Ren, Bolin Ding, Shusen Yang, Yaliang Li

    Abstract: Data synthesis is a promising solution to share data for various downstream analytic tasks without exposing raw data. However, without a theoretical privacy guarantee, a synthetic dataset would still leak some sensitive information. Differential privacy is thus widely adopted to safeguard data synthesis by strictly limiting the released information. This technique is advantageous yet presents sign…

    Submitted 27 June, 2024; originally announced June 2024.

  8. arXiv:2406.16672 [pdf, other]

    cs.CL cs.AI

    CAVE: Controllable Authorship Verification Explanations

    Authors: Sahana Ramnath, Kartik Pandey, Elizabeth Boschee, Xiang Ren

    Abstract: Authorship Verification (AV) (do two documents have the same author?) is essential for many sensitive real-life applications. AV is often used in proprietary domains that require a private, offline model, making SOTA online models like ChatGPT undesirable. Other SOTA systems use methods, e.g. Siamese Networks, that are uninterpretable, and hence cannot be trusted in high-stakes applications. In th…

    Submitted 24 June, 2024; originally announced June 2024.

  9. arXiv:2406.14422 [pdf, other]

    cs.CV cs.AI

    FutureNet-LOF: Joint Trajectory Prediction and Lane Occupancy Field Prediction with Future Context Encoding

    Authors: Mingkun Wang, Xiaoguang Ren, Ruochun Jin, Minglong Li, Xiaochuan Zhang, Changqian Yu, Mingxu Wang, Wenjing Yang

    Abstract: Most prior motion prediction endeavors in autonomous driving have inadequately encoded future scenarios, leading to predictions that may fail to accurately capture the diverse movements of agents (e.g., vehicles or pedestrians). To address this, we propose FutureNet, which explicitly integrates initially predicted trajectories into the future scenario and further encodes these future contexts to e…

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: 10 pages

  10. arXiv:2406.14026 [pdf, other]

    cs.LG cs.CL stat.ML

    Demystifying Forgetting in Language Model Fine-Tuning with Statistical Analysis of Example Associations

    Authors: Xisen Jin, Xiang Ren

    Abstract: Language models (LMs) are known to suffer from forgetting of previously learned examples when fine-tuned, breaking stability of deployed LM systems. Despite efforts on mitigating forgetting, few have investigated whether, and how forgotten upstream examples are associated with newly learned tasks. Insights on such associations enable efficient and targeted mitigation of forgetting. In this paper,…

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: 5 pages

  11. arXiv:2406.13149 [pdf, other]

    cs.CV

    High-Fidelity Facial Albedo Estimation via Texture Quantization

    Authors: Zimin Ran, Xingyu Ren, Xiang An, Kaicheng Yang, Xiangzi Dai, Ziyong Feng, Jia Guo, Linchao Zhu, Jiankang Deng

    Abstract: Recent 3D face reconstruction methods have made significant progress in shape estimation, but high-fidelity facial albedo reconstruction remains challenging. Existing methods depend on expensive light-stage captured data to learn facial albedo maps. However, a lack of diversity in subjects limits their ability to recover high-fidelity results. In this paper, we present a novel facial albedo recons…

    Submitted 18 June, 2024; originally announced June 2024.

  12. arXiv:2406.11285 [pdf, other]

    cs.CR cs.CL

    Self and Cross-Model Distillation for LLMs: Effective Methods for Refusal Pattern Alignment

    Authors: Jie Li, Yi Liu, Chongyang Liu, Xiaoning Ren, Ling Shi, Weisong Sun, Yinxing Xue

    Abstract: Large Language Models (LLMs) like OpenAI's GPT series, Anthropic's Claude, and Meta's LLaMa have shown remarkable capabilities in text generation. However, their susceptibility to toxic prompts presents significant security challenges. This paper investigates alignment techniques, including Supervised Fine-Tuning (SFT) and Reinforcement Learning from Human Feedback (RLHF), to mitigate these risks.…

    Submitted 17 June, 2024; originally announced June 2024.

  13. arXiv:2406.07342 [pdf, other]

    cs.NI cs.DC

    EdgeTimer: Adaptive Multi-Timescale Scheduling in Mobile Edge Computing with Deep Reinforcement Learning

    Authors: Yijun Hao, Shusen Yang, Fang Li, Yifan Zhang, Shibo Wang, Xuebin Ren

    Abstract: In mobile edge computing (MEC), resource scheduling is crucial to task requests' performance and service providers' cost, involving multi-layer heterogeneous scheduling decisions. Existing schedulers typically adopt static timescales to regularly update scheduling decisions of each layer, without adaptive adjustment of timescales for different layers, resulting in potentially poor performance in p…

    Submitted 11 June, 2024; originally announced June 2024.

  14. arXiv:2406.04137 [pdf, other]

    cs.LG math.ST stat.ML

    Optimal Batched Linear Bandits

    Authors: Xuanfei Ren, Tianyuan Jin, Pan Xu

    Abstract: We introduce the E$^4$ algorithm for the batched linear bandit problem, incorporating an Explore-Estimate-Eliminate-Exploit framework. With a proper choice of exploration rate, we prove E$^4$ achieves the finite-time minimax optimal regret with only $O(\log\log T)$ batches, and the asymptotically optimal regret with only $3$ batches as $T\rightarrow\infty$, where $T$ is the time horizon. We furthe…

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: 26 pages, 6 figures, 4 tables. To appear in the proceedings of the 41st International Conference on Machine Learning (ICML 2024)

  15. arXiv:2406.04094 [pdf, other]

    cs.RO

    Data-driven Explainable Controller for Soft Robots based on Recurrent Neural Networks

    Authors: Zixi Chen, Xuyang Ren, Gastone Ciuti, Cesare Stefanini

    Abstract: The nonlinearity and hysteresis of soft robot motions have posed challenges in accurate soft robot control. Neural networks, especially recurrent neural networks (RNNs), have been widely leveraged for this issue due to their nonlinear activation functions and recurrent structures. Although they have shown satisfying accuracy in most tasks, these black-box approaches are not explainable, and hence,…

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: 10 pages, 8 figures, 5 tables

  16. Refactoring to Pythonic Idioms: A Hybrid Knowledge-Driven Approach Leveraging Large Language Models

    Authors: Zejun Zhang, Zhenchang Xing, Xiaoxue Ren, Qinghua Lu, Xiwei Xu

    Abstract: Pythonic idioms are highly valued and widely used in the Python programming community. However, many Python users find it challenging to use Pythonic idioms. Adopting a rule-based approach or LLM-only approach is not sufficient to overcome three persistent challenges of code idiomatization including code miss, wrong detection and wrong refactoring. Motivated by the determinism of rules and adaptab…

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: Accepted by FSE 2024, 22 pages

  17. arXiv:2406.02377 [pdf, other]

    cs.IR cs.AI cs.CL

    XRec: Large Language Models for Explainable Recommendation

    Authors: Qiyao Ma, Xubin Ren, Chao Huang

    Abstract: Recommender systems help users navigate information overload by providing personalized recommendations aligned with their preferences. Collaborative Filtering (CF) is a widely adopted approach, but while advanced techniques like graph neural networks (GNNs) and self-supervised learning (SSL) have enhanced CF models for better user representations, they often lack the ability to provide explanation…

    Submitted 4 June, 2024; originally announced June 2024.

  18. arXiv:2406.01355 [pdf, other]

    cs.CV cs.AI cs.CR

    Differentially Private Fine-Tuning of Diffusion Models

    Authors: Yu-Lin Tsai, Yizhe Li, Zekai Chen, Po-Yu Chen, Chia-Mu Yu, Xuebin Ren, Francois Buet-Golfouse

    Abstract: The integration of Differential Privacy (DP) with diffusion models (DMs) presents a promising yet challenging frontier, particularly due to the substantial memorization capabilities of DMs that pose significant privacy risks. Differential privacy offers a rigorous framework for safeguarding individual data points during model training, with Differential Privacy Stochastic Gradient Descent (DP-SGD)…

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: 16 pages, 5 figures, 11 tables

  19. arXiv:2406.00440 [pdf, other]

    cs.CV

    Topo4D: Topology-Preserving Gaussian Splatting for High-Fidelity 4D Head Capture

    Authors: Xuanchen Li, Yuhao Cheng, Xingyu Ren, Haozhe Jia, Di Xu, Wenhan Zhu, Yichao Yan

    Abstract: 4D head capture aims to generate dynamic topological meshes and corresponding texture maps from videos, which is widely utilized in movies and games for its ability to simulate facial muscle movements and recover dynamic textures in pore-squeezing. The industry often adopts the method involving multi-view stereo and non-rigid alignment. However, this approach is prone to errors and heavily reliant…

    Submitted 15 July, 2024; v1 submitted 1 June, 2024; originally announced June 2024.

  20. arXiv:2405.14722 [pdf, other]

    cs.CL

    CAPE: Context-Adaptive Positional Encoding for Length Extrapolation

    Authors: Chuanyang Zheng, Yihang Gao, Han Shi, Minbin Huang, Jingyao Li, Jing Xiong, Xiaozhe Ren, Michael Ng, Xin Jiang, Zhenguo Li, Yu Li

    Abstract: Positional encoding plays a crucial role in transformers, significantly impacting model performance and length generalization. Prior research has introduced absolute positional encoding (APE) and relative positional encoding (RPE) to distinguish token positions in given sequences. However, both APE and RPE remain fixed after model training regardless of input data, limiting their adaptability and…

    Submitted 23 May, 2024; originally announced May 2024.

    Comments: Technical Report

  21. arXiv:2405.12577 [pdf, other]

    cs.RO math.OC

    Fast Estimation of Relative Transformation Based on Fusion of Odometry and UWB Ranging Data

    Authors: Yuan Fu, Zheng Zhang, Guangyang Zeng, Chun Liu, Junfeng Wu, Xiaoqiang Ren

    Abstract: In this paper, we investigate the problem of estimating the 4-DOF (three-dimensional position and orientation) robot-robot relative frame transformation using odometers and distance measurements between robots. Firstly, we apply a two-step estimation method based on maximum likelihood estimation. Specifically, a good initial value is obtained through unconstrained least squares and projection, fol…

    Submitted 21 May, 2024; originally announced May 2024.

    Comments: 15 pages, 4 figures

    MSC Class: 93J08; ACM Class: G.m

  22. A Survey of Large Language Models for Graphs

    Authors: Xubin Ren, Jiabin Tang, Dawei Yin, Nitesh Chawla, Chao Huang

    Abstract: Graphs are an essential data structure utilized to represent relationships in real-world scenarios. Prior research has established that Graph Neural Networks (GNNs) deliver impressive outcomes in graph-centric tasks, such as link prediction and node classification. Despite these advancements, challenges like data sparsity and limited generalization capabilities continue to persist. Recently, Large…

    Submitted 24 June, 2024; v1 submitted 10 May, 2024; originally announced May 2024.

    Comments: Published as a KDD'24 survey paper

  23. arXiv:2405.01470 [pdf, other]

    cs.CL

    WildChat: 1M ChatGPT Interaction Logs in the Wild

    Authors: Wenting Zhao, Xiang Ren, Jack Hessel, Claire Cardie, Yejin Choi, Yuntian Deng

    Abstract: Chatbots such as GPT-4 and ChatGPT are now serving millions of users. Despite their widespread use, there remains a lack of public datasets showcasing how these tools are used by a population of users in practice. To bridge this gap, we offered free access to ChatGPT for online users in exchange for their affirmative, consensual opt-in to anonymously collect their chat transcripts and request head…

    Submitted 2 May, 2024; originally announced May 2024.

    Comments: Accepted by ICLR 2024

  24. arXiv:2404.18814 [pdf, ps, other]

    cs.CR

    Belt and Brace: When Federated Learning Meets Differential Privacy

    Authors: Xuebin Ren, Shusen Yang, Cong Zhao, Julie McCann, Zongben Xu

    Abstract: Federated learning (FL) has great potential for large-scale machine learning (ML) without exposing raw data. Differential privacy (DP) is the de facto standard of privacy protection with provable guarantees. Advances in ML suggest that DP would be a perfect fit for FL with comprehensive privacy preservation. Hence, extensive efforts have been devoted to achieving practically usable FL with DP, which…

    Submitted 29 April, 2024; originally announced April 2024.

    Comments: 10 pages, 4 figures, accepted by and to appear in Communications of the ACM (CACM)

  25. arXiv:2404.16841 [pdf, other]

    cs.CR

    Machine Unlearning in Large Language Models

    Authors: Kongyang Chen, Zixin Wang, Bing Mi, Waixi Liu, Shaowei Wang, Xiaojun Ren, Jiaxing Shen

    Abstract: Recently, large language models (LLMs) have emerged as a notable field, attracting significant attention for their ability to automatically generate intelligent content for various application domains. However, LLMs still suffer from significant security and privacy issues. For example, LLMs might expose user privacy from hacking attacks or targeted prompts. To address this problem, this paper intr…

    Submitted 3 February, 2024; originally announced April 2024.

  26. arXiv:2404.15014 [pdf, other]

    cs.CV

    OccGen: Generative Multi-modal 3D Occupancy Prediction for Autonomous Driving

    Authors: Guoqing Wang, Zhongdao Wang, Pin Tang, Jilai Zheng, Xiangxuan Ren, Bailan Feng, Chao Ma

    Abstract: Existing solutions for 3D semantic occupancy prediction typically treat the task as a one-shot 3D voxel-wise segmentation perception problem. These discriminative methods focus on learning the mapping between the inputs and occupancy map in a single step, lacking the ability to gradually refine the occupancy map and the reasonable scene imaginative capacity to complete the local regions somewhere.…

    Submitted 23 April, 2024; originally announced April 2024.

  27. arXiv:2404.10199 [pdf, other]

    cs.CL cs.AI

    CULTURE-GEN: Revealing Global Cultural Perception in Language Models through Natural Language Prompting

    Authors: Huihan Li, Liwei Jiang, Jena D. Hwang, Hyunwoo Kim, Sebastin Santy, Taylor Sorensen, Bill Yuchen Lin, Nouha Dziri, Xiang Ren, Yejin Choi

    Abstract: As the utilization of large language models (LLMs) has proliferated worldwide, it is crucial for them to have adequate knowledge and fair representation for diverse global cultures. In this work, we uncover culture perceptions of three SOTA models on 110 countries and regions on 8 culture-related topics through culture-conditioned generations, and extract symbols from these generations that are as…

    Submitted 26 April, 2024; v1 submitted 15 April, 2024; originally announced April 2024.

  28. arXiv:2404.09502 [pdf, other]

    cs.CV

    SparseOcc: Rethinking Sparse Latent Representation for Vision-Based Semantic Occupancy Prediction

    Authors: Pin Tang, Zhongdao Wang, Guoqing Wang, Jilai Zheng, Xiangxuan Ren, Bailan Feng, Chao Ma

    Abstract: Vision-based perception for autonomous driving requires an explicit modeling of a 3D space, where 2D latent representations are mapped and subsequent 3D operators are applied. However, operating on dense latent spaces introduces a cubic time and space complexity, which limits scalability in terms of perception range or spatial resolution. Existing approaches compress the dense representation using…

    Submitted 15 April, 2024; originally announced April 2024.

    Comments: 10 pages, 4 figures, accepted by CVPR 2024

    Journal ref: IEEE Conference on Computer Vision and Pattern Recognition 2024 (CVPR 2024)

  29. arXiv:2404.09172 [pdf, other]

    cs.CV cs.AI

    LoopAnimate: Loopable Salient Object Animation

    Authors: Fanyi Wang, Peng Liu, Haotian Hu, Dan Meng, Jingwen Su, Jinjin Xu, Yanhao Zhang, Xiaoming Ren, Zhiwang Zhang

    Abstract: Research on diffusion model-based video generation has advanced rapidly. However, limitations in object fidelity and generation length hinder its practical applications. Additionally, specific domains like animated wallpapers require seamless looping, where the first and last frames of the video match seamlessly. To address these challenges, this paper proposes LoopAnimate, a novel method for gene…

    Submitted 16 April, 2024; v1 submitted 14 April, 2024; originally announced April 2024.

  30. arXiv:2404.08870 [pdf, other]

    cs.CC

    Almost Optimal Time Lower Bound for Approximating Parameterized Clique, CSP, and More, under ETH

    Authors: Venkatesan Guruswami, Bingkai Lin, Xuandi Ren, Yican Sun, Kewen Wu

    Abstract: The Parameterized Inapproximability Hypothesis (PIH), which is an analog of the PCP theorem in parameterized complexity, asserts that there is a constant $\varepsilon > 0$ such that for any computable function $f:\mathbb{N}\to\mathbb{N}$, no $f(k)\cdot n^{O(1)}$-time algorithm can, on input a $k$-variable CSP instance with domain size $n$, find an assignment satisfying $1-\varepsilon$ fraction of…

    Submitted 11 June, 2024; v1 submitted 12 April, 2024; originally announced April 2024.

  31. arXiv:2404.08566 [pdf, other]

    eess.SP cs.LG

    Mitigating Receiver Impact on Radio Frequency Fingerprint Identification via Domain Adaptation

    Authors: Liu Yang, Qiang Li, Xiaoyang Ren, Yi Fang, Shafei Wang

    Abstract: Radio Frequency Fingerprint Identification (RFFI), which exploits non-ideal hardware-induced unique distortion resident in the transmit signals to identify an emitter, is emerging as a means to enhance the security of communication systems. Recently, machine learning has achieved great success in developing state-of-the-art RFFI models. However, few works consider cross-receiver RFFI problems, whe…

    Submitted 12 April, 2024; originally announced April 2024.

    Comments: Accepted by IEEE Internet of Things Journal

  32. arXiv:2404.06247 [pdf, other]

    cs.CV

    LRR: Language-Driven Resamplable Continuous Representation against Adversarial Tracking Attacks

    Authors: Jianlang Chen, Xuhong Ren, Qing Guo, Felix Juefei-Xu, Di Lin, Wei Feng, Lei Ma, Jianjun Zhao

    Abstract: Visual object tracking plays a critical role in visual-based autonomous systems, as it aims to estimate the position and size of the object of interest within a live video. Despite significant progress made in this field, state-of-the-art (SOTA) trackers often fail when faced with adversarial perturbations in the incoming frames. This can lead to significant robustness and security issues when the…

    Submitted 9 April, 2024; originally announced April 2024.

  33. arXiv:2404.03354 [pdf, other]

    cs.IR cs.AI

    A Comprehensive Survey on Self-Supervised Learning for Recommendation

    Authors: Xubin Ren, Wei Wei, Lianghao Xia, Chao Huang

    Abstract: Recommender systems play a crucial role in tackling the challenge of information overload by delivering personalized recommendations based on individual user preferences. Deep learning techniques, such as RNNs, GNNs, and Transformer architectures, have significantly propelled the advancement of recommender systems by enhancing their comprehension of user behaviors and preferences. However, supervi…

    Submitted 7 April, 2024; v1 submitted 4 April, 2024; originally announced April 2024.

  34. arXiv:2404.02425 [pdf, other]

    cs.CR

    Novel Authentication Protocols Tailored for Ambient IoT Devices in 3GPP 5G Networks

    Authors: Xiongpeng Ren, Jin Cao, Hui Li, Yinghui Zhang

    Abstract: AIoT devices have attracted significant attention within the 3GPP organization. These devices, distinguished from conventional IoT devices, do not rely on additional batteries or have extremely small battery capacities, offering features such as low cost, easy deployment, and maintenance-free operation. Authentication and secure transmission are fundamental security requirements for AIoT devices.…

    Submitted 2 April, 2024; originally announced April 2024.

  35. arXiv:2404.00488 [pdf]

    cs.CL cs.AI cs.LG

    Noise-Aware Training of Layout-Aware Language Models

    Authors: Ritesh Sarkhel, Xiaoqi Ren, Lauro Beltrao Costa, Guolong Su, Vincent Perot, Yanan Xie, Emmanouil Koukoumidis, Arnab Nandi

    Abstract: A visually rich document (VRD) utilizes visual features along with linguistic cues to disseminate information. Training a custom extractor that identifies named entities from a document requires a large number of instances of the target document type annotated at textual and visual modalities. This is an expensive bottleneck in enterprise scenarios, where we want to train custom extractors for tho…

    Submitted 30 March, 2024; originally announced April 2024.

  36. arXiv:2404.00301 [pdf, other]

    cs.CV

    Monocular Identity-Conditioned Facial Reflectance Reconstruction

    Authors: Xingyu Ren, Jiankang Deng, Yuhao Cheng, Jia Guo, Chao Ma, Yichao Yan, Wenhan Zhu, Xiaokang Yang

    Abstract: Recent 3D face reconstruction methods have made remarkable advancements, yet there remain huge challenges in monocular high-quality facial reflectance reconstruction. Existing methods rely on a large amount of light-stage captured data to learn facial reflectance models. However, the lack of subject diversity poses challenges in achieving good generalization and widespread applicability. In this p…

    Submitted 30 March, 2024; originally announced April 2024.

    Comments: Accepted by CVPR 2024

  37. arXiv:2403.20327 [pdf, other]

    cs.CL cs.AI

    Gecko: Versatile Text Embeddings Distilled from Large Language Models

    Authors: Jinhyuk Lee, Zhuyun Dai, Xiaoqi Ren, Blair Chen, Daniel Cer, Jeremy R. Cole, Kai Hui, Michael Boratko, Rajvi Kapadia, Wen Ding, Yi Luan, Sai Meher Karthik Duddu, Gustavo Hernandez Abrego, Weiqiang Shi, Nithi Gupta, Aditya Kusupati, Prateek Jain, Siddhartha Reddy Jonnalagadda, Ming-Wei Chang, Iftekhar Naim

    Abstract: We present Gecko, a compact and versatile text embedding model. Gecko achieves strong retrieval performance by leveraging a key idea: distilling knowledge from large language models (LLMs) into a retriever. Our two-step distillation process begins with generating diverse, synthetic paired data using an LLM. Next, we further refine the data quality by retrieving a set of candidate passages for each…

    Submitted 29 March, 2024; originally announced March 2024.

    Comments: 18 pages

  38. arXiv:2403.17822 [pdf, other]

    cs.CV

    DN-Splatter: Depth and Normal Priors for Gaussian Splatting and Meshing

    Authors: Matias Turkulainen, Xuqian Ren, Iaroslav Melekhov, Otto Seiskari, Esa Rahtu, Juho Kannala

    Abstract: High-fidelity 3D reconstruction of common indoor scenes is crucial for VR and AR applications. 3D Gaussian splatting, a novel differentiable rendering technique, has achieved state-of-the-art novel view synthesis results with high rendering speeds and relatively low training times. However, its performance on scenes commonly seen in indoor datasets is poor due to the lack of geometric constraints…

    Submitted 17 July, 2024; v1 submitted 26 March, 2024; originally announced March 2024.

    Comments: 8 pages, updated figures, updated ablations, updated supplementary material

  39. arXiv:2403.12445 [pdf, other]

    cs.CV

    Boosting Transferability in Vision-Language Attacks via Diversification along the Intersection Region of Adversarial Trajectory

    Authors: Sensen Gao, Xiaojun Jia, Xuhong Ren, Ivor Tsang, Qing Guo

    Abstract: Vision-language pre-training (VLP) models exhibit remarkable capabilities in comprehending both images and text, yet they remain susceptible to multimodal adversarial examples (AEs). Strengthening attacks and uncovering vulnerabilities, especially common issues in VLP models (e.g., high transferable AEs), can advance reliable and practical VLP models. A recent work (i.e., Set-level guidance attack…

    Submitted 14 July, 2024; v1 submitted 19 March, 2024; originally announced March 2024.

    Comments: ECCV2024. Code is available at https://github.com/SensenGao/VLPTransferAttack

  40. arXiv:2403.12384 [pdf, other]

    cs.IR cs.LG

    An Aligning and Training Framework for Multimodal Recommendations

    Authors: Yifan Liu, Kangning Zhang, Xiangyuan Ren, Yanhua Huang, Jiarui Jin, Yingjie Qin, Ruilong Su, Ruiwen Xu, Weinan Zhang

    Abstract: With the development of multimedia applications, multimodal recommendations play an essential role, as they can leverage rich contexts beyond user and item interactions. Existing methods mainly use them to help learn ID features; however, there exist semantic gaps among multimodal content features and ID features. Directly using multimodal information as an auxiliary would lead to misalignment in…

    Submitted 21 May, 2024; v1 submitted 18 March, 2024; originally announced March 2024.

    Comments: 11 pages, revise some typos, correct some explanations

  41. arXiv:2403.09539 [pdf, other]

    cs.CL cs.AI cs.CR cs.LG

    Logits of API-Protected LLMs Leak Proprietary Information

    Authors: Matthew Finlayson, Xiang Ren, Swabha Swayamdipta

    Abstract: The commercialization of large language models (LLMs) has led to the common practice of high-level API-only access to proprietary models. In this work, we show that even with a conservative assumption about the model architecture, it is possible to learn a surprisingly large amount of non-public information about an API-protected LLM from a relatively small number of API queries (e.g., costing und…

    Submitted 14 March, 2024; v1 submitted 14 March, 2024; originally announced March 2024.

    MSC Class: 68T50; ACM Class: I.2.7

  42. arXiv:2403.08433 [pdf, other]

    cs.CV

    An Empirical Study of Parameter Efficient Fine-tuning on Vision-Language Pre-train Model

    Authors: Yuxin Tian, Mouxing Yang, Yunfan Li, Dayiheng Liu, Xingzhang Ren, Xi Peng, Jiancheng Lv

    Abstract: Recent studies applied Parameter Efficient Fine-Tuning techniques (PEFTs) to efficiently narrow the performance gap between pre-training and downstream. There are two important factors for various PEFTs, namely, the accessible data size and fine-tunable parameter size. A natural expectation for PEFTs is that the performance of various PEFTs is positively related to the data size and fine-tunable p…

    Submitted 18 May, 2024; v1 submitted 13 March, 2024; originally announced March 2024.

    Comments: Accepted by ICME2024

  43. arXiv:2403.07518  [pdf, other]

    cs.CV

    Open-Vocabulary Scene Text Recognition via Pseudo-Image Labeling and Margin Loss

    Authors: Xuhua Ren, Hengcan Shi, Jin Li

    Abstract: Scene text recognition is an important and challenging task in computer vision. However, most prior works focus on recognizing pre-defined words, while there are various out-of-vocabulary (OOV) words in real-world applications. In this paper, we propose a novel open-vocabulary text recognition framework, Pseudo-OCR, to recognize OOV words. The key challenge in this task is the lack of OOV traini…

    Submitted 12 March, 2024; originally announced March 2024.

  44. arXiv:2403.07300  [pdf, other]

    cs.LG cs.CL

    CALF: Aligning LLMs for Time Series Forecasting via Cross-modal Fine-Tuning

    Authors: Peiyuan Liu, Hang Guo, Tao Dai, Naiqi Li, Jigang Bao, Xudong Ren, Yong Jiang, Shu-Tao Xia

    Abstract: Deep learning (e.g., Transformer) has been widely and successfully used in multivariate time series forecasting (MTSF). Unlike existing methods that focus on training models from a single modal of time series input, large language models (LLMs) based MTSF methods with cross-modal text and time series input have recently shown great superiority, especially with limited temporal data. However, curre…

    Submitted 23 May, 2024; v1 submitted 12 March, 2024; originally announced March 2024.

  45. arXiv:2403.04692  [pdf, other]

    cs.CV

    PixArt-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation

    Authors: Junsong Chen, Chongjian Ge, Enze Xie, Yue Wu, Lewei Yao, Xiaozhe Ren, Zhongdao Wang, Ping Luo, Huchuan Lu, Zhenguo Li

    Abstract: In this paper, we introduce PixArt-Σ, a Diffusion Transformer model (DiT) capable of directly generating images at 4K resolution. PixArt-Σ represents a significant advancement over its predecessor, PixArt-α, offering images of markedly higher fidelity and improved alignment with text prompts. A key feature of PixArt-Σ is its training efficiency. Leveraging the foundational pre-training of PixArt-α,…

    Submitted 17 March, 2024; v1 submitted 7 March, 2024; originally announced March 2024.

    Comments: Project Page: https://pixart-alpha.github.io/PixArt-sigma-project/

  46. arXiv:2403.04652  [pdf, other]

    cs.CL cs.AI

    Yi: Open Foundation Models by 01.AI

    Authors: 01.AI, Alex Young, Bei Chen, Chao Li, Chengen Huang, Ge Zhang, Guanwei Zhang, Heng Li, Jiangcheng Zhu, Jianqun Chen, Jing Chang, Kaidong Yu, Peng Liu, Qiang Liu, Shawn Yue, Senbin Yang, Shiming Yang, Tao Yu, Wen Xie, Wenhao Huang, Xiaohui Hu, Xiaoyi Ren, Xinyao Niu, Pengcheng Nie, et al. (7 additional authors not shown)

    Abstract: We introduce the Yi model family, a series of language and multimodal models that demonstrate strong multi-dimensional capabilities. The Yi model family is based on 6B and 34B pretrained language models, then we extend them to chat models, 200K long context models, depth-upscaled models, and vision-language models. Our base models achieve strong performance on a wide range of benchmarks like MMLU,…

    Submitted 7 March, 2024; originally announced March 2024.

  47. arXiv:2403.00381  [pdf, other]

    cs.RO cs.LG eess.SY

    Structured Deep Neural Networks-Based Backstepping Trajectory Tracking Control for Lagrangian Systems

    Authors: Jiajun Qian, Liang Xu, Xiaoqiang Ren, Xiaofan Wang

    Abstract: Deep neural networks (DNN) are increasingly being used to learn controllers due to their excellent approximation capabilities. However, their black-box nature poses significant challenges to closed-loop stability guarantees and performance analysis. In this paper, we introduce a structured DNN-based controller for the trajectory tracking control of Lagrangian systems using backstepping techniques. By p…

    Submitted 1 March, 2024; originally announced March 2024.

  48. arXiv:2402.15648  [pdf, other]

    cs.CV

    MambaIR: A Simple Baseline for Image Restoration with State-Space Model

    Authors: Hang Guo, Jinmin Li, Tao Dai, Zhihao Ouyang, Xudong Ren, Shu-Tao Xia

    Abstract: Recent years have seen significant advancements in image restoration, largely attributed to the development of modern deep neural networks, such as CNNs and Transformers. However, existing restoration backbones often face the dilemma between global receptive fields and efficient computation, hindering their application in practice. Recently, the Selective Structured State Space Model, especially t…

    Submitted 25 March, 2024; v1 submitted 23 February, 2024; originally announced February 2024.

    Comments: Technical Report

  49. arXiv:2402.13584  [pdf, other]

    cs.CL

    WinoViz: Probing Visual Properties of Objects Under Different States

    Authors: Woojeong Jin, Tejas Srinivasan, Jesse Thomason, Xiang Ren

    Abstract: Humans perceive and comprehend different visual properties of an object based on specific contexts. For instance, we know that a banana turns brown "when it becomes rotten," whereas it appears green "when it is unripe." Previous studies on probing visual commonsense knowledge have primarily focused on examining language models' understanding of typical properties (e.g., colors and shapes) of o…

    Submitted 21 February, 2024; originally announced February 2024.

    Comments: Preprint

  50. arXiv:2402.11442  [pdf, other]

    cs.CL

    Can LLMs Reason with Rules? Logic Scaffolding for Stress-Testing and Improving LLMs

    Authors: Siyuan Wang, Zhongyu Wei, Yejin Choi, Xiang Ren

    Abstract: Large language models (LLMs) have achieved impressive human-like performance across various reasoning tasks. However, their mastery of underlying inferential rules still falls short of human capabilities. To investigate this, we propose a logic scaffolding inferential rule generation framework, to construct an inferential rule base, ULogic, comprising both primitive and compositional rules across…

    Submitted 20 June, 2024; v1 submitted 17 February, 2024; originally announced February 2024.

    Comments: Accepted as a long paper to ACL 2024 Main