Showing 1–50 of 60 results for author: Geng, Z

  1. arXiv:2407.15026  [pdf, other]

    cs.AR cs.AI

    Benchmarking End-To-End Performance of AI-Based Chip Placement Algorithms

    Authors: Zhihai Wang, Zijie Geng, Zhaojie Tu, Jie Wang, Yuxi Qian, Zhexuan Xu, Ziyan Liu, Siyuan Xu, Zhentao Tang, Shixiong Kai, Mingxuan Yuan, Jianye Hao, Bin Li, Yongdong Zhang, Feng Wu

    Abstract: The increasing complexity of modern very-large-scale integration (VLSI) design highlights the significance of Electronic Design Automation (EDA) technologies. Chip placement is a critical step in the EDA workflow, which positions chip modules on the canvas with the goal of optimizing performance, power, and area (PPA) metrics of final chip designs. Recent advances have demonstrated the great poten…

    Submitted 2 July, 2024; originally announced July 2024.

    Comments: A comprehensive benchmark for AI-based chip placement algorithms using end-to-end performance metrics

  2. arXiv:2407.07933  [pdf, other]

    stat.ME cs.LG stat.ML

    Identification and Estimation of the Bi-Directional MR with Some Invalid Instruments

    Authors: Feng Xie, Zhen Yao, Lin Xie, Yan Zeng, Zhi Geng

    Abstract: We consider the challenging problem of estimating causal effects from purely observational data in the bi-directional Mendelian randomization (MR), where some invalid instruments, as well as unmeasured confounding, usually exist. To address this problem, most existing methods attempt to find proper valid instrumental variables (IVs) for the target causal effect by expert knowledge or by assuming t…

    Submitted 12 July, 2024; v1 submitted 10 July, 2024; originally announced July 2024.

    Comments: 27 pages, 6 tables, 7 figures

  3. arXiv:2406.14548  [pdf, other]

    cs.LG cs.CV

    Consistency Models Made Easy

    Authors: Zhengyang Geng, Ashwini Pokle, William Luo, Justin Lin, J. Zico Kolter

    Abstract: Consistency models (CMs) are an emerging class of generative models that offer faster sampling than traditional diffusion models. CMs enforce that all points along a sampling trajectory are mapped to the same initial point. But this target leads to resource-intensive training: for example, as of 2024, training a SoTA CM on CIFAR-10 takes one week on 8 GPUs. In this work, we propose an alternative…

    Submitted 20 June, 2024; originally announced June 2024.

  4. arXiv:2405.16225  [pdf, ps, other]

    cs.LG cs.AI

    Local Causal Structure Learning in the Presence of Latent Variables

    Authors: Feng Xie, Zheng Li, Peng Wu, Yan Zeng, Chunchen Liu, Zhi Geng

    Abstract: Discovering causal relationships from observational data, particularly in the presence of latent variables, poses a challenging problem. While current local structure learning methods have proven effective and efficient when the focus lies solely on the local relationships of a target variable, they operate under the assumption of causal sufficiency. This assumption implies that all the common cau…

    Submitted 6 June, 2024; v1 submitted 25 May, 2024; originally announced May 2024.

  5. arXiv:2405.16130  [pdf, ps, other]

    cs.LG stat.ME

    Automating the Selection of Proxy Variables of Unmeasured Confounders

    Authors: Feng Xie, Zhengming Chen, Shanshan Luo, Wang Miao, Ruichu Cai, Zhi Geng

    Abstract: Recently, interest has grown in the use of proxy variables of unobserved confounding for inferring the causal effect in the presence of unmeasured confounders from observational data. One difficulty inhibiting the practical use is finding valid proxy variables of unobserved confounding to a target causal effect of interest. These proxy variables are typically justified by background knowledge. In…

    Submitted 25 May, 2024; originally announced May 2024.

  6. arXiv:2405.15439  [pdf, other]

    cs.CV cs.AI

    Text-guided 3D Human Motion Generation with Keyframe-based Parallel Skip Transformer

    Authors: Zichen Geng, Caren Han, Zeeshan Hayder, Jian Liu, Mubarak Shah, Ajmal Mian

    Abstract: Text-driven human motion generation is an emerging task in animation and humanoid robot design. Existing algorithms directly generate the full sequence which is computationally expensive and prone to errors as it does not pay special attention to key poses, a process that has been the cornerstone of animation for decades. We propose KeyMotion, that generates plausible human motion sequences corres…

    Submitted 24 May, 2024; originally announced May 2024.

  7. arXiv:2404.19620  [pdf, other]

    cs.LG cs.IR stat.ML

    Be Aware of the Neighborhood Effect: Modeling Selection Bias under Interference

    Authors: Haoxuan Li, Chunyuan Zheng, Sihao Ding, Peng Wu, Zhi Geng, Fuli Feng, Xiangnan He

    Abstract: Selection bias in recommender system arises from the recommendation process of system filtering and the interactive process of user selection. Many previous studies have focused on addressing selection bias to achieve unbiased learning of the prediction model, but ignore the fact that potential outcomes for a given user-item pair may vary with the treatments assigned to other user-item pairs, name…

    Submitted 30 April, 2024; originally announced April 2024.

    Comments: ICLR 24

  8. arXiv:2404.19596  [pdf, other]

    cs.IR cs.LG

    Debiased Collaborative Filtering with Kernel-Based Causal Balancing

    Authors: Haoxuan Li, Chunyuan Zheng, Yanghao Xiao, Peng Wu, Zhi Geng, Xu Chen, Peng Cui

    Abstract: Debiased collaborative filtering aims to learn an unbiased prediction model by removing different biases in observational datasets. To solve this problem, one of the simple and effective methods is based on the propensity score, which adjusts the observational sample distribution to the target one by reweighting observed instances. Ideally, propensity scores should be learned with causal balancing…

    Submitted 30 April, 2024; originally announced April 2024.

    Comments: ICLR 24 Spotlight

  9. arXiv:2404.16896  [pdf, other]

    cs.GR cs.LG

    A Neural-Network-Based Approach for Loose-Fitting Clothing

    Authors: Yongxu Jin, Dalton Omens, Zhenglin Geng, Joseph Teran, Abishek Kumar, Kenji Tashiro, Ronald Fedkiw

    Abstract: Since loose-fitting clothing contains dynamic modes that have proven to be difficult to predict via neural networks, we first illustrate how to coarsely approximate these modes with a real-time numerical algorithm specifically designed to mimic the most important ballistic features of a classical numerical simulation. Although there is some flexibility in the choice of the numerical algorithm used…

    Submitted 25 April, 2024; originally announced April 2024.

  10. arXiv:2404.11443  [pdf]

    cs.AI

    Prediction of Unmanned Surface Vessel Motion Attitude Based on CEEMDAN-PSO-SVM

    Authors: Zhuoya Geng, Jianmei Chen, Wanqiang Zhu

    Abstract: Unmanned boats, while navigating at sea, utilize active compensation systems to mitigate wave disturbances experienced by onboard instruments and equipment. However, there exists a lag in the measurement of unmanned boat attitudes, thus introducing unmanned boat motion attitude prediction to compensate for the lag in the signal acquisition process. This paper, based on the basic principles of wave…

    Submitted 17 April, 2024; originally announced April 2024.

  11. arXiv:2401.10774  [pdf, other]

    cs.LG cs.CL

    Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads

    Authors: Tianle Cai, Yuhong Li, Zhengyang Geng, Hongwu Peng, Jason D. Lee, Deming Chen, Tri Dao

    Abstract: Large Language Models (LLMs) employ auto-regressive decoding that requires sequential computation, with each step reliant on the previous one's output. This creates a bottleneck as each step necessitates moving the full model parameters from High-Bandwidth Memory (HBM) to the accelerator's cache. While methods such as speculative decoding have been suggested to address this issue, their implementa…

    Submitted 14 June, 2024; v1 submitted 19 January, 2024; originally announced January 2024.

    Comments: The code for this implementation is available at https://github.com/FasterDecoding/Medusa

  12. arXiv:2401.09516  [pdf, other]

    cs.LG cs.AI math.NA

    Accelerating Data Generation for Neural Operators via Krylov Subspace Recycling

    Authors: Hong Wang, Zhongkai Hao, Jie Wang, Zijie Geng, Zhen Wang, Bin Li, Feng Wu

    Abstract: Learning neural operators for solving partial differential equations (PDEs) has attracted great attention due to its high inference efficiency. However, training such operators requires generating a substantial amount of labeled data, i.e., PDE problems together with their solutions. The data generation process is exceptionally time-consuming, as it involves solving numerous systems of linear equa…

    Submitted 19 March, 2024; v1 submitted 17 January, 2024; originally announced January 2024.

  13. arXiv:2401.08639  [pdf, other]

    cs.CV cs.LG

    One-Step Diffusion Distillation via Deep Equilibrium Models

    Authors: Zhengyang Geng, Ashwini Pokle, J. Zico Kolter

    Abstract: Diffusion models excel at producing high-quality samples but naively require hundreds of iterations, prompting multiple attempts to distill the generation process into a faster network. However, many existing approaches suffer from a variety of challenges: the process for distillation training can be complex, often requiring multiple training stages, and the resulting models perform poorly when ut…

    Submitted 12 December, 2023; originally announced January 2024.

    Comments: NeurIPS 2023

  14. arXiv:2401.05960  [pdf, other]

    cs.AI

    Machine Learning Insides OptVerse AI Solver: Design Principles and Applications

    Authors: Xijun Li, Fangzhou Zhu, Hui-Ling Zhen, Weilin Luo, Meng Lu, Yimin Huang, Zhenan Fan, Zirui Zhou, Yufei Kuang, Zhihai Wang, Zijie Geng, Yang Li, Haoyang Liu, Zhiwu An, Muming Yang, Jianshu Li, Jie Wang, Junchi Yan, Defeng Sun, Tao Zhong, Yong Zhang, Jia Zeng, Mingxuan Yuan, Jianye Hao, Jun Yao, et al. (1 additional author not shown)

    Abstract: In an era of digital ubiquity, efficient resource management and decision-making are paramount across numerous industries. To this end, we present a comprehensive study on the integration of machine learning (ML) techniques into Huawei Cloud's OptVerse AI Solver, which aims to mitigate the scarcity of real-world mathematical programming instances, and to surpass the capabilities of traditional opt…

    Submitted 17 January, 2024; v1 submitted 11 January, 2024; originally announced January 2024.

  15. arXiv:2310.18605  [pdf, other]

    cs.LG

    TorchDEQ: A Library for Deep Equilibrium Models

    Authors: Zhengyang Geng, J. Zico Kolter

    Abstract: Deep Equilibrium (DEQ) Models, an emerging class of implicit models that maps inputs to fixed points of neural networks, are of growing interest in the deep learning community. However, training and applying DEQ models is currently done in an ad-hoc fashion, with various techniques spread across the literature. In this work, we systematically revisit DEQs and present TorchDEQ, an out-of-the-box Py…

    Submitted 28 October, 2023; originally announced October 2023.

  16. arXiv:2310.02807  [pdf, other]

    cs.LG

    A Deep Instance Generative Framework for MILP Solvers Under Limited Data Availability

    Authors: Zijie Geng, Xijun Li, Jie Wang, Xiao Li, Yongdong Zhang, Feng Wu

    Abstract: In the past few years, there has been an explosive surge in the use of machine learning (ML) techniques to address combinatorial optimization (CO) problems, especially mixed-integer linear programs (MILPs). Despite the achievements, the limited availability of real-world instances often leads to sub-optimal decisions and biased solver assessments, which motivates a suite of synthetic MILP instance…

    Submitted 11 March, 2024; v1 submitted 4 October, 2023; originally announced October 2023.

  17. arXiv:2309.03895  [pdf, other]

    cs.CV

    InstructDiffusion: A Generalist Modeling Interface for Vision Tasks

    Authors: Zigang Geng, Binxin Yang, Tiankai Hang, Chen Li, Shuyang Gu, Ting Zhang, Jianmin Bao, Zheng Zhang, Han Hu, Dong Chen, Baining Guo

    Abstract: We present InstructDiffusion, a unifying and generic framework for aligning computer vision tasks with human instructions. Unlike existing approaches that integrate prior knowledge and pre-define the output space (e.g., categories and coordinates) for each vision task, we cast diverse vision tasks into a human-intuitive image-manipulating process whose output space is a flexible and interactive pi…

    Submitted 7 September, 2023; originally announced September 2023.

  18. arXiv:2308.06718  [pdf, other]

    cs.LG cs.AI stat.ME

    Generalized Independent Noise Condition for Estimating Causal Structure with Latent Variables

    Authors: Feng Xie, Biwei Huang, Zhengming Chen, Ruichu Cai, Clark Glymour, Zhi Geng, Kun Zhang

    Abstract: We investigate the task of learning causal structure in the presence of latent variables, including locating latent variables and determining their quantity, and identifying causal relationships among both latent and observed variables. To this end, we propose a Generalized Independent Noise (GIN) condition for linear non-Gaussian acyclic causal models that incorporate latent variables, which esta…

    Submitted 9 June, 2024; v1 submitted 13 August, 2023; originally announced August 2023.

  19. arXiv:2308.04409  [pdf, other]

    cs.CV

    V-DETR: DETR with Vertex Relative Position Encoding for 3D Object Detection

    Authors: Yichao Shen, Zigang Geng, Yuhui Yuan, Yutong Lin, Ze Liu, Chunyu Wang, Han Hu, Nanning Zheng, Baining Guo

    Abstract: We introduce a highly performant 3D object detector for point clouds using the DETR framework. The prior attempts all end up with suboptimal results because they fail to learn accurate inductive biases from the limited scale of training data. In particular, the queries often attend to points that are far away from the target objects, violating the locality principle in object detection. To address…

    Submitted 8 August, 2023; originally announced August 2023.

  20. arXiv:2308.01742  [pdf, other]

    cs.LG

    Exploiting Multi-Label Correlation in Label Distribution Learning

    Authors: Zhiqiang Kou, Jing Wang, Yuheng Jia, Xin Geng

    Abstract: Label Distribution Learning (LDL) is a novel machine learning paradigm that assigns label distribution to each instance. Many LDL methods proposed to leverage label correlation in the learning process to solve the exponential-sized output space; among these, many exploited the low-rank structure of label distribution to capture label correlation. However, recent studies disclosed that label distri…

    Submitted 3 August, 2023; originally announced August 2023.

  21. arXiv:2307.06270  [pdf]

    cs.GR

    Study on Virtual Gear Hobbing Simulation and Gear Tooth Surface Accuracy

    Authors: Zhi Geng, Gang Li

    Abstract: This paper presents a digital simulation method for the hobbing process of cylindrical gears. Based on the gear generation principle, taking the professional software as the tool, the problem of virtual hobbing simulation on involute helical gears was studied, and the virtual hobbing simulation of hobbing on the whole gear was completed by using macros of CATIA V5. The validity of this method was…

    Submitted 12 July, 2023; originally announced July 2023.

  22. arXiv:2305.05128  [pdf]

    cs.LG cs.AI

    A Kriging-Random Forest Hybrid Model for Real-time Ground Property Prediction during Earth Pressure Balance Shield Tunneling

    Authors: Ziheng Geng, Chao Zhang, Yuhao Ren, Minxiang Zhu, Renpeng Chen, Hongzhan Cheng

    Abstract: A kriging-random forest hybrid model is developed for real-time ground property prediction ahead of the earth pressure balanced shield by integrating Kriging extrapolation and random forest, which can guide shield operating parameter selection thereby mitigate construction risks. The proposed KRF algorithm synergizes two types of information: prior information and real-time information. The previo…

    Submitted 8 May, 2023; originally announced May 2023.

  23. arXiv:2303.11638  [pdf, other]

    cs.CV

    Human Pose as Compositional Tokens

    Authors: Zigang Geng, Chunyu Wang, Yixuan Wei, Ze Liu, Houqiang Li, Han Hu

    Abstract: Human pose is typically represented by a coordinate vector of body joints or their heatmap embeddings. While easy for data processing, unrealistic pose estimates are admitted due to the lack of dependency modeling between the body joints. In this paper, we present a structured representation, named Pose as Compositional Tokens (PCT), to explore the joint dependency. It represents a pose by M discr…

    Submitted 21 March, 2023; originally announced March 2023.

    Comments: Accepted by CVPR2023

  24. arXiv:2302.09601  [pdf, other]

    cs.LG cs.CV

    Generalization in Visual Reinforcement Learning with the Reward Sequence Distribution

    Authors: Jie Wang, Rui Yang, Zijie Geng, Zhihao Shi, Mingxuan Ye, Qi Zhou, Shuiwang Ji, Bin Li, Yongdong Zhang, Feng Wu

    Abstract: Generalization in partially observed markov decision processes (POMDPs) is critical for successful applications of visual reinforcement learning (VRL) in real scenarios. A widely used idea is to learn task-relevant representations that encode task-relevant information of common features in POMDPs, i.e., rewards and transition dynamics. As transition dynamics in the latent state space -- which are…

    Submitted 19 February, 2023; originally announced February 2023.

    Comments: 23 pages

  25. arXiv:2302.01129  [pdf, other]

    cs.LG cs.AI

    De Novo Molecular Generation via Connection-aware Motif Mining

    Authors: Zijie Geng, Shufang Xie, Yingce Xia, Lijun Wu, Tao Qin, Jie Wang, Yongdong Zhang, Feng Wu, Tie-Yan Liu

    Abstract: De novo molecular generation is an essential task for science discovery. Recently, fragment-based deep generative models have attracted much research attention due to their flexibility in generating novel molecules based on existing molecule fragments. However, the motif vocabulary, i.e., the collection of frequent fragments, is usually built upon heuristic rules, which brings difficulties to capt…

    Submitted 26 February, 2023; v1 submitted 2 February, 2023; originally announced February 2023.

  26. arXiv:2301.02229  [pdf, other]

    cs.CV cs.AI

    All in Tokens: Unifying Output Space of Visual Tasks via Soft Token

    Authors: Jia Ning, Chen Li, Zheng Zhang, Zigang Geng, Qi Dai, Kun He, Han Hu

    Abstract: Unlike language tasks, where the output space is usually limited to a set of tokens, the output space of visual tasks is more complicated, making it difficult to build a unified visual model for various visual tasks. In this paper, we seek to unify the output space of visual tasks, so that we can also build a unified model for visual tasks. To this end, we demonstrate a single unified model that s…

    Submitted 14 February, 2023; v1 submitted 5 January, 2023; originally announced January 2023.

  27. arXiv:2210.17071  [pdf, other]

    cs.LG cs.DB

    Computing Rule-Based Explanations by Leveraging Counterfactuals

    Authors: Zixuan Geng, Maximilian Schleich, Dan Suciu

    Abstract: Sophisticated machine models are increasingly used for high-stakes decisions in everyday life. There is an urgent need to develop effective explanation techniques for such automated decisions. Rule-Based Explanations have been proposed for high-stake decisions like loan applications, because they increase the users' trust in the decision. However, rule-based explanations are very inefficient to co…

    Submitted 31 October, 2022; originally announced October 2022.

  28. arXiv:2210.12867  [pdf, other]

    cs.LG cs.CV

    Deep Equilibrium Approaches to Diffusion Models

    Authors: Ashwini Pokle, Zhengyang Geng, Zico Kolter

    Abstract: Diffusion-based generative models are extremely effective in generating high-quality images, with generated samples often surpassing the quality of those produced by other models under several metrics. One distinguishing feature of these models, however, is that they typically require long sampling chains to produce high-fidelity images. This presents a challenge not only from the lenses of sampli…

    Submitted 23 October, 2022; originally announced October 2022.

    Comments: NeurIPS 2022

  29. EMC2A-Net: An Efficient Multibranch Cross-channel Attention Network for SAR Target Classification

    Authors: Xiang Yu, Zhe Geng, Xiaohua Huang, Qinglu Wang, Daiyin Zhu

    Abstract: In recent years, convolutional neural networks (CNNs) have shown great potential in synthetic aperture radar (SAR) target recognition. SAR images have a strong sense of granularity and have different scales of texture features, such as speckle noise, target dominant scatterers and target contours, which are rarely considered in the traditional CNN model. This paper proposed two residual blocks, na…

    Submitted 3 August, 2022; originally announced August 2022.

    Comments: 15 pages, 9 figures, Submitted to IEEE Transactions on Geoscience and Remote Sensing, 2022

  30. arXiv:2207.06095  [pdf, other]

    cs.CV

    Eliminating Gradient Conflict in Reference-based Line-Art Colorization

    Authors: Zekun Li, Zhengyang Geng, Zhao Kang, Wenyu Chen, Yibo Yang

    Abstract: Reference-based line-art colorization is a challenging task in computer vision. The color, texture, and shading are rendered based on an abstract sketch, which heavily relies on the precise long-range dependency modeling between the sketch and reference. Popular techniques to bridge the cross-modal information and model the long-range dependency employ the attention mechanism. However, in the cont…

    Submitted 20 July, 2022; v1 submitted 13 July, 2022; originally announced July 2022.

    Comments: Accepted by ECCV2022

  31. arXiv:2206.13383  [pdf]

    cs.CV

    Mushroom image recognition and distance generation based on attention-mechanism model and genetic information

    Authors: Wenbin Liao, Jiewen Xiao, Chengbo Zhao, Yonggong Han, ZhiJie Geng, Jianxin Wang, Yihua Yang

    Abstract: The species identification of Macrofungi, i.e. mushrooms, has always been a challenging task. There are still a large number of poisonous mushrooms that have not been found, which poses a risk to people's life. However, the traditional identification method requires a large number of experts with knowledge in the field of taxonomy for manual identification, it is not only inefficient but also cons…

    Submitted 27 June, 2022; originally announced June 2022.

  32. arXiv:2205.13543  [pdf, other]

    cs.CV cs.AI cs.LG

    Revealing the Dark Secrets of Masked Image Modeling

    Authors: Zhenda Xie, Zigang Geng, Jingcheng Hu, Zheng Zhang, Han Hu, Yue Cao

    Abstract: Masked image modeling (MIM) as pre-training is shown to be effective for numerous vision downstream tasks, but how and where MIM works remain unclear. In this paper, we compare MIM with the long-dominant supervised pre-trained models from two perspectives, the visualizations and the experiments, to uncover their key representational differences. From the visualizations, we find that MIM brings loc…

    Submitted 27 May, 2022; v1 submitted 26 May, 2022; originally announced May 2022.

  33. arXiv:2205.10218  [pdf, other]

    cs.LG cs.AI cs.CV

    Learning Task-relevant Representations for Generalization via Characteristic Functions of Reward Sequence Distributions

    Authors: Rui Yang, Jie Wang, Zijie Geng, Mingxuan Ye, Shuiwang Ji, Bin Li, Feng Wu

    Abstract: Generalization across different environments with the same tasks is critical for successful applications of visual reinforcement learning (RL) in real scenarios. However, visual distractions -- which are common in real scenes -- from high-dimensional observations can be hurtful to the learned representations in visual RL, thus degrading the performance of generalization. To tackle this problem, we…

    Submitted 30 June, 2022; v1 submitted 20 May, 2022; originally announced May 2022.

    Comments: Accepted to KDD 2022

  34. arXiv:2204.08442  [pdf, other]

    cs.CV cs.AI cs.LG

    Deep Equilibrium Optical Flow Estimation

    Authors: Shaojie Bai, Zhengyang Geng, Yash Savani, J. Zico Kolter

    Abstract: Many recent state-of-the-art (SOTA) optical flow models use finite-step recurrent update operations to emulate traditional algorithms by encouraging iterative refinements toward a stable flow estimation. However, these RNNs impose large computation and memory overheads, and are not directly trained to model such stable estimation. They can converge poorly and thereby suffer from performance degrad…

    Submitted 18 April, 2022; originally announced April 2022.

    Comments: CVPR 2022

  35. arXiv:2203.14186  [pdf, other]

    cs.CV

    RSTT: Real-time Spatial Temporal Transformer for Space-Time Video Super-Resolution

    Authors: Zhicheng Geng, Luming Liang, Tianyu Ding, Ilya Zharkov

    Abstract: Space-time video super-resolution (STVSR) is the task of interpolating videos with both Low Frame Rate (LFR) and Low Resolution (LR) to produce High-Frame-Rate (HFR) and also High-Resolution (HR) counterparts. The existing methods based on Convolutional Neural Network (CNN) succeed in achieving visually satisfied results while suffer from slow inference speed due to their heavy architectures. We p…

    Submitted 26 March, 2022; originally announced March 2022.

  36. arXiv:2203.01670  [pdf, other]

    cs.CL

    A Simple Hash-Based Early Exiting Approach For Language Understanding and Generation

    Authors: Tianxiang Sun, Xiangyang Liu, Wei Zhu, Zhichao Geng, Lingling Wu, Yilong He, Yuan Ni, Guotong Xie, Xuanjing Huang, Xipeng Qiu

    Abstract: Early exiting allows instances to exit at different layers according to the estimation of difficulty. Previous works usually adopt heuristic metrics such as the entropy of internal outputs to measure instance difficulty, which suffers from generalization and threshold-tuning. In contrast, learning to exit, or learning to predict instance difficulty is a more appealing way. Though some effort has b…

    Submitted 3 March, 2022; originally announced March 2022.

    Comments: Accepted to Findings of ACL 2022

  37. arXiv:2202.09022  [pdf, other]

    cs.CL cs.AI cs.IR

    TURNER: The Uncertainty-based Retrieval Framework for Chinese NER

    Authors: Zhichao Geng, Hang Yan, Zhangyue Yin, Chenxin An, Xipeng Qiu

    Abstract: Chinese NER is a difficult undertaking due to the ambiguity of Chinese characters and the absence of word boundaries. Previous work on Chinese NER focus on lexicon-based methods to introduce boundary information and reduce out-of-vocabulary (OOV) cases during prediction. However, it is expensive to obtain and dynamically maintain high-quality lexicons in specific domains, which motivates us to uti…

    Submitted 18 February, 2022; originally announced February 2022.

  38. arXiv:2201.10122  [pdf, other]

    cs.LG cs.GR

    Analytically Integratable Zero-restlength Springs for Capturing Dynamic Modes unrepresented by Quasistatic Neural Networks

    Authors: Yongxu Jin, Yushan Han, Zhenglin Geng, Joseph Teran, Ronald Fedkiw

    Abstract: We present a novel paradigm for modeling certain types of dynamic simulation in real-time with the aid of neural networks. In order to significantly reduce the requirements on data (especially time-dependent data), as well as decrease generalization error, our approach utilizes a data-driven neural network only to capture quasistatic information (instead of dynamic or time-dependent information).…

    Submitted 25 January, 2022; originally announced January 2022.

  39. arXiv:2111.05177  [pdf, other]

    cs.LG

    On Training Implicit Models

    Authors: Zhengyang Geng, Xin-Yu Zhang, Shaojie Bai, Yisen Wang, Zhouchen Lin

    Abstract: This paper focuses on training implicit models of infinite layers. Specifically, previous works employ implicit differentiation and solve the exact gradient for the backward propagation. However, is it necessary to compute such an exact but expensive gradient for training? In this work, we propose a novel gradient estimate for implicit models, named phantom gradient, that 1) forgoes the costly com…

    Submitted 12 January, 2022; v1 submitted 9 November, 2021; originally announced November 2021.

    Comments: 24 pages, 4 figures, in The 35th Conference on Neural Information Processing Systems (NeurIPS 2021)

  40. arXiv:2110.15348  [pdf, other]

    cs.LG cs.CV

    Residual Relaxation for Multi-view Representation Learning

    Authors: Yifei Wang, Zhengyang Geng, Feng Jiang, Chuming Li, Yisen Wang, Jiansheng Yang, Zhouchen Lin

    Abstract: Multi-view methods learn representations by aligning multiple views of the same image and their performance largely depends on the choice of data augmentation. In this paper, we notice that some other useful augmentations, such as image rotation, are harmful for multi-view methods because they cause a semantic shift that is too large to be aligned well. This observation motivates us to relax the e…

    Submitted 28 October, 2021; originally announced October 2021.

    Comments: NeurIPS 2021

  41. arXiv:2109.07943  [pdf, other]

    cs.CL

    RetrievalSum: A Retrieval Enhanced Framework for Abstractive Summarization

    Authors: Chenxin An, Ming Zhong, Zhichao Geng, Jianqiang Yang, Xipeng Qiu

    Abstract: Existing summarization systems mostly generate summaries purely relying on the content of the source document. However, even for humans, we usually need some references or exemplars to help us fully understand the source document and write summaries in a particular format. But how to find the high-quality exemplars and incorporate them into summarization systems is still challenging and worth expl…

    Submitted 13 December, 2021; v1 submitted 16 September, 2021; originally announced September 2021.

  42. CPT: A Pre-Trained Unbalanced Transformer for Both Chinese Language Understanding and Generation

    Authors: Yunfan Shao, Zhichao Geng, Yitao Liu, Junqi Dai, Hang Yan, Fei Yang, Li Zhe, Hujun Bao, Xipeng Qiu

    Abstract: In this paper, we take the advantage of previous pre-trained models (PTMs) and propose a novel Chinese Pre-trained Unbalanced Transformer (CPT). Different from previous Chinese PTMs, CPT is designed to utilize the shared knowledge between natural language understanding (NLU) and natural language generation (NLG) to boost the performance. CPT consists of three parts: a shared encoder, an understand…

    Submitted 18 July, 2022; v1 submitted 13 September, 2021; originally announced September 2021.

    Comments: Code is available at https://github.com/fastnlp/CPT

  43. arXiv:2109.04553  [pdf, other]

    cs.CV cs.LG

    Is Attention Better Than Matrix Decomposition?

    Authors: Zhengyang Geng, Meng-Hao Guo, Hongxu Chen, Xia Li, Ke Wei, Zhouchen Lin

    Abstract: As an essential ingredient of modern deep learning, attention mechanism, especially self-attention, plays a vital role in the global correlation discovery. However, is hand-crafted attention irreplaceable when modeling the global context? Our intriguing finding is that self-attention is not better than the matrix decomposition (MD) model developed 20 years ago regarding the performance and computa…

    Submitted 28 December, 2021; v1 submitted 9 September, 2021; originally announced September 2021.

    Comments: ICLR 2021

  44. arXiv:2104.02300  [pdf, other]

    cs.CV

    Bottom-Up Human Pose Estimation Via Disentangled Keypoint Regression

    Authors: Zigang Geng, Ke Sun, Bin Xiao, Zhaoxiang Zhang, Jingdong Wang

    Abstract: In this paper, we are interested in the bottom-up paradigm of estimating human poses from an image. We study the dense keypoint regression framework that is previously inferior to the keypoint detection and grouping framework. Our motivation is that regressing keypoint positions accurately needs to learn representations that focus on the keypoint regions. We present a simple yet effective approa…

    Submitted 6 April, 2021; originally announced April 2021.

    Comments: Accepted by CVPR2021. arXiv admin note: text overlap with arXiv:2006.15480

  45. A Local Method for Identifying Causal Relations under Markov Equivalence

    Authors: Zhuangyan Fang, Yue Liu, Zhi Geng, Shengyu Zhu, Yangbo He

    Abstract: Causality is important for designing interpretable and robust methods in artificial intelligence research. We propose a local approach to identify whether a variable is a cause of a given target under the framework of causal graphical models of directed acyclic graphs (DAGs). In general, the causal relation between two variables may not be identifiable from observational data as many causal DAGs e…

    Submitted 5 March, 2022; v1 submitted 25 February, 2021; originally announced February 2021.

  46. arXiv:2101.01292  [pdf, other]

    cs.LG cs.DB

    GeCo: Quality Counterfactual Explanations in Real Time

    Authors: Maximilian Schleich, Zixuan Geng, Yihong Zhang, Dan Suciu

    Abstract: Machine learning is increasingly applied in high-stakes decision making that directly affect people's lives, and this leads to an increased demand for systems to explain their decisions. Explanations often take the form of counterfactuals, which consists of conveying to the end user what she/he needs to change in order to improve the outcome. Computing counterfactual explanations is challenging, b…

    Submitted 18 May, 2021; v1 submitted 4 January, 2021; originally announced January 2021.

    Comments: 16 pages, 12 figures, 3 tables, 3 algorithms

  47. Probability-Density-Based Deep Learning Paradigm for the Fuzzy Design of Functional Metastructures

    Authors: Ying-Tao Luo, Peng-Qi Li, Dong-Ting Li, Yu-Gui Peng, Zhi-Guo Geng, Shu-Huan Xie, Yong Li, Andrea Alu, Jie Zhu, Xue-Feng Zhu

    Abstract: In quantum mechanics, a norm squared wave function can be interpreted as the probability density that describes the likelihood of a particle to be measured in a given position or momentum. This statistical property is at the core of the fuzzy structure of microcosmos. Recently, hybrid neural structures raised intense attention, resulting in various intelligent systems with far-reaching influence.…

    Submitted 10 November, 2020; originally announced November 2020.

    Comments: Published in Research, an AAAS Science Partner Journal

    Journal ref: Research, vol. 2020, Article ID 8757403, 2020

  48. arXiv:2009.08633  [pdf, other]

    cs.CL

    fastHan: A BERT-based Multi-Task Toolkit for Chinese NLP

    Authors: Zhichao Geng, Hang Yan, Xipeng Qiu, Xuanjing Huang

    Abstract: We present fastHan, an open-source toolkit for four basic tasks in Chinese natural language processing: Chinese word segmentation (CWS), Part-of-Speech (POS) tagging, named entity recognition (NER), and dependency parsing. The backbone of fastHan is a multi-task model based on a pruned BERT, which uses the first 8 layers in BERT. We also provide a 4-layer base model compressed from the 8-layer mod…

    Submitted 30 May, 2021; v1 submitted 18 September, 2020; originally announced September 2020.

    Comments: ACL2021 Demo Track

  49. arXiv:2008.02604  [pdf, other]

    eess.IV cs.CV

    Deep Learning Based Defect Detection for Solder Joints on Industrial X-Ray Circuit Board Images

    Authors: Qianru Zhang, Meng Zhang, Chinthaka Gamanayake, Chau Yuen, Zehao Geng, Hirunima Jayasekara, Xuewen Zhang, Chia-wei Woo, Jenny Low, Xiang Liu

    Abstract: Quality control is of vital importance during electronics production. As the methods of producing electronic circuits improve, there is an increasing chance of solder defects during assembling the printed circuit board (PCB). Many technologies have been incorporated for inspecting failed soldering, such as X-ray imaging, optical imaging, and thermal imaging. With some advanced algorithms, the new…

    Submitted 25 March, 2021; v1 submitted 6 August, 2020; originally announced August 2020.

    Comments: Accepted by conference INDIN 2020

  50. arXiv:2006.15480  [pdf, other]

    cs.CV

    Bottom-Up Human Pose Estimation by Ranking Heatmap-Guided Adaptive Keypoint Estimates

    Authors: Ke Sun, Zigang Geng, Depu Meng, Bin Xiao, Dong Liu, Zhaoxiang Zhang, Jingdong Wang

    Abstract: The typical bottom-up human pose estimation framework includes two stages, keypoint detection and grouping. Most existing works focus on developing grouping algorithms, e.g., associative embedding, and pixel-wise keypoint regression that we adopt in our approach. We present several schemes that are rarely or unthoroughly studied before for improving keypoint detection and grouping (keypoint regres…

    Submitted 27 June, 2020; originally announced June 2020.