Showing 1–50 of 198 results for author: Le, D

  1. arXiv:2407.01527  [pdf, other]

    cs.CL

    KV Cache Compression, But What Must We Give in Return? A Comprehensive Benchmark of Long Context Capable Approaches

    Authors: Jiayi Yuan, Hongyi Liu, Shaochen Zhong, Yu-Neng Chuang, Songchen Li, Guanchu Wang, Duy Le, Hongye Jin, Vipin Chaudhary, Zhaozhuo Xu, Zirui Liu, Xia Hu

    Abstract: Long context capability is a crucial competency for large language models (LLMs) as it mitigates the human struggle to digest long-form texts. This capability enables complex task-solving scenarios such as book summarization, code assistance, and many more tasks that are traditionally manpower-intensive. However, transformer-based LLMs face significant challenges with long context input due to the…

    Submitted 1 July, 2024; originally announced July 2024.

  2. arXiv:2406.19806  [pdf, other]

    cs.DC cs.ET cs.NI

    Prediction based computation offloading and resource allocation for multi-access ISAC enabled IoT system

    Authors: Duc-Thuan Le

    Abstract: In the new era of the Internet of Things (IoT), tasks are now being migrated to edge sites closer to data generators. Mobile devices inherently encounter limitations in terms of energy and computational processing capabilities. In high mobility paradigm, ISAC provides a promising foundation for integrating deployment management within dynamic spatial settings. We are interested in applying predict…

    Submitted 28 June, 2024; originally announced June 2024.

    MSC Class: 60-08 ACM Class: C.2.1

  3. arXiv:2406.15633  [pdf, other]

    cs.SE

    Good things come in three: Generating SO Post Titles with Pre-Trained Models, Self Improvement and Post Ranking

    Authors: Duc Anh Le, Anh M. T. Bui, Phuong T. Nguyen, Davide Di Ruscio

    Abstract: Stack Overflow is a prominent Q and A forum, supporting developers in seeking suitable resources on programming-related matters. Having high-quality question titles is an effective means to attract developers' attention. Unfortunately, this is often underestimated, leaving room for improvement. Research has been conducted, predominantly leveraging pre-trained models to generate titles from code sn…

    Submitted 21 June, 2024; originally announced June 2024.

    Comments: The paper has been peer-reviewed and accepted for publication at the International Symposium on Empirical Software Engineering and Measurement (ESEM 2024)

  4. arXiv:2406.10223  [pdf, other]

    cs.LG cs.SD eess.AS

    Diffusion Synthesizer for Efficient Multilingual Speech to Speech Translation

    Authors: Nameer Hirschkind, Xiao Yu, Mahesh Kumar Nandwana, Joseph Liu, Eloi DuBois, Dao Le, Nicolas Thiebaut, Colin Sinclair, Kyle Spence, Charles Shang, Zoe Abrams, Morgan McGuire

    Abstract: We introduce DiffuseST, a low-latency, direct speech-to-speech translation system capable of preserving the input speaker's voice zero-shot while translating from multiple source languages into English. We experiment with the synthesizer component of the architecture, comparing a Tacotron-based synthesizer to a novel diffusion-based synthesizer. We find the diffusion-based synthesizer to improve M…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: Published in Interspeech 2024

  5. arXiv:2406.08953  [pdf, other]

    cs.CV cs.LG

    Preserving Identity with Variational Score for General-purpose 3D Editing

    Authors: Duong H. Le, Tuan Pham, Aniruddha Kembhavi, Stephan Mandt, Wei-Chiu Ma, Jiasen Lu

    Abstract: We present Piva (Preserving Identity with Variational Score Distillation), a novel optimization-based method for editing images and 3D models based on diffusion models. Specifically, our approach is inspired by the recently proposed method for 2D image editing - Delta Denoising Score (DDS). We pinpoint the limitations in DDS for 2D and 3D editing, which causes detail loss and over-saturation. To a…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: 22 pages, 14 figures

  6. arXiv:2406.07823  [pdf, other]

    cs.CL cs.SD eess.AS

    PRoDeliberation: Parallel Robust Deliberation for End-to-End Spoken Language Understanding

    Authors: Trang Le, Daniel Lazar, Suyoun Kim, Shan Jiang, Duc Le, Adithya Sagar, Aleksandr Livshits, Ahmed Aly, Akshat Shrivastava

    Abstract: Spoken Language Understanding (SLU) is a critical component of voice assistants; it consists of converting speech to semantic parses for task execution. Previous works have explored end-to-end models to improve the quality and robustness of SLU models with Deliberation, however these models have remained autoregressive, resulting in higher latencies. In this work we introduce PRoDeliberation, a no…

    Submitted 11 June, 2024; originally announced June 2024.

  7. arXiv:2406.03713  [pdf]

    cs.RO

    Gait-Adaptive Navigation and Human Searching in field with Cyborg Insect

    Authors: Phuoc Thanh Tran-Ngoc, Huu Duoc Nguyen, Duc Long Le, Rui Li, Bing Sheng Chong, Hirotaka Sato

    Abstract: This study focuses on improving the ability of cyborg insects to navigate autonomously during search and rescue missions in outdoor environments. We propose an algorithm that leverages data from an IMU to calculate orientation and position based on the insect's walking gait. These computed factors serve as essential feedback channels across 3 phases of our exploration. Our method functions without…

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: 35 pages, 9 figures

  8. arXiv:2406.02624  [pdf, other]

    cs.CR cs.SE

    Take a Step Further: Understanding Page Spray in Linux Kernel Exploitation

    Authors: Ziyi Guo, Dang K Le, Zhenpeng Lin, Kyle Zeng, Ruoyu Wang, Tiffany Bao, Yan Shoshitaishvili, Adam Doupé, Xinyu Xing

    Abstract: Recently, a novel method known as Page Spray has emerged, focusing on page-level exploitation for kernel vulnerabilities. Despite the advantages it offers in terms of exploitability, stability, and compatibility, comprehensive research on Page Spray remains scarce. Questions regarding its root causes, exploitation model, comparative benefits over other exploitation techniques, and possible mitigation…

    Submitted 6 June, 2024; v1 submitted 3 June, 2024; originally announced June 2024.

  9. arXiv:2405.19612  [pdf, other]

    cs.IR

    Keyword-driven Retrieval-Augmented Large Language Models for Cold-start User Recommendations

    Authors: Hai-Dang Kieu, Minh Duc Nguyen, Thanh-Son Nguyen, Dung D. Le

    Abstract: Recent advancements in Large Language Models (LLMs) have shown significant potential in enhancing recommender systems. However, addressing the cold-start recommendation problem, where users lack historical data, remains a considerable challenge. In this paper, we introduce KALM4Rec (Keyword-driven Retrieval-Augmented Large Language Models for Cold-start User Recommendations), a novel framework spe…

    Submitted 29 May, 2024; originally announced May 2024.

    Comments: 10 pages, 10 figures, 4 tables

  10. arXiv:2405.00681  [pdf, other]

    eess.SP cs.IT cs.NI eess.SY

    Delay and Overhead Efficient Transmission Scheduling for Federated Learning in UAV Swarms

    Authors: Duc N. M. Hoang, Vu Tuan Truong, Hung Duy Le, Long Bao Le

    Abstract: This paper studies the wireless scheduling design to coordinate the transmissions of (local) model parameters of federated learning (FL) for a swarm of unmanned aerial vehicles (UAVs). The overall goal of the proposed design is to realize the FL training and aggregation processes with a central aggregator exploiting the sensory data collected by the UAVs but it considers the multi-hop wireless net…

    Submitted 22 February, 2024; originally announced May 2024.

    Comments: accepted to WCNC'24

  11. arXiv:2404.12450  [pdf, other]

    cs.CV cs.AI cs.LG

    Enhancing AI Diagnostics: Autonomous Lesion Masking via Semi-Supervised Deep Learning

    Authors: Ting-Ruen Wei, Michele Hell, Dang Bich Thuy Le, Aren Vierra, Ran Pang, Mahesh Patel, Young Kang, Yuling Yan

    Abstract: This study presents an unsupervised domain adaptation method aimed at autonomously generating image masks outlining regions of interest (ROIs) for differentiating breast lesions in breast ultrasound (US) imaging. Our semi-supervised learning approach utilizes a primitive model trained on a small public breast US dataset with true annotations. This model is then iteratively refined for the domain a…

    Submitted 18 April, 2024; originally announced April 2024.

  12. arXiv:2404.04629  [pdf, other]

    cs.CV

    DifFUSER: Diffusion Model for Robust Multi-Sensor Fusion in 3D Object Detection and BEV Segmentation

    Authors: Duy-Tho Le, Hengcan Shi, Jianfei Cai, Hamid Rezatofighi

    Abstract: Diffusion models have recently gained prominence as powerful deep generative models, demonstrating unmatched performance across various domains. However, their potential in multi-sensor fusion remains largely unexplored. In this work, we introduce DifFUSER, a novel approach that leverages diffusion models for multi-modal fusion in 3D object detection and BEV map segmentation. Benefiting from the i…

    Submitted 6 April, 2024; originally announced April 2024.

    Comments: 23 pages

  13. arXiv:2404.01686  [pdf, other]

    cs.CV

    JRDB-PanoTrack: An Open-world Panoptic Segmentation and Tracking Robotic Dataset in Crowded Human Environments

    Authors: Duy-Tho Le, Chenhui Gou, Stavya Datta, Hengcan Shi, Ian Reid, Jianfei Cai, Hamid Rezatofighi

    Abstract: Autonomous robot systems have attracted increasing research attention in recent years, where environment understanding is a crucial step for robot navigation, human-robot interaction, and decision. Real-world robot systems usually collect visual data from multiple sensors and are required to recognize numerous objects and their movements in complex human-crowded settings. Traditional benchmarks, w…

    Submitted 2 April, 2024; originally announced April 2024.

    Comments: CVPR 2024

  14. arXiv:2404.00006  [pdf, ps, other]

    cs.CC

    A Critique of Chen's "The 2-MAXSAT Problem Can Be Solved in Polynomial Time"

    Authors: Tran Duy Anh Le, Michael P. Reidy, Eliot J. Smith

    Abstract: In this paper, we examine Yangjun Chen's technical report titled "The 2-MAXSAT Problem Can Be Solved in Polynomial Time" [Che23], which revises and expands upon their conference paper of the same name [Che22]. Chen's paper purports to build a polynomial-time algorithm for the ${\rm NP}$-complete problem 2-MAXSAT by converting a 2-CNF formula into a graph that is then searched. We show through mu…

    Submitted 21 February, 2024; originally announced April 2024.

  15. arXiv:2403.19161  [pdf, other]

    cs.CL

    Improving Vietnamese-English Medical Machine Translation

    Authors: Nhu Vo, Dat Quoc Nguyen, Dung D. Le, Massimo Piccardi, Wray Buntine

    Abstract: Machine translation for Vietnamese-English in the medical domain is still an under-explored research area. In this paper, we introduce MedEV -- a high-quality Vietnamese-English parallel dataset constructed specifically for the medical domain, comprising approximately 360K sentence pairs. We conduct extensive experiments comparing Google Translate, ChatGPT (gpt-3.5-turbo), state-of-the-art Vietnam…

    Submitted 28 March, 2024; originally announced March 2024.

    Comments: To appear in Proceedings of LREC-COLING 2024

  16. arXiv:2403.17392  [pdf, other]

    cs.RO eess.SY nlin.AO

    Natural-artificial hybrid swarm: Cyborg-insect group navigation in unknown obstructed soft terrain

    Authors: Yang Bai, Phuoc Thanh Tran Ngoc, Huu Duoc Nguyen, Duc Long Le, Quang Huy Ha, Kazuki Kai, Yu Xiang See To, Yaosheng Deng, Jie Song, Naoki Wakamiya, Hirotaka Sato, Masaki Ogura

    Abstract: Navigating multi-robot systems in complex terrains has always been a challenging task. This is due to the inherent limitations of traditional robots in collision avoidance, adaptation to unknown environments, and sustained energy efficiency. In order to overcome these limitations, this research proposes a solution by integrating living insects with miniature electronic controllers to enable roboti…

    Submitted 27 March, 2024; v1 submitted 26 March, 2024; originally announced March 2024.

  17. arXiv:2403.16958  [pdf, other]

    cs.CV

    TwinLiteNetPlus: A Stronger Model for Real-time Drivable Area and Lane Segmentation

    Authors: Quang-Huy Che, Duc-Tri Le, Minh-Quan Pham, Vinh-Tiep Nguyen, Duc-Khai Lam

    Abstract: Semantic segmentation is crucial for autonomous driving, particularly for Drivable Area and Lane Segmentation, ensuring safety and navigation. To address the high computational costs of current state-of-the-art (SOTA) models, this paper introduces TwinLiteNetPlus (TwinLiteNet$^+$), a model adept at balancing efficiency and accuracy. TwinLiteNet$^+$ incorporates standard and depth-wise separable di…

    Submitted 25 March, 2024; originally announced March 2024.

  18. arXiv:2403.08947  [pdf, other]

    eess.IV cs.CV

    Robust COVID-19 Detection in CT Images with CLIP

    Authors: Li Lin, Yamini Sri Krubha, Zhenhuan Yang, Cheng Ren, Thuc Duy Le, Irene Amerini, Xin Wang, Shu Hu

    Abstract: In the realm of medical imaging, particularly for COVID-19 detection, deep learning models face substantial challenges such as the necessity for extensive computational resources, the paucity of well-annotated datasets, and a significant amount of unlabeled data. In this work, we introduce the first lightweight detector designed to overcome these obstacles, leveraging a frozen CLIP image encoder a…

    Submitted 14 March, 2024; v1 submitted 13 March, 2024; originally announced March 2024.

  19. arXiv:2403.02715  [pdf, other]

    cs.CL cs.AI

    Crossing Linguistic Horizons: Finetuning and Comprehensive Evaluation of Vietnamese Large Language Models

    Authors: Sang T. Truong, Duc Q. Nguyen, Toan Nguyen, Dong D. Le, Nhi N. Truong, Tho Quan, Sanmi Koyejo

    Abstract: Recent advancements in large language models (LLMs) have underscored their importance in the evolution of artificial intelligence. However, despite extensive pretraining on multilingual datasets, available open-sourced LLMs exhibit limited effectiveness in processing Vietnamese. The challenge is exacerbated by the absence of systematic benchmark datasets and metrics tailored for Vietnamese LLM eva…

    Submitted 26 May, 2024; v1 submitted 5 March, 2024; originally announced March 2024.

    Comments: 51 pages

    MSC Class: 68T50

  20. Improving Visual Perception of a Social Robot for Controlled and In-the-wild Human-robot Interaction

    Authors: Wangjie Zhong, Leimin Tian, Duy Tho Le, Hamid Rezatofighi

    Abstract: Social robots often rely on visual perception to understand their users and the environment. Recent advancements in data-driven approaches for computer vision have demonstrated great potentials for applying deep-learning models to enhance a social robot's visual perception. However, the high computational demands of deep-learning methods, as opposed to the more resource-efficient shallow-learning…

    Submitted 5 March, 2024; v1 submitted 4 March, 2024; originally announced March 2024.

    Comments: accepted to HRI 2024 (LBR track)

  21. arXiv:2402.17467  [pdf, other]

    cs.IR cs.AI cs.SD eess.AS

    Natural Language Processing Methods for Symbolic Music Generation and Information Retrieval: a Survey

    Authors: Dinh-Viet-Toan Le, Louis Bigo, Mikaela Keller, Dorien Herremans

    Abstract: Several adaptations of Transformers models have been developed in various domains since its breakthrough in Natural Language Processing (NLP). This trend has spread into the field of Music Information Retrieval (MIR), including studies processing music data. However, the practice of leveraging NLP tools for symbolic music data is not novel in MIR. Music has been frequently compared to language, as…

    Submitted 27 February, 2024; originally announced February 2024.

    Comments: 36 pages, 5 figures, 4 tables

  22. arXiv:2402.17269  [pdf, other]

    cs.LG

    Curriculum Learning Meets Directed Acyclic Graph for Multimodal Emotion Recognition

    Authors: Cam-Van Thi Nguyen, Cao-Bach Nguyen, Quang-Thuy Ha, Duc-Trong Le

    Abstract: Emotion recognition in conversation (ERC) is a crucial task in natural language processing and affective computing. This paper proposes MultiDAG+CL, a novel approach for Multimodal Emotion Recognition in Conversation (ERC) that employs Directed Acyclic Graph (DAG) to integrate textual, acoustic, and visual features within a unified framework. The model is enhanced by Curriculum Learning (CL) to ad…

    Submitted 8 March, 2024; v1 submitted 27 February, 2024; originally announced February 2024.

    Comments: Accepted by LREC-COLING 2024

  23. arXiv:2402.14305  [pdf, other]

    cs.IR cs.LG

    Towards Efficient Pareto-optimal Utility-Fairness between Groups in Repeated Rankings

    Authors: Phuong Dinh Mai, Duc-Trong Le, Tuan-Anh Hoang, Dung D. Le

    Abstract: In this paper, we tackle the problem of computing a sequence of rankings with the guarantee of the Pareto-optimal balance between (1) maximizing the utility of the consumers and (2) minimizing unfairness between producers of the items. Such a multi-objective optimization problem is typically solved using a combination of a scalarization method and linear programming on bi-stochastic matrices, repr…

    Submitted 22 February, 2024; originally announced February 2024.

  24. arXiv:2402.11469  [pdf, other]

    cs.LG cs.CL cs.CR

    A Curious Case of Searching for the Correlation between Training Data and Adversarial Robustness of Transformer Textual Models

    Authors: Cuong Dang, Dung D. Le, Thai Le

    Abstract: Existing works have shown that fine-tuned textual transformer models achieve state-of-the-art prediction performances but are also vulnerable to adversarial text perturbations. Traditional adversarial evaluation is often done only after fine-tuning the models and ignoring the training data. In this paper, we want to prove that there is also a strong correlation between training data and m…

    Submitted 1 July, 2024; v1 submitted 18 February, 2024; originally announced February 2024.

    Comments: Accepted to ACL Findings 2024

  25. arXiv:2402.03292  [pdf, other]

    cs.LG cs.CV

    Zero-shot Object-Level OOD Detection with Context-Aware Inpainting

    Authors: Quang-Huy Nguyen, Jin Peng Zhou, Zhenzhen Liu, Khanh-Huyen Bui, Kilian Q. Weinberger, Dung D. Le

    Abstract: Machine learning algorithms are increasingly provided as black-box cloud services or pre-trained models, without access to their training data. This motivates the problem of zero-shot out-of-distribution (OOD) detection. Concretely, we aim to detect OOD objects that do not belong to the classifier's label set but are erroneously classified as in-distribution (ID) objects. Our approach, RONIN, uses…

    Submitted 6 February, 2024; v1 submitted 5 February, 2024; originally announced February 2024.

  26. arXiv:2402.03131  [pdf, other]

    cs.CL cs.LG

    Constrained Decoding for Cross-lingual Label Projection

    Authors: Duong Minh Le, Yang Chen, Alan Ritter, Wei Xu

    Abstract: Zero-shot cross-lingual transfer utilizing multilingual LLMs has become a popular learning paradigm for low-resource languages with no labeled training data. However, for NLP tasks that involve fine-grained predictions on words and phrases, the performance of zero-shot cross-lingual transfer learning lags far behind supervised fine-tuning methods. Therefore, it is common to exploit translation and…

    Submitted 5 February, 2024; originally announced February 2024.

    Comments: Accepted at ICLR 2024

  27. arXiv:2401.07278  [pdf, other]

    cs.CV cs.AI

    Semi-Supervised Semantic Segmentation using Redesigned Self-Training for White Blood Cells

    Authors: Vinh Quoc Luu, Duy Khanh Le, Huy Thanh Nguyen, Minh Thanh Nguyen, Thinh Tien Nguyen, Vinh Quang Dinh

    Abstract: Artificial Intelligence (AI) in healthcare, especially in white blood cell cancer diagnosis, is hindered by two primary challenges: the lack of large-scale labeled datasets for white blood cell (WBC) segmentation and outdated segmentation methods. These challenges inhibit the development of more accurate and modern techniques to diagnose cancer relating to white blood cells. To address the first c…

    Submitted 23 February, 2024; v1 submitted 14 January, 2024; originally announced January 2024.

  28. arXiv:2401.03748  [pdf, other]

    cs.LG cs.CR cs.DC cs.IR

    Towards Efficient Communication and Secure Federated Recommendation System via Low-rank Training

    Authors: Ngoc-Hieu Nguyen, Tuan-Anh Nguyen, Tuan Nguyen, Vu Tien Hoang, Dung D. Le, Kok-Seng Wong

    Abstract: Federated Recommendation (FedRec) systems have emerged as a solution to safeguard users' data in response to growing regulatory concerns. However, one of the major challenges in these systems lies in the communication costs that arise from the need to transmit neural network models between user devices and a central server. Prior approaches to these challenges often lead to issues such as computat…

    Submitted 28 February, 2024; v1 submitted 8 January, 2024; originally announced January 2024.

    Comments: 12 pages, 6 figures, 4 tables

  29. arXiv:2312.10518  [pdf, other]

    cs.SD cs.AI eess.AS

    Seq2seq for Automatic Paraphasia Detection in Aphasic Speech

    Authors: Matthew Perez, Duc Le, Amrit Romana, Elise Jones, Keli Licata, Emily Mower Provost

    Abstract: Paraphasias are speech errors that are often characteristic of aphasia and they represent an important signal in assessing disease severity and subtype. Traditionally, clinicians manually identify paraphasias by transcribing and analyzing speech-language samples, which can be a time-consuming and burdensome process. Identifying paraphasias automatically can greatly help clinicians with the transcr…

    Submitted 16 December, 2023; originally announced December 2023.

  30. arXiv:2312.10202  [pdf, other]

    cs.CL

    Low-resource classification of mobility functioning information in clinical sentences using large language models

    Authors: Tuan Dung Le, Thanh Duong, Thanh Thieu

    Abstract: Objective: Function is increasingly recognized as an important indicator of whole-person health. This study evaluates the ability of publicly available large language models (LLMs) to accurately identify the presence of functioning information from clinical notes. We explore various strategies to improve the performance on this task. Materials and Methods: We collect a balanced binary classificati…

    Submitted 15 December, 2023; originally announced December 2023.

  31. arXiv:2312.08723  [pdf, other]

    cs.SD cs.LG eess.AS

    StemGen: A music generation model that listens

    Authors: Julian D. Parker, Janne Spijkervet, Katerina Kosta, Furkan Yesiler, Boris Kuznetsov, Ju-Chiang Wang, Matt Avent, Jitong Chen, Duc Le

    Abstract: End-to-end generation of musical audio using deep learning techniques has seen an explosion of activity recently. However, most models concentrate on generating fully mixed music in response to abstract conditioning information. In this work, we present an alternative paradigm for producing music generation models that can listen and respond to musical context. We describe how such a model can be…

    Submitted 16 January, 2024; v1 submitted 14 December, 2023; originally announced December 2023.

    Comments: Accepted for publication at ICASSP 2024

  32. arXiv:2312.07175  [pdf, other]

    cs.LG cs.AI stat.ME

    Instrumental Variable Estimation for Causal Inference in Longitudinal Data with Time-Dependent Latent Confounders

    Authors: Debo Cheng, Ziqi Xu, Jiuyong Li, Lin Liu, Jixue Liu, Wentao Gao, Thuc Duy Le

    Abstract: Causal inference from longitudinal observational data is a challenging problem due to the difficulty in correctly identifying the time-dependent confounders, especially in the presence of latent time-dependent confounders. Instrumental variable (IV) is a powerful tool for addressing the latent confounders issue, but the traditional IV technique cannot deal with latent time-dependent confounders in…

    Submitted 12 December, 2023; originally announced December 2023.

    Comments: 13 pages, 7 figures and 3 tables

  33. arXiv:2312.06279  [pdf, other]

    cs.LG cs.AI

    Regional Correlation Aided Mobile Traffic Prediction with Spatiotemporal Deep Learning

    Authors: JeongJun Park, Lusungu J. Mwasinga, Huigyu Yang, Syed M. Raza, Duc-Tai Le, Moonseong Kim, Min Young Chung, Hyunseung Choo

    Abstract: Mobile traffic data in urban regions shows differentiated patterns during different hours of the day. The exploitation of these patterns enables highly accurate mobile traffic prediction for proactive network management. However, recent Deep Learning (DL) driven studies have only exploited spatiotemporal features and have ignored the geographical correlations, causing high complexity and erroneous…

    Submitted 11 December, 2023; originally announced December 2023.

    Comments: 4 pages, 5 figures, 1 table. This paper has been accepted at the IEEE Consumer Communications & Networking Conference (CCNC) 2024

  34. arXiv:2312.04395  [pdf, ps, other]

    cs.CC

    On Czerwinski's "${\rm P} \neq {\rm NP}$ relative to a ${\rm P}$-complete oracle"

    Authors: Michael C. Chavrimootoo, Tran Duy Anh Le, Michael P. Reidy, Eliot J. Smith

    Abstract: In this paper, we take a closer look at Czerwinski's "${\rm P}\neq{\rm NP}$ relative to a ${\rm P}$-complete oracle" [Cze23]. There are (uncountably) infinitely-many relativized worlds where ${\rm P}$ and ${\rm NP}$ differ, and it is well-known that for any ${\rm P}$-complete problem $A$, ${\rm P}^A \neq {\rm NP}^A \iff {\rm P}\neq {\rm NP}$. The paper defines two sets ${\rm D}_{\rm P}$ and…

    Submitted 7 December, 2023; originally announced December 2023.

  35. arXiv:2311.15297  [pdf, other]

    cs.LG math.OC

    Controllable Expensive Multi-objective Learning with Warm-starting Bayesian Optimization

    Authors: Quang-Huy Nguyen, Long P. Hoang, Hoang V. Viet, Dung D. Le

    Abstract: Pareto Set Learning (PSL) is a promising approach for approximating the entire Pareto front in multi-objective optimization (MOO) problems. However, existing derivative-free PSL methods are often unstable and inefficient, especially for expensive black-box MOO problems where objective function evaluations are costly. In this work, we propose to address the instability and inefficiency of existing…

    Submitted 9 February, 2024; v1 submitted 26 November, 2023; originally announced November 2023.

  36. Conversation Understanding using Relational Temporal Graph Neural Networks with Auxiliary Cross-Modality Interaction

    Authors: Cam-Van Thi Nguyen, Anh-Tuan Mai, The-Son Le, Hai-Dang Kieu, Duc-Trong Le

    Abstract: Emotion recognition is a crucial task for human conversation understanding. It becomes more challenging with the notion of multimodal data, e.g., language, voice, and facial expressions. As a typical solution, the global- and the local context information are exploited to predict the emotional label for every single sentence, i.e., utterance, in the dialogue. Specifically, the global representatio…

    Submitted 30 January, 2024; v1 submitted 8 November, 2023; originally announced November 2023.

    Comments: EMNLP 2023

    Journal ref: The 2023 Conference on Empirical Methods in Natural Language Processing

  37. arXiv:2311.03785  [pdf, other]

    cs.CV cs.MM

    Self-MI: Efficient Multimodal Fusion via Self-Supervised Multi-Task Learning with Auxiliary Mutual Information Maximization

    Authors: Cam-Van Thi Nguyen, Ngoc-Hoa Thi Nguyen, Duc-Trong Le, Quang-Thuy Ha

    Abstract: Multimodal representation learning poses significant challenges in capturing informative and distinct features from multiple modalities. Existing methods often struggle to exploit the unique characteristics of each modality due to unified multimodal annotations. In this study, we propose Self-MI in the self-supervised learning fashion, which also leverages Contrastive Predictive Coding (CPC) as an…

    Submitted 7 November, 2023; originally announced November 2023.

    Comments: Accepted at The 37th Pacific Asia Conference on Language, Information and Computation (PACLIC 37)

  38. arXiv:2311.03318  [pdf, other]

    cs.SD cs.IR eess.AS

    A Foundation Model for Music Informatics

    Authors: Minz Won, Yun-Ning Hung, Duc Le

    Abstract: This paper investigates foundation models tailored for music informatics, a domain currently challenged by the scarcity of labeled data and generalization issues. To this end, we conduct an in-depth comparative study among various foundation model variants, examining key determinants such as model architectures, tokenization methods, temporal resolution, data, and model scalability. This research…

    Submitted 6 November, 2023; originally announced November 2023.

    Comments: 5 pages

  39. arXiv:2311.00729  [pdf, other]

    cs.CV cs.AI

    ZEETAD: Adapting Pretrained Vision-Language Model for Zero-Shot End-to-End Temporal Action Detection

    Authors: Thinh Phan, Khoa Vo, Duy Le, Gianfranco Doretto, Donald Adjeroh, Ngan Le

    Abstract: Temporal action detection (TAD) involves the localization and classification of action instances within untrimmed videos. While standard TAD follows fully supervised learning with closed-set setting on large training data, recent zero-shot TAD methods showcase the promising open-set setting by leveraging large-scale contrastive visual-language (ViL) pretrained models. However, existing zero-shot T…

    Submitted 4 November, 2023; v1 submitted 31 October, 2023; originally announced November 2023.

  40. arXiv:2310.12074  [pdf, other]

    cs.CL

    Towards Safer Operations: An Expert-involved Dataset of High-Pressure Gas Incidents for Preventing Future Failures

    Authors: Shumpei Inoue, Minh-Tien Nguyen, Hiroki Mizokuchi, Tuan-Anh D. Nguyen, Huu-Hiep Nguyen, Dung Tien Le

    Abstract: This paper introduces a new IncidentAI dataset for safety prevention. Different from prior corpora that usually contain a single task, our dataset comprises three tasks: named entity recognition, cause-effect extraction, and information retrieval. The dataset is annotated by domain experts who have at least six years of practical experience as high-pressure gas conservation managers. We validate t…

    Submitted 23 October, 2023; v1 submitted 18 October, 2023; originally announced October 2023.

    Comments: Accepted by EMNLP 2023 (The Industry Track)

  41. arXiv:2310.01865  [pdf, other]

    cs.LG cs.AI

    Conditional Instrumental Variable Regression with Representation Learning for Causal Inference

    Authors: Debo Cheng, Ziqi Xu, Jiuyong Li, Lin Liu, Jixue Liu, Thuc Duy Le

    Abstract: This paper studies the challenging problem of estimating causal effects from observational data, in the presence of unobserved confounders. The two-stage least square (TSLS) method and its variants with a standard instrumental variable (IV) are commonly used to eliminate confounding bias, including the bias caused by unobserved confounders, but they rely on the linearity assumption. Besides, the s…

    Submitted 3 October, 2023; originally announced October 2023.

    Comments: 17 pages, 3 figures and 6 tables

  42. arXiv:2310.01353  [pdf, other]

    eess.AS cs.SD

    Scaling Up Music Information Retrieval Training with Semi-Supervised Learning

    Authors: Yun-Ning Hung, Ju-Chiang Wang, Minz Won, Duc Le

    Abstract: In the era of data-driven Music Information Retrieval (MIR), the scarcity of labeled data has been one of the major concerns to the success of an MIR task. In this work, we leverage the semi-supervised teacher-student training approach to improve MIR tasks. For training, we scale up the unlabeled music data to 240k hours, which is much larger than any public MIR datasets. We iteratively create and…

    Submitted 2 October, 2023; originally announced October 2023.

  43. arXiv:2308.11161  [pdf, other]

    cs.SE

    Adversarial Attacks on Code Models with Discriminative Graph Patterns

    Authors: Thanh-Dat Nguyen, Yang Zhou, Xuan Bach D. Le, Patanamon Thongtanunam, David Lo

    Abstract: Pre-trained language models of code are now widely used in various software engineering tasks such as code generation, code completion, vulnerability detection, etc. This, in turn, poses security and reliability risks to these models. One of the important threats is adversarial attacks, which can lead to erroneous predictions and largely affect model performance on downstream tasks. Curre…

    Submitted 21 August, 2023; originally announced August 2023.

  44. arXiv:2307.16834  [pdf]

    cs.CV cs.AI cs.LG eess.IV

    Benchmarking Jetson Edge Devices with an End-to-end Video-based Anomaly Detection System

    Authors: Hoang Viet Pham, Thinh Gia Tran, Chuong Dinh Le, An Dinh Le, Hien Bich Vo

    Abstract: Innovative enhancement in embedded system platforms, specifically hardware accelerations, significantly influence the application of deep learning in real-world scenarios. These innovations translate human labor efforts into automated intelligent systems employed in various areas such as autonomous driving, robotics, Internet-of-Things (IoT), and numerous other impactful applications. NVIDIA's Jet…

    Submitted 12 September, 2023; v1 submitted 28 July, 2023; originally announced July 2023.

    Comments: Accepted in Future of Information and Communication Conference (FICC) 2024

  45. arXiv:2307.12596  [pdf, other]

    cs.SE

    Refining ChatGPT-Generated Code: Characterizing and Mitigating Code Quality Issues

    Authors: Yue Liu, Thanh Le-Cong, Ratnadira Widyasari, Chakkrit Tantithamthavorn, Li Li, Xuan-Bach D. Le, David Lo

    Abstract: We systematically study the quality of 4,066 ChatGPT-generated code implemented in two popular programming languages, i.e., Java and Python, for 2,033 programming tasks. The goal of this work is three folds. First, we analyze the correctness of ChatGPT on code generation tasks and uncover the factors that influence its effectiveness, including task difficulty, programming language, time that tasks…

    Submitted 14 December, 2023; v1 submitted 24 July, 2023; originally announced July 2023.

  46. arXiv:2307.12134  [pdf, other]

    cs.CL cs.SD eess.AS

    Modality Confidence Aware Training for Robust End-to-End Spoken Language Understanding

    Authors: Suyoun Kim, Akshat Shrivastava, Duc Le, Ju Lin, Ozlem Kalinli, Michael L. Seltzer

    Abstract: End-to-end (E2E) spoken language understanding (SLU) systems that generate a semantic parse from speech have become more promising recently. This approach uses a single model that utilizes audio and text representations from pre-trained speech recognition models (ASR), and outperforms traditional pipeline SLU systems in on-device streaming scenarios. However, E2E SLU systems still show weakness wh…

    Submitted 22 July, 2023; originally announced July 2023.

    Comments: INTERSPEECH 2023

  47. arXiv:2307.04514  [pdf, other]

    cs.LG cs.AI

    Improving Heterogeneous Graph Learning with Weighted Mixed-Curvature Product Manifold

    Authors: Tuc Nguyen-Van, Dung D. Le, The-Anh Ta

    Abstract: In graph representation learning, it is important that the complex geometric structure of the input graph, e.g. hidden relations among nodes, is well captured in embedding space. However, standard Euclidean embedding spaces have a limited capacity in representing graphs of varying structures. A promising candidate for the faithful embedding of data with varying structure is product manifolds of co…

    Submitted 10 July, 2023; originally announced July 2023.

  48. arXiv:2307.01844  [pdf, other]

    cs.CV

    Advancing Wound Filling Extraction on 3D Faces: Auto-Segmentation and Wound Face Regeneration Approach

    Authors: Duong Q. Nguyen, Thinh D. Le, Phuong D. Nguyen, Nga T. K. Le, H. Nguyen-Xuan

    Abstract: Facial wound segmentation plays a crucial role in preoperative planning and optimizing patient outcomes in various medical applications. In this paper, we propose an efficient approach for automating 3D facial wound segmentation using a two-stream graph convolutional network. Our method leverages the Cir3D-FaIR dataset and addresses the challenge of data imbalance through extensive experimentation…

    Submitted 12 July, 2023; v1 submitted 4 July, 2023; originally announced July 2023.

  49. arXiv:2307.01558  [pdf, other]

    cs.LG cs.AI

    Scalable variable selection for two-view learning tasks with projection operators

    Authors: Sandor Szedmak, Riikka Huusari, Tat Hong Duong Le, Juho Rousu

    Abstract: In this paper we propose a novel variable selection method for two-view settings, or for vector-valued supervised learning problems. Our framework is able to handle extremely large scale selection tasks, where number of data samples could be even millions. In a nutshell, our method performs variable selection by iteratively selecting variables that are highly correlated with the output variables,…

    Submitted 4 July, 2023; originally announced July 2023.

    Comments: 17 pages, 15 PDF figures

  50. arXiv:2306.12453  [pdf, other]

    cs.LG cs.AI stat.ME

    Learning Conditional Instrumental Variable Representation for Causal Effect Estimation

    Authors: Debo Cheng, Ziqi Xu, Jiuyong Li, Lin Liu, Thuc Duy Le, Jixue Liu

    Abstract: One of the fundamental challenges in causal inference is to estimate the causal effect of a treatment on its outcome of interest from observational data. However, causal effect estimation often suffers from the impacts of confounding bias caused by unmeasured confounders that affect both the treatment and the outcome. The instrumental variable (IV) approach is a powerful way to eliminate the confo…

    Submitted 20 June, 2023; originally announced June 2023.

    Comments: Debo Cheng and Ziqi Xu contributed equally. 20 pages, 5 tables, and 3 figures. Accepted at ECML-PKDD2023