Showing 1–50 of 418 results for author: Chang, K

  1. arXiv:2407.08473  [pdf, other]

    cs.AR cs.AI

    Natural language is not enough: Benchmarking multi-modal generative AI for Verilog generation

    Authors: Kaiyan Chang, Zhirong Chen, Yunhao Zhou, Wenlong Zhu, Kun Wang, Haobo Xu, Cangyuan Li, Mengdi Wang, Shengwen Liang, Huawei Li, Yinhe Han, Ying Wang

    Abstract: Natural language interfaces have exhibited considerable potential in the automation of Verilog generation derived from high-level specifications through the utilization of large language models, garnering significant attention. Nevertheless, this paper elucidates that visual representations contribute essential contextual information critical to design intent for hardware architectures possessing…

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: Accepted by ICCAD 2024

  2. arXiv:2407.06549  [pdf, other]

    cs.IR cs.AI cs.CL cs.LG

    AutoTask: Task Aware Multi-Faceted Single Model for Multi-Task Ads Relevance

    Authors: Shouchang Guo, Sonam Damani, Keng-hao Chang

    Abstract: Ads relevance models are crucial in determining the relevance between user search queries and ad offers, often framed as a classification problem. The complexity of modeling increases significantly with multiple ad types and varying scenarios that exhibit both similarities and differences. In this work, we introduce a novel multi-faceted attention model that performs task aware feature combination…

    Submitted 9 July, 2024; originally announced July 2024.

  3. arXiv:2407.02511  [pdf, other]

    cs.RO cs.AI cs.CL

    LLM-A*: Large Language Model Enhanced Incremental Heuristic Search on Path Planning

    Authors: Silin Meng, Yiwei Wang, Cheng-Fu Yang, Nanyun Peng, Kai-Wei Chang

    Abstract: Path planning is a fundamental scientific problem in robotics and autonomous navigation, requiring the derivation of efficient routes from starting to destination points while avoiding obstacles. Traditional algorithms like A* and its variants are capable of ensuring path validity but suffer from significant computational and memory inefficiencies as the state space grows. Conversely, large langua…

    Submitted 19 June, 2024; originally announced July 2024.

    Comments: Submitted to The 2024 Conference on Empirical Methods in Natural Language Processing

  4. arXiv:2407.02235  [pdf]

    cs.CL

    Towards a Holistic Framework for Multimodal Large Language Models in Three-dimensional Brain CT Report Generation

    Authors: Cheng-Yi Li, Kao-Jung Chang, Cheng-Fu Yang, Hsin-Yu Wu, Wenting Chen, Hritik Bansal, Ling Chen, Yi-Ping Yang, Yu-Chun Chen, Shih-Pin Chen, Jiing-Feng Lirng, Kai-Wei Chang, Shih-Hwa Chiou

    Abstract: Multi-modal large language models (MLLMs) have been given free rein to explore exciting medical applications with a primary focus on radiology report generation. Nevertheless, the preliminary success in 2D radiology captioning is incompetent to reflect the real-world diagnostic challenge in the volumetric 3D anatomy. To mitigate three crucial limitation aspects in the existing literature, includin…

    Submitted 2 July, 2024; originally announced July 2024.

    Comments: 6 figures, 5 supplementary figures, 8 supplementary tables

  5. arXiv:2407.00377  [pdf, other]

    cs.CL cs.AI cs.CV cs.CY

    The Factuality Tax of Diversity-Intervened Text-to-Image Generation: Benchmark and Fact-Augmented Intervention

    Authors: Yixin Wan, Di Wu, Haoran Wang, Kai-Wei Chang

    Abstract: Prompt-based "diversity interventions" are commonly adopted to improve the diversity of Text-to-Image (T2I) models depicting individuals with various racial or gender traits. However, will this strategy result in nonfactual demographic distribution, especially when generating real historical figures? In this work, we propose DemOgraphic FActualIty Representation (DoFaiR), a benchmark to systematic…

    Submitted 29 June, 2024; originally announced July 2024.

  6. arXiv:2407.00191  [pdf, other]

    cs.CL

    MetaKP: On-Demand Keyphrase Generation

    Authors: Di Wu, Xiaoxian Shen, Kai-Wei Chang

    Abstract: Traditional keyphrase prediction methods predict a single set of keyphrases per document, failing to cater to the diverse needs of users and downstream applications. To bridge the gap, we introduce on-demand keyphrase generation, a novel paradigm that requires keyphrases that conform to specific high-level goals or intents. For this task, we present MetaKP, a large-scale benchmark comprising four…

    Submitted 28 June, 2024; originally announced July 2024.

  7. arXiv:2406.19486  [pdf, other]

    cs.CL cs.AI cs.ET cs.LG eess.SP

    LoPT: Low-Rank Prompt Tuning for Parameter Efficient Language Models

    Authors: Shouchang Guo, Sonam Damani, Keng-hao Chang

    Abstract: In prompt tuning, a prefix or suffix text is added to the prompt, and the embeddings (soft prompts) or token indices (hard prompts) of the prefix/suffix are optimized to gain more control over language models for specific tasks. This approach eliminates the need for hand-crafted prompt engineering or explicit model fine-tuning. Prompt tuning is significantly more parameter-efficient than model fin…

    Submitted 27 June, 2024; originally announced June 2024.

  8. arXiv:2406.15178  [pdf, other]

    cs.CL

    Hybrid Alignment Training for Large Language Models

    Authors: Chenglong Wang, Hang Zhou, Kaiyan Chang, Bei Li, Yongyu Mu, Tong Xiao, Tongran Liu, Jingbo Zhu

    Abstract: Alignment training is crucial for enabling large language models (LLMs) to cater to human intentions and preferences. It is typically performed based on two stages with different objectives: instruction-following alignment and human-preference alignment. However, aligning LLMs with these objectives in sequence suffers from an inherent problem: the objectives may conflict, and the LLMs cannot guara…

    Submitted 21 June, 2024; originally announced June 2024.

    Comments: accepted by ACL (Findings) 2024

  9. arXiv:2406.14137  [pdf, other]

    cs.CL

    MACAROON: Training Vision-Language Models To Be Your Engaged Partners

    Authors: Shujin Wu, Yi R. Fung, Sha Li, Yixin Wan, Kai-Wei Chang, Heng Ji

    Abstract: Large vision-language models (LVLMs), while proficient in following instructions and responding to diverse questions, invariably generate detailed responses even when questions are ambiguous or unanswerable, leading to hallucinations and bias issues. Thus, it is essential for LVLMs to proactively engage with humans to ask for clarifications or additional information for better responses. In this s…

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: The code will be made public at https://github.com/ShujinWu-0814/MACAROON

  10. arXiv:2406.13692  [pdf, other]

    cs.CL

    Synchronous Faithfulness Monitoring for Trustworthy Retrieval-Augmented Generation

    Authors: Di Wu, Jia-Chen Gu, Fan Yin, Nanyun Peng, Kai-Wei Chang

    Abstract: Retrieval-augmented language models (RALMs) have shown strong performance and wide applicability in knowledge-intensive tasks. However, there are significant trustworthiness concerns as RALMs are prone to generating unfaithful outputs, including baseless information or contradictions with the retrieved context. This paper proposes SynCheck, a lightweight monitor that leverages fine-grained decodin…

    Submitted 19 June, 2024; originally announced June 2024.

  11. arXiv:2406.13444  [pdf, other]

    cs.CL cs.CV

    VDebugger: Harnessing Execution Feedback for Debugging Visual Programs

    Authors: Xueqing Wu, Zongyu Lin, Songyan Zhao, Te-Lin Wu, Pan Lu, Nanyun Peng, Kai-Wei Chang

    Abstract: Visual programs are executable code generated by large language models to address visual reasoning problems. They decompose complex questions into multiple reasoning steps and invoke specialized models for each step to solve the problems. However, these programs are prone to logic errors, with our preliminary evaluation showing that 58% of the total errors are caused by program logic errors. Debug…

    Submitted 27 June, 2024; v1 submitted 19 June, 2024; originally announced June 2024.

    Comments: update reference

  12. arXiv:2406.12725  [pdf]

    cs.CL cs.AI

    Can Large Language Models Code Like a Linguist?: A Case Study in Low Resource Sound Law Induction

    Authors: Atharva Naik, Kexun Zhang, Nathaniel Robinson, Aravind Mysore, Clayton Marr, Hong Sng Rebecca Byrnes, Anna Cai, Kalvin Chang, David Mortensen

    Abstract: Historical linguists have long written a kind of incompletely formalized ''program'' that converts reconstructed words in an ancestor language into words in one of its attested descendants that consist of a series of ordered string rewrite functions (called sound laws). They do this by observing pairs of words in the reconstructed language (protoforms) and the descendent language (reflexes) and co…

    Submitted 18 June, 2024; originally announced June 2024.

  13. arXiv:2406.10746  [pdf, other]

    cs.CL cs.IR

    SparseCL: Sparse Contrastive Learning for Contradiction Retrieval

    Authors: Haike Xu, Zongyu Lin, Yizhou Sun, Kai-Wei Chang, Piotr Indyk

    Abstract: Contradiction retrieval refers to identifying and extracting documents that explicitly disagree with or refute the content of a query, which is important to many downstream applications like fact checking and data cleaning. To retrieve contradiction argument to the query from large document corpora, existing methods such as similarity search and crossencoder models exhibit significant limitations.…

    Submitted 15 June, 2024; originally announced June 2024.

  14. arXiv:2406.09411  [pdf, other]

    cs.CV cs.AI cs.CL

    MuirBench: A Comprehensive Benchmark for Robust Multi-image Understanding

    Authors: Fei Wang, Xingyu Fu, James Y. Huang, Zekun Li, Qin Liu, Xiaogeng Liu, Mingyu Derek Ma, Nan Xu, Wenxuan Zhou, Kai Zhang, Tianyi Lorena Yan, Wenjie Jacky Mo, Hsiang-Hui Liu, Pan Lu, Chunyuan Li, Chaowei Xiao, Kai-Wei Chang, Dan Roth, Sheng Zhang, Hoifung Poon, Muhao Chen

    Abstract: We introduce MuirBench, a comprehensive benchmark that focuses on robust multi-image understanding capabilities of multimodal LLMs. MuirBench consists of 12 diverse multi-image tasks (e.g., scene understanding, ordering) that involve 10 categories of multi-image relations (e.g., multiview, temporal relations). Comprising 11,264 images and 2,600 multiple-choice questions, MuirBench is created in a…

    Submitted 1 July, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

    Comments: typos corrected, references added, Project Page: https://muirbench.github.io/

  15. arXiv:2406.05755  [pdf, other]

    cs.CV

    A DeNoising FPN With Transformer R-CNN for Tiny Object Detection

    Authors: Hou-I Liu, Yu-Wen Tseng, Kai-Cheng Chang, Pin-Jyun Wang, Hong-Han Shuai, Wen-Huang Cheng

    Abstract: Despite notable advancements in the field of computer vision, the precise detection of tiny objects continues to pose a significant challenge, largely owing to the minuscule pixel representation allocated to these objects in imagery data. This challenge resonates profoundly in the domain of geoscience and remote sensing, where high-fidelity detection of tiny objects can facilitate a myriad of appl…

    Submitted 15 June, 2024; v1 submitted 9 June, 2024; originally announced June 2024.

    Comments: The article is accepted by IEEE Transactions on Geoscience and Remote Sensing. Our code will be available at https://github.com/hoiliu-0801/DNTR

  16. arXiv:2406.05003  [pdf, other]

    cs.RO cs.HC

    Designs for Enabling Collaboration in Human-Machine Teaming via Interactive and Explainable Systems

    Authors: Rohan Paleja, Michael Munje, Kimberlee Chang, Reed Jensen, Matthew Gombolay

    Abstract: Collaborative robots and machine learning-based virtual agents are increasingly entering the human workspace with the aim of increasing productivity and enhancing safety. Despite this, we show in a ubiquitous experimental domain, Overcooked-AI, that state-of-the-art techniques for human-machine teaming (HMT), which rely on imitation or reinforcement learning, are brittle and result in a machine ag…

    Submitted 7 June, 2024; originally announced June 2024.

  17. arXiv:2406.03520  [pdf, other]

    cs.CV cs.AI cs.LG

    VideoPhy: Evaluating Physical Commonsense for Video Generation

    Authors: Hritik Bansal, Zongyu Lin, Tianyi Xie, Zeshun Zong, Michal Yarom, Yonatan Bitton, Chenfanfu Jiang, Yizhou Sun, Kai-Wei Chang, Aditya Grover

    Abstract: Recent advances in internet-scale video data pretraining have led to the development of text-to-video generative models that can create high-quality videos across a broad range of visual concepts and styles. Due to their ability to synthesize realistic motions and render complex objects, these generative models have the potential to become general-purpose simulators of the physical world. However,…

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: 36 pages, 26 figures, 8 tables

  18. arXiv:2406.01495  [pdf, other]

    cs.CL

    Re-ReST: Reflection-Reinforced Self-Training for Language Agents

    Authors: Zi-Yi Dou, Cheng-Fu Yang, Xueqing Wu, Kai-Wei Chang, Nanyun Peng

    Abstract: Finetuning language agents with reasoning-action trajectories is effective, but obtaining these trajectories from human annotations or stronger models is costly and sometimes impractical. In this paper, we investigate the use of self-training in language agents, which can generate supervision from the agent itself, offering a promising alternative without relying on human or stronger model demonst…

    Submitted 7 July, 2024; v1 submitted 3 June, 2024; originally announced June 2024.

  19. arXiv:2405.19716  [pdf, other]

    cs.CV cs.CL

    Enhancing Large Vision Language Models with Self-Training on Image Comprehension

    Authors: Yihe Deng, Pan Lu, Fan Yin, Ziniu Hu, Sheng Shen, James Zou, Kai-Wei Chang, Wei Wang

    Abstract: Large vision language models (LVLMs) integrate large language models (LLMs) with pre-trained vision encoders, thereby activating the perception capability of the model to understand image inputs for different queries and conduct subsequent reasoning. Improving this capability requires high-quality vision-language data, which is costly and labor-intensive to acquire. Self-training approaches have b…

    Submitted 30 May, 2024; originally announced May 2024.

    Comments: 19 pages, 14 figures, 6 tables

  20. arXiv:2405.19315  [pdf, other]

    cs.CV cs.CL cs.LG

    Matryoshka Query Transformer for Large Vision-Language Models

    Authors: Wenbo Hu, Zi-Yi Dou, Liunian Harold Li, Amita Kamath, Nanyun Peng, Kai-Wei Chang

    Abstract: Large Vision-Language Models (LVLMs) typically encode an image into a fixed number of visual tokens (e.g., 576) and process these tokens with a language model. Despite their strong performance, LVLMs face challenges in adapting to varying computational constraints. This raises the question: can we achieve flexibility in the number of visual tokens to suit different tasks and computational resource…

    Submitted 6 June, 2024; v1 submitted 29 May, 2024; originally announced May 2024.

    Comments: Preprint. Our code and model are publicly available at https://github.com/gordonhu608/MQT-LLaVA

  21. arXiv:2405.18368  [pdf, other]

    cs.CV

    The 2024 Brain Tumor Segmentation (BraTS) Challenge: Glioma Segmentation on Post-treatment MRI

    Authors: Maria Correia de Verdier, Rachit Saluja, Louis Gagnon, Dominic LaBella, Ujjwall Baid, Nourel Hoda Tahon, Martha Foltyn-Dumitru, Jikai Zhang, Maram Alafif, Saif Baig, Ken Chang, Gennaro D'Anna, Lisa Deptula, Diviya Gupta, Muhammad Ammar Haider, Ali Hussain, Michael Iv, Marinos Kontzialis, Paul Manning, Farzan Moodi, Teresa Nunes, Aaron Simon, Nico Sollmann, David Vu, Maruf Adewole , et al. (60 additional authors not shown)

    Abstract: Gliomas are the most common malignant primary brain tumors in adults and one of the deadliest types of cancer. There are many challenges in treatment and monitoring due to the genetic diversity and high intrinsic heterogeneity in appearance, shape, histology, and treatment response. Treatments include surgery, radiation, and systemic therapies, with magnetic resonance imaging (MRI) playing a key r…

    Submitted 28 May, 2024; originally announced May 2024.

    Comments: 10 pages, 4 figures, 1 table

  22. Fair Evaluation of Federated Learning Algorithms for Automated Breast Density Classification: The Results of the 2022 ACR-NCI-NVIDIA Federated Learning Challenge

    Authors: Kendall Schmidt, Benjamin Bearce, Ken Chang, Laura Coombs, Keyvan Farahani, Marawan Elbatele, Kaouther Mouhebe, Robert Marti, Ruipeng Zhang, Yao Zhang, Yanfeng Wang, Yaojun Hu, Haochao Ying, Yuyang Xu, Conrad Testagrose, Mutlu Demirer, Vikash Gupta, Ünal Akünal, Markus Bujotzek, Klaus H. Maier-Hein, Yi Qin, Xiaomeng Li, Jayashree Kalpathy-Cramer, Holger R. Roth

    Abstract: The correct interpretation of breast density is important in the assessment of breast cancer risk. AI has been shown capable of accurately predicting breast density, however, due to the differences in imaging characteristics across mammography systems, models built using data from one system do not generalize well to other systems. Though federated learning (FL) has emerged as a way to improve the…

    Submitted 22 May, 2024; originally announced May 2024.

    Comments: 16 pages, 9 figures

    Journal ref: Medical Image Analysis Volume 95, July 2024, 103206

  23. arXiv:2405.11145  [pdf, other]

    cs.CV cs.AI cs.MM

    Detecting Multimodal Situations with Insufficient Context and Abstaining from Baseless Predictions

    Authors: Junzhang Liu, Zhecan Wang, Hammad Ayyubi, Haoxuan You, Chris Thomas, Rui Sun, Shih-Fu Chang, Kai-Wei Chang

    Abstract: Despite the widespread adoption of Vision-Language Understanding (VLU) benchmarks such as VQA v2, OKVQA, A-OKVQA, GQA, VCR, SWAG, and VisualCOMET, our analysis reveals a pervasive issue affecting their integrity: these benchmarks contain samples where answers rely on assumptions unsupported by the provided context. Training models on such data fosters biased learning and hallucinations as models te…

    Submitted 25 May, 2024; v1 submitted 17 May, 2024; originally announced May 2024.

  24. arXiv:2405.10452  [pdf, other]

    cs.CL cs.LG

    Navigating Public Sentiment in the Circular Economy through Topic Modelling and Hyperparameter Optimisation

    Authors: Junhao Song, Yingfang Yuan, Kaiwen Chang, Bing Xu, Jin Xuan, Wei Pang

    Abstract: To advance the circular economy (CE), it is crucial to gain insights into the evolution of public sentiments, cognitive pathways of the masses concerning circular products and digital technology, and recognise the primary concerns. To achieve this, we collected data related to the CE from diverse platforms including Twitter, Reddit, and The Guardian. This comprehensive data collection spanned acro…

    Submitted 16 May, 2024; originally announced May 2024.

  25. arXiv:2405.08991   

    cs.CV cs.RO

    Theoretical Analysis for Expectation-Maximization-Based Multi-Model 3D Registration

    Authors: David Jin, Harry Zhang, Kai Chang

    Abstract: We perform detailed theoretical analysis of an expectation-maximization-based algorithm recently proposed in for solving a variation of the 3D registration problem, named multi-model 3D registration. Despite having shown superior empirical results, did not theoretically justify the conditions under which the EM approach converges to the ground truth. In this project, we aim to close this gap by es…

    Submitted 24 May, 2024; v1 submitted 14 May, 2024; originally announced May 2024.

    Comments: Course project based on a previous submission. Very similar to the submitted conference version. see here: arXiv:2402.10865

  26. arXiv:2405.07387  [pdf, other]

    cs.LG

    Semantic Loss Functions for Neuro-Symbolic Structured Prediction

    Authors: Kareem Ahmed, Stefano Teso, Paolo Morettin, Luca Di Liello, Pierfrancesco Ardino, Jacopo Gobbi, Yitao Liang, Eric Wang, Kai-Wei Chang, Andrea Passerini, Guy Van den Broeck

    Abstract: Structured output prediction problems are ubiquitous in machine learning. The prominent approach leverages neural networks as powerful feature extractors, otherwise assuming the independence of the outputs. These outputs, however, jointly encode an object, e.g. a path in a graph, and are therefore related through the structure underlying the output space. We discuss the semantic loss, which inject…

    Submitted 12 May, 2024; originally announced May 2024.

    Comments: Preprint of Ch. 22 "Semantic Loss Functions for Neuro-Symbolic Structured Prediction" in "Compendium of Neurosymbolic Artificial Intelligence", https://ebooks.iospress.nl/ISBN/978-1-64368-406-2. arXiv admin note: substantial text overlap with arXiv:2201.11250, arXiv:2007.13197

  27. arXiv:2405.04682  [pdf, other]

    cs.CV cs.AI cs.LG

    TALC: Time-Aligned Captions for Multi-Scene Text-to-Video Generation

    Authors: Hritik Bansal, Yonatan Bitton, Michal Yarom, Idan Szpektor, Aditya Grover, Kai-Wei Chang

    Abstract: Recent advances in diffusion-based generative modeling have led to the development of text-to-video (T2V) models that can generate high-quality videos conditioned on a text prompt. Most of these T2V models often produce single-scene video clips that depict an entity performing a particular action (e.g., 'a red panda climbing a tree'). However, it is pertinent to generate multi-scene videos since t…

    Submitted 24 May, 2024; v1 submitted 7 May, 2024; originally announced May 2024.

    Comments: 21 pages, 12 figures, 8 tables

  28. arXiv:2404.17779  [pdf, other]

    cs.CL

    Medical Vision-Language Pre-Training for Brain Abnormalities

    Authors: Masoud Monajatipoor, Zi-Yi Dou, Aichi Chien, Nanyun Peng, Kai-Wei Chang

    Abstract: Vision-language models have become increasingly powerful for tasks that require an understanding of both visual and linguistic elements, bridging the gap between these modalities. In the context of multimodal clinical AI, there is a growing need for models that possess domain-specific knowledge, as existing models often lack the expertise required for medical applications. In this paper, we take b…

    Submitted 27 April, 2024; originally announced April 2024.

  29. arXiv:2404.10508  [pdf, other]

    cs.CL cs.AI cs.CY

    White Men Lead, Black Women Help? Benchmarking Language Agency Social Biases in LLMs

    Authors: Yixin Wan, Kai-Wei Chang

    Abstract: Language agency is an important aspect of evaluating social biases in texts. While several studies approached agency-related bias in human-written language, very limited research has investigated such biases in Large Language Model (LLM)-generated content. In addition, previous research often relies on string-matching techniques to identify agentic and communal words within texts, which fall short…

    Submitted 20 June, 2024; v1 submitted 16 April, 2024; originally announced April 2024.

  30. arXiv:2404.07376  [pdf, other]

    cs.CL

    LLMs in Biomedicine: A study on clinical Named Entity Recognition

    Authors: Masoud Monajatipoor, Jiaxin Yang, Joel Stremmel, Melika Emami, Fazlolah Mohaghegh, Mozhdeh Rouhsedaghat, Kai-Wei Chang

    Abstract: Large Language Models (LLMs) demonstrate remarkable versatility in various NLP tasks but encounter distinct challenges in biomedical due to the complexities of language and data scarcity. This paper investigates LLMs application in the biomedical domain by exploring strategies to enhance their performance for the NER task. Our study reveals the importance of meticulously designed prompts in the bi…

    Submitted 11 July, 2024; v1 submitted 10 April, 2024; originally announced April 2024.

  31. arXiv:2404.04763  [pdf, other]

    cs.CV cs.AI

    GenEARL: A Training-Free Generative Framework for Multimodal Event Argument Role Labeling

    Authors: Hritik Bansal, Po-Nien Kung, P. Jeffrey Brantingham, Kai-Wei Chang, Nanyun Peng

    Abstract: Multimodal event argument role labeling (EARL), a task that assigns a role for each event participant (object) in an image is a complex challenge. It requires reasoning over the entire image, the depicted event, and the interactions between various objects participating in the event. Existing models heavily rely on high-quality event-annotated training data to understand the event semantics and st…

    Submitted 6 April, 2024; originally announced April 2024.

    Comments: 20 pages, 15 Figures, 13 figures

  32. arXiv:2404.03921  [pdf, other]

    cs.CL

    Simple Techniques for Enhancing Sentence Embeddings in Generative Language Models

    Authors: Bowen Zhang, Kehua Chang, Chunping Li

    Abstract: Sentence Embedding stands as a fundamental task within the realm of Natural Language Processing, finding extensive application in search engines, expert systems, and question-and-answer platforms. With the continuous evolution of large language models such as LLaMA and Mistral, research on sentence embedding has recently achieved notable breakthroughs. However, these advancements mainly pertain to…

    Submitted 15 May, 2024; v1 submitted 5 April, 2024; originally announced April 2024.

    Comments: Accepted by ICIC 2024 (Oral)

  33. arXiv:2404.03414  [pdf, other]

    cs.CL cs.AI

    Can Small Language Models Help Large Language Models Reason Better?: LM-Guided Chain-of-Thought

    Authors: Jooyoung Lee, Fan Yang, Thanh Tran, Qian Hu, Emre Barut, Kai-Wei Chang, Chengwei Su

    Abstract: We introduce a novel framework, LM-Guided CoT, that leverages a lightweight (i.e., <1B) language model (LM) for guiding a black-box large (i.e., >10B) LM in reasoning tasks. Specifically, the lightweight LM first generates a rationale for each input instance. The Frozen large LM is then prompted to predict a task output based on the rationale generated by the lightweight LM. Our approach is resour…

    Submitted 4 April, 2024; originally announced April 2024.

    Comments: This paper is accepted to LREC-COLING 2024

  34. arXiv:2404.01679  [pdf, other]

    cs.CL cs.SI physics.soc-ph

    Event Detection from Social Media for Epidemic Prediction

    Authors: Tanmay Parekh, Anh Mac, Jiarui Yu, Yuxuan Dong, Syed Shahriar, Bonnie Liu, Eric Yang, Kuan-Hao Huang, Wei Wang, Nanyun Peng, Kai-Wei Chang

    Abstract: Social media is an easy-to-access platform providing timely updates about societal trends and events. Discussions regarding epidemic-related events such as infections, symptoms, and social interactions can be crucial for informing policymaking during epidemic outbreaks. In our work, we pioneer exploiting Event Detection (ED) for better preparedness and early warnings of any upcoming epidemic by de…

    Submitted 24 May, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

    Comments: Accepted at NAACL 2024

  35. arXiv:2404.01077  [pdf, other]

    cs.CL

    Efficient Prompting Methods for Large Language Models: A Survey

    Authors: Kaiyan Chang, Songcheng Xu, Chenglong Wang, Yingfeng Luo, Tong Xiao, Jingbo Zhu

    Abstract: Prompting has become a mainstream paradigm for adapting large language models (LLMs) to specific natural language processing tasks. While this approach opens the door to in-context learning of LLMs, it brings the additional computational burden of model inference and human effort of manual-designed prompts, particularly when using lengthy and complex prompts to guide and control the behavior of LL…

    Submitted 1 April, 2024; originally announced April 2024.

  36. arXiv:2404.01030  [pdf, ps, other]

    cs.CV cs.AI cs.CY

    Survey of Bias In Text-to-Image Generation: Definition, Evaluation, and Mitigation

    Authors: Yixin Wan, Arjun Subramonian, Anaelia Ovalle, Zongyu Lin, Ashima Suvarna, Christina Chance, Hritik Bansal, Rebecca Pattichis, Kai-Wei Chang

    Abstract: The recent advancement of large and powerful models with Text-to-Image (T2I) generation abilities -- such as OpenAI's DALLE-3 and Google's Gemini -- enables users to generate high-quality images from textual prompts. However, it has become increasingly evident that even simple prompts could cause T2I models to exhibit conspicuous social bias in generated images. Such bias might lead to both alloca…

    Submitted 1 May, 2024; v1 submitted 1 April, 2024; originally announced April 2024.

  37. arXiv:2404.00530  [pdf, other]

    cs.CL cs.AI cs.LG

    Comparing Bad Apples to Good Oranges: Aligning Large Language Models via Joint Preference Optimization

    Authors: Hritik Bansal, Ashima Suvarna, Gantavya Bhatt, Nanyun Peng, Kai-Wei Chang, Aditya Grover

    Abstract: A common technique for aligning large language models (LLMs) relies on acquiring human preferences by comparing multiple generations conditioned on a fixed context. This only leverages the pairwise comparisons when the generations are placed in an identical context. However, such conditional rankings often fail to capture the complex and multidimensional aspects of human preferences. In this work,…

    Submitted 30 March, 2024; originally announced April 2024.

    Comments: 25 pages, 14 figures, 5 tables

  38. arXiv:2403.14624  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    MathVerse: Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems?

    Authors: Renrui Zhang, Dongzhi Jiang, Yichi Zhang, Haokun Lin, Ziyu Guo, Pengshuo Qiu, Aojun Zhou, Pan Lu, Kai-Wei Chang, Peng Gao, Hongsheng Li

    Abstract: The remarkable progress of Multi-modal Large Language Models (MLLMs) has garnered unparalleled attention, due to their superior performance in visual contexts. However, their capabilities in visual math problem-solving remain insufficiently evaluated and understood. We investigate current benchmarks to incorporate excessive visual content within textual questions, which potentially assist MLLMs in…

    Submitted 21 March, 2024; originally announced March 2024.

    Comments: 46 Pages, Work in Progress, Benchmark Project Page: https://mathverse-cuhk.github.io

  39. arXiv:2403.11202  [pdf, other]

    cs.AR cs.AI cs.PL

    Data is all you need: Finetuning LLMs for Chip Design via an Automated design-data augmentation framework

    Authors: Kaiyan Chang, Kun Wang, Nan Yang, Ying Wang, Dantong Jin, Wenlong Zhu, Zhirong Chen, Cangyuan Li, Hao Yan, Yunhao Zhou, Zhuoliang Zhao, Yuan Cheng, Yudong Pan, Yiqi Liu, Mengdi Wang, Shengwen Liang, Yinhe Han, Huawei Li, Xiaowei Li

    Abstract: Recent advances in large language models have demonstrated their potential for automated generation of hardware description language (HDL) code from high-level prompts. Researchers have utilized fine-tuning to enhance the ability of these large language models (LLMs) in the field of Chip Design. However, the lack of Verilog data hinders further improvement in the quality of Verilog generation by L…

    Submitted 10 July, 2024; v1 submitted 17 March, 2024; originally announced March 2024.

    Comments: DAC 2024

  40. arXiv:2403.09296  [pdf, other]

    cs.CV

    Select and Distill: Selective Dual-Teacher Knowledge Transfer for Continual Learning on Vision-Language Models

    Authors: Yu-Chu Yu, Chi-Pin Huang, Jr-Jen Chen, Kai-Po Chang, Yung-Hsuan Lai, Fu-En Yang, Yu-Chiang Frank Wang

    Abstract: Large-scale vision-language models (VLMs) have shown a strong zero-shot generalization capability on unseen-domain data. However, adapting pre-trained VLMs to a sequence of downstream tasks often leads to the forgetting of previously learned knowledge and a reduction in zero-shot classification performance. To tackle this problem, we propose a unique Selective Dual-Teacher Knowledge Transfer frame…

    Submitted 17 July, 2024; v1 submitted 14 March, 2024; originally announced March 2024.

    Comments: Accepted to ECCV 2024. Project page: https://chuyu.org/research/snd

  41. arXiv:2403.06355  [pdf, other]

    cs.CL cs.CV

    Multi-modal Semantic Understanding with Contrastive Cross-modal Feature Alignment

    Authors: Ming Zhang, Ke Chang, Yunfang Wu

    Abstract: Multi-modal semantic understanding requires integrating information from different modalities to extract users' real intention behind words. Most previous work applies a dual-encoder structure to separately encode image and text, but fails to learn cross-modal feature alignment, making it hard to achieve cross-modal deep information interaction. This paper proposes a novel CLIP-guided contrastive-…

    Submitted 10 March, 2024; originally announced March 2024.

    Comments: 10 pages, 4 figures, accepted by LREC-COLING 2024 (main conference, long paper)

  42. arXiv:2403.02528  [pdf, other]

    cs.CL cs.AI

    DACO: Towards Application-Driven and Comprehensive Data Analysis via Code Generation

    Authors: Xueqing Wu, Rui Zheng, Jingzhen Sha, Te-Lin Wu, Hanyu Zhou, Mohan Tang, Kai-Wei Chang, Nanyun Peng, Haoran Huang

    Abstract: Data analysis is a crucial analytical process to generate in-depth studies and conclusive insights to comprehensively answer a given user query for tabular data. In this work, we aim to propose new resources and benchmarks to inspire future research on this crucial yet challenging and under-explored task. However, collecting data analysis annotations curated by experts can be prohibitively expensi…

    Submitted 4 March, 2024; originally announced March 2024.

  43. arXiv:2402.18048  [pdf, other]

    cs.CL

    Characterizing Truthfulness in Large Language Model Generations with Local Intrinsic Dimension

    Authors: Fan Yin, Jayanth Srinivasa, Kai-Wei Chang

    Abstract: We study how to characterize and predict the truthfulness of texts generated from large language models (LLMs), which serves as a crucial step in building trust between humans and LLMs. Although several approaches based on entropy or verbalized uncertainty have been proposed to calibrate model predictions, these methods are often intractable, sensitive to hyperparameters, and less reliable when ap…

    Submitted 27 February, 2024; originally announced February 2024.

    Comments: preprint, 9 pages, 5 figures

  44. arXiv:2402.17644  [pdf, other]

    cs.CL cs.AI

    Are LLMs Capable of Data-based Statistical and Causal Reasoning? Benchmarking Advanced Quantitative Reasoning with Data

    Authors: Xiao Liu, Zirui Wu, Xueqing Wu, Pan Lu, Kai-Wei Chang, Yansong Feng

    Abstract: Quantitative reasoning is a critical skill to analyze data, yet the assessment of such ability remains limited. To address this gap, we introduce the Quantitative Reasoning with Data (QRData) benchmark, aiming to evaluate Large Language Models' capability in statistical and causal reasoning with real-world data. The benchmark comprises a carefully constructed dataset of 411 questions accompanied b…

    Submitted 9 June, 2024; v1 submitted 27 February, 2024; originally announced February 2024.

    Comments: Findings of ACL 2024. Project website: https://xxxiaol.github.io/QRData/

  45. arXiv:2402.14052  [pdf, other]

    cs.CL

    On Leveraging Encoder-only Pre-trained Language Models for Effective Keyphrase Generation

    Authors: Di Wu, Wasi Uddin Ahmad, Kai-Wei Chang

    Abstract: This study addresses the application of encoder-only Pre-trained Language Models (PLMs) in keyphrase generation (KPG) amidst the broader availability of domain-tailored encoder-only models compared to encoder-decoder models. We investigate three core inquiries: (1) the efficacy of encoder-only PLMs in KPG, (2) optimal architectural decisions for employing encoder-only PLMs in KPG, and (3) a perfor…

    Submitted 21 February, 2024; originally announced February 2024.

    Comments: LREC-COLING 2024 camera ready. arXiv admin note: text overlap with arXiv:2212.10233

  46. arXiv:2402.13236  [pdf, other]

    eess.AS cs.SD

    Towards audio language modeling -- an overview

    Authors: Haibin Wu, Xuanjun Chen, Yi-Cheng Lin, Kai-wei Chang, Ho-Lam Chung, Alexander H. Liu, Hung-yi Lee

    Abstract: Neural audio codecs are initially introduced to compress audio data into compact codes to reduce transmission latency. Researchers recently discovered the potential of codecs as suitable tokenizers for converting continuous audio into discrete codes, which can be employed to develop audio language models (LMs). Numerous high-performance neural audio codecs and codec-based LMs have been developed.…

    Submitted 20 February, 2024; originally announced February 2024.

  47. arXiv:2402.13071  [pdf, other]

    eess.AS cs.SD

    Codec-SUPERB: An In-Depth Analysis of Sound Codec Models

    Authors: Haibin Wu, Ho-Lam Chung, Yi-Cheng Lin, Yuan-Kuei Wu, Xuanjun Chen, Yu-Chi Pai, Hsiu-Hsuan Wang, Kai-Wei Chang, Alexander H. Liu, Hung-yi Lee

    Abstract: The sound codec's dual roles in minimizing data transmission latency and serving as tokenizers underscore its critical importance. Recent years have witnessed significant developments in codec models. The ideal sound codec should preserve content, paralinguistics, speakers, and audio information. However, the question of which codec achieves optimal sound information preservation remains unanswere…

    Submitted 7 June, 2024; v1 submitted 20 February, 2024; originally announced February 2024.

    Comments: Github: https://github.com/voidful/Codec-SUPERB

  48. arXiv:2402.13018  [pdf, other]

    eess.AS cs.SD

    EMO-SUPERB: An In-depth Look at Speech Emotion Recognition

    Authors: Haibin Wu, Huang-Cheng Chou, Kai-Wei Chang, Lucas Goncalves, Jiawei Du, Jyh-Shing Roger Jang, Chi-Chun Lee, Hung-Yi Lee

    Abstract: Speech emotion recognition (SER) is a pivotal technology for human-computer interaction systems. However, 80.77% of SER papers yield results that cannot be reproduced. We develop EMO-SUPERB, short for EMOtion Speech Universal PERformance Benchmark, which aims to enhance open-source initiatives for SER. EMO-SUPERB includes a user-friendly codebase to leverage 15 state-of-the-art speech self-supervi…

    Submitted 12 March, 2024; v1 submitted 20 February, 2024; originally announced February 2024.

    Comments: webpage: https://emosuperb.github.io/

  49. arXiv:2402.12998  [pdf]

    cs.CL

    Phonotactic Complexity across Dialects

    Authors: Ryan Soh-Eun Shim, Kalvin Chang, David R. Mortensen

    Abstract: Received wisdom in linguistic typology holds that if the structure of a language becomes more complex in one dimension, it will simplify in another, building on the assumption that all languages are equally complex (Joseph and Newmeyer, 2012). We study this claim on a micro-level, using a tightly-controlled sample of Dutch dialects (across 366 collection sites) and Min dialects (across 60 sites),…

    Submitted 20 February, 2024; originally announced February 2024.

    Comments: Accepted to COLING-LREC 2024

  50. arXiv:2402.11217  [pdf, other]

    cs.CL cs.CV

    Asclepius: A Spectrum Evaluation Benchmark for Medical Multi-Modal Large Language Models

    Authors: Wenxuan Wang, Yihang Su, Jingyuan Huan, Jie Liu, Wenting Chen, Yudi Zhang, Cheng-Yi Li, Kao-Jung Chang, Xiaohan Xin, Linlin Shen, Michael R. Lyu

    Abstract: The significant breakthroughs of Medical Multi-Modal Large Language Models (Med-MLLMs) renovate modern healthcare with robust information synthesis and medical decision support. However, these models are often evaluated on benchmarks that are unsuitable for the Med-MLLMs due to the intricate nature of the real-world diagnostic frameworks, which encompass diverse medical specialties and involve com…

    Submitted 17 February, 2024; originally announced February 2024.

    Comments: 20 pages, 15 figures