Showing 1–11 of 11 results for author: Leroy, G

  1. arXiv:2407.07997 [pdf]

    cs.LG

    ICD Codes are Insufficient to Create Datasets for Machine Learning: An Evaluation Using All of Us Data for Coccidioidomycosis and Myocardial Infarction

    Authors: Abigail E. Whitlock, Gondy Leroy, Fariba M. Donovan, John N. Galgiani

    Abstract: In medicine, machine learning (ML) datasets are often built using the International Classification of Diseases (ICD) codes. As new models are being developed, there is a need for larger datasets. However, ICD codes are intended for billing. We aim to determine how suitable ICD codes are for creating datasets to train ML models. We focused on a rare disease and a common disease using the All of Us database.…

    Submitted 10 July, 2024; originally announced July 2024.

    Comments: Accepted at IEEE ICHI 2024 conference. Will be published in IEEE Xplore

  2. arXiv:2406.17787 [pdf]

    cs.CL

    Role of Dependency Distance in Text Simplification: A Human vs ChatGPT Simplification Comparison

    Authors: Sumi Lee, Gondy Leroy, David Kauchak, Melissa Just

    Abstract: This study investigates human and ChatGPT text simplification and its relationship to dependency distance. A set of 220 sentences, with increasing grammatical difficulty as measured in a prior user study, was simplified by a human expert and by ChatGPT. We found that the three sentence sets all differed in mean dependency distances: the highest in the original sentence set, followed by ChatGPT…

    Submitted 20 May, 2024; originally announced June 2024.

  3. arXiv:2405.13030 [pdf, ps, other]

    cs.CL cs.AI

    Crowdsourcing with Enhanced Data Quality Assurance: An Efficient Approach to Mitigate Resource Scarcity Challenges in Training Large Language Models for Healthcare

    Authors: P. Barai, G. Leroy, P. Bisht, J. M. Rothman, S. Lee, J. Andrews, S. A. Rice, A. Ahmed

    Abstract: Large Language Models (LLMs) have demonstrated immense potential in artificial intelligence across various domains, including healthcare. However, their efficacy is hindered by the need for high-quality labeled data, which is often expensive and time-consuming to create, particularly in low-resource domains like healthcare. To address these challenges, we propose a crowdsourcing (CS) framework enr…

    Submitted 16 May, 2024; originally announced May 2024.

    Comments: Published in AMIA Summit, Boston, 2024. https://knowledge.amia.org/Info2024/pdf/Info2024a022/Info2024fl021

  4. arXiv:2405.06695 [pdf]

    cs.CL cs.AI

    Utilizing Large Language Models to Generate Synthetic Data to Increase the Performance of BERT-Based Neural Networks

    Authors: Chancellor R. Woolsey, Prakash Bisht, Joshua Rothman, Gondy Leroy

    Abstract: An important issue impacting healthcare is a lack of available experts. Machine learning (ML) models could resolve this by aiding in diagnosing patients. However, creating datasets large enough to train these models is expensive. We evaluated large language models (LLMs) for data creation. Using Autism Spectrum Disorders (ASD), we prompted ChatGPT and GPT-Premium to generate 4,200 synthetic observ…

    Submitted 7 May, 2024; originally announced May 2024.

    Comments: Published in the 2024 American Medical Informatics Association (AMIA) Summit, March 18-21

  5. arXiv:2405.01592 [pdf]

    cs.CL cs.AI

    Text and Audio Simplification: Human vs. ChatGPT

    Authors: Gondy Leroy, David Kauchak, Philip Harber, Ankit Pal, Akash Shukla

    Abstract: Text and audio simplification to increase information comprehension are important in healthcare. With the introduction of ChatGPT, an evaluation of its simplification performance is needed. We provide a systematic comparison of human and ChatGPT simplified texts using fourteen metrics indicative of text difficulty. We briefly introduce our online editor where these simplification tools, including…

    Submitted 29 April, 2024; originally announced May 2024.

    Comments: AMIA Summit, Boston, 2024

    ACM Class: H.4

  6. arXiv:2404.19119 [pdf]

    cs.CL

    Effects of Added Emphasis and Pause in Audio Delivery of Health Information

    Authors: Arif Ahmed, Gondy Leroy, Stephen A. Rains, Philip Harber, David Kauchak, Prosanta Barai

    Abstract: Health literacy is crucial to supporting good health and is a major national goal. Audio delivery of information is becoming more popular for informing oneself. In this study, we evaluate the effect of audio enhancements in the form of information emphasis and pauses with health texts of varying difficulty, and we measure health information comprehension and retention. We produced audio snippets fr…

    Submitted 29 April, 2024; originally announced April 2024.

    Comments: This manuscript has been accepted to the American Medical Informatics Association Summit, 2024

  7. arXiv:2305.14341 [pdf, other]

    cs.CL

    APPLS: Evaluating Evaluation Metrics for Plain Language Summarization

    Authors: Yue Guo, Tal August, Gondy Leroy, Trevor Cohen, Lucy Lu Wang

    Abstract: While there has been significant development of models for Plain Language Summarization (PLS), evaluation remains a challenge. PLS lacks a dedicated assessment metric, and the suitability of text generation evaluation metrics is unclear due to the unique transformations involved (e.g., adding background explanations, removing jargon). To address these questions, our study introduces a granular met…

    Submitted 23 July, 2024; v1 submitted 23 May, 2023; originally announced May 2023.

  8. arXiv:2211.03818 [pdf, other]

    cs.CL

    Retrieval augmentation of large language models for lay language generation

    Authors: Yue Guo, Wei Qiu, Gondy Leroy, Sheng Wang, Trevor Cohen

    Abstract: Recent lay language generation systems have used Transformer models trained on a parallel corpus to increase health information accessibility. However, the applicability of these models is constrained by the limited size and topical breadth of available corpora. We introduce CELLS, the largest (63k pairs) and broadest-ranging (12 journals) parallel corpus for lay language generation. The abstract…

    Submitted 25 January, 2024; v1 submitted 7 November, 2022; originally announced November 2022.

  9. arXiv:2105.09637 [pdf, other]

    cs.AI cs.LG

    Navigation Turing Test (NTT): Learning to Evaluate Human-Like Navigation

    Authors: Sam Devlin, Raluca Georgescu, Ida Momennejad, Jaroslaw Rzepecki, Evelyn Zuniga, Gavin Costello, Guy Leroy, Ali Shaw, Katja Hofmann

    Abstract: A key challenge on the path to developing agents that learn complex human-like behavior is the need to quickly and accurately quantify human-likeness. While human assessments of such behavior can be highly accurate, speed and scalability are limited. We address these limitations through a novel automated Navigation Turing Test (ANTT) that learns to predict human judgments of human-likeness. We dem…

    Submitted 28 July, 2021; v1 submitted 20 May, 2021; originally announced May 2021.

    Comments: All data collected throughout this study, plus the code to reproduce our analysis and ANTT are available at https://github.com/microsoft/NTT

    Journal ref: Proceedings of the 38th International Conference on Machine Learning (ICML), 139:2644-2653, 2021

  10. arXiv:2010.10573 [pdf, other]

    cs.CL

    AutoMeTS: The Autocomplete for Medical Text Simplification

    Authors: Hoang Van, David Kauchak, Gondy Leroy

    Abstract: The goal of text simplification (TS) is to transform difficult text into a version that is easier to understand and more broadly accessible to a wide variety of readers. In some domains, such as healthcare, fully automated approaches cannot be used since information must be accurately preserved. Instead, semi-automated approaches can be used that assist a human writer in simplifying text faster an…

    Submitted 20 October, 2020; originally announced October 2020.

    Comments: 9 pages, 3 figures, and 8 tables. Accepted to COLING 2020

  11. arXiv:2008.08055 [pdf, other]

    cs.CV

    Communicative Reinforcement Learning Agents for Landmark Detection in Brain Images

    Authors: Guy Leroy, Daniel Rueckert, Amir Alansary

    Abstract: Accurate detection of anatomical landmarks is an essential step in several medical imaging tasks. We propose a novel communicative multi-agent reinforcement learning (C-MARL) system to automatically detect landmarks in 3D brain images. C-MARL enables the agents to learn explicit communication channels, as well as implicit communication signals by sharing certain weights of the architecture among a…

    Submitted 27 September, 2020; v1 submitted 18 August, 2020; originally announced August 2020.

    Comments: Accepted for the MLCN workshop, MICCAI 2020