Showing 1–30 of 30 results for author: Hu, T

  1. arXiv:2404.16544  [pdf, other]

    eess.IV

    Image registration based automated lesion correspondence pipeline for longitudinal CT data

    Authors: Subrata Mukherjee, Thibaud Coroller, Craig Wang, Ravi K. Samala, Tingting Hu, Didem Gokcay, Nicholas Petrick, Berkman Sahiner, Qian Cao

    Abstract: Patients diagnosed with metastatic breast cancer (mBC) typically undergo several radiographic assessments during their treatment. Because mBC often involves multiple metastatic lesions in different organs, it is imperative to accurately track and assess these lesions to gain a comprehensive understanding of the disease's response to treatment. Computerized analysis methods that rely on lesion-level tracki…

    Submitted 25 April, 2024; originally announced April 2024.

  2. arXiv:2404.14132  [pdf, other]

    cs.CV eess.IV

    CRNet: A Detail-Preserving Network for Unified Image Restoration and Enhancement Task

    Authors: Kangzhen Yang, Tao Hu, Kexin Dai, Genggeng Chen, Yu Cao, Wei Dong, Peng Wu, Yanning Zhang, Qingsen Yan

    Abstract: In real-world scenarios, images captured often suffer from blurring, noise, and other forms of image degradation, and due to sensor limitations, people usually can only obtain low dynamic range images. To achieve high-quality images, researchers have attempted various image restoration and enhancement operations on photographs, including denoising, deblurring, and high dynamic range imaging. Howev…

    Submitted 22 April, 2024; originally announced April 2024.

    Comments: This paper is accepted by CVPR2024 Workshop, Code: https://github.com/CalvinYang0/CRNet

  3. arXiv:2404.13537  [pdf, other]

    eess.IV cs.CV

    Bracketing Image Restoration and Enhancement with High-Low Frequency Decomposition

    Authors: Genggeng Chen, Kexin Dai, Kangzhen Yang, Tao Hu, Xiangyu Chen, Yongqing Yang, Wei Dong, Peng Wu, Yanning Zhang, Qingsen Yan

    Abstract: In real-world scenarios, due to a series of image degradations, obtaining high-quality, clear content photos is challenging. While significant progress has been made in synthesizing high-quality images, previous methods for image restoration and enhancement often overlooked the characteristics of different degradations. They applied the same structure to address various types of degradation, resul…

    Submitted 24 April, 2024; v1 submitted 21 April, 2024; originally announced April 2024.

    Comments: This paper is accepted by CVPR 2024 Workshop, code: https://github.com/chengeng0613/HLNet

  4. arXiv:2403.19425  [pdf, ps, other]

    eess.IV cs.CV

    A Robust Ensemble Algorithm for Ischemic Stroke Lesion Segmentation: Generalizability and Clinical Utility Beyond the ISLES Challenge

    Authors: Ezequiel de la Rosa, Mauricio Reyes, Sook-Lei Liew, Alexandre Hutton, Roland Wiest, Johannes Kaesmacher, Uta Hanning, Arsany Hakim, Richard Zubal, Waldo Valenzuela, David Robben, Diana M. Sima, Vincenzo Anania, Arne Brys, James A. Meakin, Anne Mickan, Gabriel Broocks, Christian Heitkamp, Shengbo Gao, Kongming Liang, Ziji Zhang, Md Mahfuzur Rahman Siddiquee, Andriy Myronenko, Pooya Ashtari, Sabine Van Huffel, et al. (33 additional authors not shown)

    Abstract: Diffusion-weighted MRI (DWI) is essential for stroke diagnosis, treatment decisions, and prognosis. However, image and disease variability hinder the development of generalizable AI algorithms with clinical value. We address this gap by presenting a novel ensemble algorithm derived from the 2022 Ischemic Stroke Lesion Segmentation (ISLES) challenge. ISLES'22 provided 400 patient scans with ischemi…

    Submitted 3 April, 2024; v1 submitted 28 March, 2024; originally announced March 2024.

  5. arXiv:2311.00932  [pdf, other]

    cs.CV eess.IV

    Towards High-quality HDR Deghosting with Conditional Diffusion Models

    Authors: Qingsen Yan, Tao Hu, Yuan Sun, Hao Tang, Yu Zhu, Wei Dong, Luc Van Gool, Yanning Zhang

    Abstract: High Dynamic Range (HDR) images can be recovered from several Low Dynamic Range (LDR) images by existing Deep Neural Networks (DNNs) techniques. Despite the remarkable progress, DNN-based methods still generate ghosting artifacts when LDR images have saturation and large motion, which hinders potential applications in real-world scenarios. To address this challenge, we formulate the HDR deghosting…

    Submitted 1 November, 2023; originally announced November 2023.

    Comments: accepted by IEEE TCSVT

  6. arXiv:2309.10707  [pdf, other]

    eess.AS cs.CL cs.LG cs.SD

    Corpus Synthesis for Zero-shot ASR domain Adaptation using Large Language Models

    Authors: Hsuan Su, Ting-Yao Hu, Hema Swetha Koppula, Raviteja Vemulapalli, Jen-Hao Rick Chang, Karren Yang, Gautam Varma Mantena, Oncel Tuzel

    Abstract: While Automatic Speech Recognition (ASR) systems are widely used in many real-world applications, they often do not generalize well to new domains and need to be finetuned on data from these domains. However, target-domain data usually are not readily available in many scenarios. In this paper, we propose a new strategy for adapting ASR models to new target domains without any text or speech from…

    Submitted 18 September, 2023; originally announced September 2023.

  7. arXiv:2307.15510  [pdf, other]

    eess.SY cs.MA cs.RO

    Formation Control for Moving Target Enclosing via Relative Localization

    Authors: Xueming Liu, Kunda Liu, Tianjiang Hu, Qingrui Zhang

    Abstract: In this paper, we investigate the problem of controlling multiple unmanned aerial vehicles (UAVs) to enclose a moving target in a distributed fashion based on relative distance and self-displacement measurements. A relative localization technique is developed based on the recursive least square estimation (RLSE) technique with a forgetting factor to estimate both the "UAV-UAV" and "UAV-targe…

    Submitted 28 July, 2023; originally announced July 2023.

    Comments: 8 Pages, accepted by IEEE CDC 2023

  8. arXiv:2307.01981  [pdf, other]

    eess.IV cs.CV cs.LG

    A ChatGPT Aided Explainable Framework for Zero-Shot Medical Image Diagnosis

    Authors: Jiaxiang Liu, Tianxiang Hu, Yan Zhang, Xiaotang Gai, Yang Feng, Zuozhu Liu

    Abstract: Zero-shot medical image classification is a critical process in real-world scenarios where we have limited access to all possible diseases or large-scale annotated data. It involves computing similarity scores between a query medical image and possible disease categories to determine the diagnostic result. Recent advances in pretrained vision-language models (VLMs) such as CLIP have shown great pe…

    Submitted 4 July, 2023; originally announced July 2023.

    Comments: Workshop on Interpretable ML in Healthcare at International Conference on Machine Learning (ICML) 2023

  9. arXiv:2307.01979  [pdf, other]

    eess.IV cs.CV

    ToothSegNet: Image Degradation meets Tooth Segmentation in CBCT Images

    Authors: Jiaxiang Liu, Tianxiang Hu, Yang Feng, Wanghui Ding, Zuozhu Liu

    Abstract: In computer-assisted orthodontics, three-dimensional tooth models are required for many medical treatments. Tooth segmentation from cone-beam computed tomography (CBCT) images is a crucial step in constructing the models. However, CBCT image quality problems such as metal artifacts and blurring caused by shooting equipment and patients' dental conditions make the segmentation difficult. In this pa…

    Submitted 4 July, 2023; originally announced July 2023.

    Comments: IEEE ISBI 2023

  10. arXiv:2303.14885  [pdf, other]

    eess.AS cs.LG cs.SD

    Text is All You Need: Personalizing ASR Models using Controllable Speech Synthesis

    Authors: Karren Yang, Ting-Yao Hu, Jen-Hao Rick Chang, Hema Swetha Koppula, Oncel Tuzel

    Abstract: Adapting generic speech recognition models to specific individuals is a challenging problem due to the scarcity of personalized data. Recent works have proposed boosting the amount of training data using personalized text-to-speech synthesis. Here, we ask two fundamental questions about this strategy: when is synthetic data effective for personalization, and why is it effective in those cases? To…

    Submitted 26 March, 2023; originally announced March 2023.

    Comments: ICASSP 2023

  11. arXiv:2212.03540  [pdf, other]

    eess.SY cs.RO

    EASpace: Enhanced Action Space for Policy Transfer

    Authors: Zheng Zhang, Qingrui Zhang, Bo Zhu, Xiaohan Wang, Tianjiang Hu

    Abstract: Formulating expert policies as macro actions promises to alleviate the long-horizon issue via structured exploration and efficient credit assignment. However, traditional option-based multi-policy transfer methods suffer from inefficient exploration of macro action's length and insufficient exploitation of useful long-duration macro actions. In this paper, a novel algorithm named EASpace (Enhanced…

    Submitted 24 July, 2023; v1 submitted 7 December, 2022; originally announced December 2022.

    Comments: 15 Pages

  12. arXiv:2211.10262  [pdf]

    eess.SP

    Adaptive De-noising of Photoacoustic Signal and Image based on Modified Kalman Filter

    Authors: Tianqu Hu, Zihao Huang, Peng Ge, Feng Gao, Fei Gao

    Abstract: As a burgeoning medical imaging method based on hybrid fusion of light and ultrasound, photoacoustic imaging (PAI) has demonstrated high potential in various biomedical applications recently, especially in revealing the functional and molecular information to improve diagnostic accuracy. However, stemming from weak amplitude and unavoidable random noise, caused by limited laser power and severe at…

    Submitted 18 November, 2022; originally announced November 2022.

  13. arXiv:2210.13567  [pdf, ps, other]

    cs.CV cs.LG cs.SD eess.AS

    I see what you hear: a vision-inspired method to localize words

    Authors: Mohammad Samragh, Arnav Kundu, Ting-Yao Hu, Minsik Cho, Aman Chadha, Ashish Shrivastava, Oncel Tuzel, Devang Naik

    Abstract: This paper explores the possibility of using visual object detection techniques for word localization in speech data. Object detection has been thoroughly studied in the contemporary literature for visual data. Noting that an audio signal can be interpreted as a 1-dimensional image, object localization techniques can be fundamentally useful for word localization. Building upon this idea, we propose a lig…

    Submitted 24 October, 2022; originally announced October 2022.

  14. arXiv:2203.04700  [pdf, other]

    cs.RO cs.AI cs.MA eess.SY

    Multi-robot Cooperative Pursuit via Potential Field-Enhanced Reinforcement Learning

    Authors: Zheng Zhang, Xiaohan Wang, Qingrui Zhang, Tianjiang Hu

    Abstract: It is a great challenge, though a promising one, to coordinate collective robots for hunting an evader in a decentralized manner purely in light of local observations. In this paper, this challenge is addressed by a novel hybrid cooperative pursuit algorithm that combines reinforcement learning with the artificial potential field method. In the proposed algorithm, decentralized deep reinforcement learn…

    Submitted 9 March, 2022; originally announced March 2022.

    Comments: This paper has been accepted by ICRA 2022

  15. arXiv:2202.11889  [pdf, other]

    eess.IV cs.CV

    A spectral-spatial fusion anomaly detection method for hyperspectral imagery

    Authors: Zengfu Hou, Siyuan Cheng, Ting Hu

    Abstract: In hyperspectral imagery, high-quality spectral signals convey subtle spectral differences to distinguish similar materials, thereby providing a unique advantage for anomaly detection. Hence fine spectra of anomalous pixels can be effectively screened out from heterogeneous background pixels. Since the same materials have similar characteristics in spatial and spectral dimension, detection performance can b…

    Submitted 23 February, 2022; originally announced February 2022.

  16. arXiv:2201.12346  [pdf, other]

    cs.CV cs.AI cs.LG eess.IV

    DiriNet: A network to estimate the spatial and spectral degradation functions

    Authors: Ting Hu

    Abstract: The spatial and spectral degradation functions are critical to hyper- and multi-spectral image fusion. However, little work has been devoted to the estimation of the degradation functions. To learn the spatial response function and the point spread function from the image pairs to be fused, we propose a Dirichlet network, where both functions are properly constrained. Specifically, the spatial response…

    Submitted 27 January, 2022; originally announced January 2022.

  17. arXiv:2112.12386  [pdf, other]

    eess.IV cs.CV

    KFWC: A Knowledge-Driven Deep Learning Model for Fine-grained Classification of Wet-AMD

    Authors: Haihong E, Jiawen He, Tianyi Hu, Lifei Wang, Lifei Yuan, Ruru Zhang, Meina Song

    Abstract: Automated diagnosis using deep neural networks can help ophthalmologists detect the blinding eye disease wet Age-related Macular Degeneration (AMD). Wet-AMD has two similar subtypes, Neovascular AMD and Polypoidal Choroidal Vessels (PCV). However, due to the difficulty in data collection and the similarity between images, most studies have only achieved the coarse-grained classification of wet-AMD…

    Submitted 23 December, 2021; originally announced December 2021.

  18. A Latent Encoder Coupled Generative Adversarial Network (LE-GAN) for Efficient Hyperspectral Image Super-resolution

    Authors: Yue Shi, Liangxiu Han, Lianghao Han, Sheng Chang, Tongle Hu, Darren Dancey

    Abstract: Realistic hyperspectral image (HSI) super-resolution (SR) techniques aim to generate a high-resolution (HR) HSI with higher spectral and spatial fidelity from its low-resolution (LR) counterpart. The generative adversarial network (GAN) has proven to be an effective deep learning framework for image super-resolution. However, the optimisation process of existing GAN-based models frequently suffers…

    Submitted 16 November, 2021; originally announced November 2021.

    Comments: 18 pages, 10 figures

  19. arXiv:2110.11479  [pdf, other]

    eess.AS cs.LG cs.SD

    Synt++: Utilizing Imperfect Synthetic Data to Improve Speech Recognition

    Authors: Ting-Yao Hu, Mohammadreza Armandpour, Ashish Shrivastava, Jen-Hao Rick Chang, Hema Koppula, Oncel Tuzel

    Abstract: With recent advances in speech synthesis, synthetic data is becoming a viable alternative to real data for training speech recognition models. However, machine learning with synthetic data is not trivial due to the gap between the synthetic and the real data distributions. Synthetic datasets may contain artifacts that do not exist in real data such as structured noise, content errors, or unrealist…

    Submitted 21 October, 2021; originally announced October 2021.

  20. arXiv:2106.13358  [pdf, other]

    cs.RO cs.LG cs.MA eess.SP eess.SY

    Scalable Perception-Action-Communication Loops with Convolutional and Graph Neural Networks

    Authors: Ting-Kuei Hu, Fernando Gama, Tianlong Chen, Wenqing Zheng, Zhangyang Wang, Alejandro Ribeiro, Brian M. Sadler

    Abstract: In this paper, we present a perception-action-communication loop design using Vision-based Graph Aggregation and Inference (VGAI). This multi-agent decentralized learning-to-control framework maps raw visual observations to agent actions, aided by local communication among neighboring agents. Our framework is implemented by a cascade of a convolutional and a graph neural network (CNN / GNN), addre…

    Submitted 5 November, 2021; v1 submitted 24 June, 2021; originally announced June 2021.

  21. arXiv:2105.05385  [pdf]

    cs.SD cs.IR eess.AS stat.AP

    A Statistical Model for Melody Reduction

    Authors: Tianxue Hu, Claire Arthur

    Abstract: A commonly-cited reason for the poor performance of automatic chord estimation (ACE) systems within music information retrieval (MIR) is that non-chord tones (i.e., notes outside the supporting harmony) contribute to error during the labeling process. Despite the prevalence of machine learning approaches in MIR, there are cases where alternative approaches provide a simpler alternative while allow…

    Submitted 11 May, 2021; originally announced May 2021.

    Comments: 5 pages, 1 figure. Proceedings and presentation are available from Future Directions of Music Cognition, but the conference proceedings will not be officially published until summer 2021. http://org.osu.edu/mascats/march-6-talks/

  22. arXiv:2009.09915  [pdf, other]

    eess.SY cs.MA

    Collaborative Target Tracking in Elliptic Coordinates: a Binocular Coordination Approach

    Authors: Yuan Chang, Zhiyong Sun, Han Zhou, Xiangke Wang, Lincheng Shen, Tianjiang Hu

    Abstract: This paper concentrates on the collaborative target tracking control of a pair of tracking vehicles with formation constraints. The proposed controller requires only distance measurements between tracking vehicles and the target. Its novelty lies in two aspects: 1) the elliptic coordinates are used to represent an arbitrary tracking formation without singularity, which can be deduced from inter-ag…

    Submitted 21 September, 2020; originally announced September 2020.

    Comments: 6 pages, 5 figures

  23. arXiv:2008.05826  [pdf, other]

    cs.CV cs.LG eess.IV

    Localizing the Common Action Among a Few Videos

    Authors: Pengwan Yang, Vincent Tao Hu, Pascal Mettes, Cees G. M. Snoek

    Abstract: This paper strives to localize the temporal extent of an action in a long untrimmed video. Where existing work leverages many examples with their start, their ending, and/or the class of the action during training time, we propose few-shot common action localization. The start and end of an action in a long untrimmed video are determined based on just a handful of trimmed video examples containin…

    Submitted 25 August, 2020; v1 submitted 13 August, 2020; originally announced August 2020.

    Comments: ECCV 2020

  24. arXiv:2003.06227  [pdf, other]

    eess.AS cs.CV cs.IT cs.LG cs.SD

    Unsupervised Style and Content Separation by Minimizing Mutual Information for Speech Synthesis

    Authors: Ting-Yao Hu, Ashish Shrivastava, Oncel Tuzel, Chandra Dhir

    Abstract: We present a method to generate speech from input text and a style vector that is extracted from a reference speech signal in an unsupervised manner, i.e., no style annotation, such as speaker information, is required. Existing unsupervised methods, during training, generate speech by computing style from the corresponding ground truth sample and use a decoder to combine the style vector with the…

    Submitted 9 March, 2020; originally announced March 2020.

    Comments: Accepted at ICASSP 2020 (for presentation in a lecture session)

  25. Flyback-Based Multiple Output dc-dc Converter with Independent Voltage Regulation

    Authors: M. Tahan, D. Bamgboje, T. Hu

    Abstract: This paper proposes a new single input multiple output power supply by integrating a flyback converter and several buck converters. The flyback converter works as the main regulator, and the buck converters provide series voltage compensation with the aim of tight regulation. A time multiplexing switching scheme is proposed to deliver multiple output voltage levels via a two winding transformer an…

    Submitted 9 February, 2020; originally announced February 2020.

    Comments: 8 Pages, 10 figures

  26. arXiv:2002.02308  [pdf, other]

    eess.SY cs.CV

    VGAI: End-to-End Learning of Vision-Based Decentralized Controllers for Robot Swarms

    Authors: Ting-Kuei Hu, Fernando Gama, Tianlong Chen, Zhangyang Wang, Alejandro Ribeiro, Brian M. Sadler

    Abstract: Decentralized coordination of a robot swarm requires addressing the tension between local perceptions and actions, and the accomplishment of a global objective. In this work, we propose to learn decentralized controllers based solely on raw visual inputs. For the first time, this integrates the learning of two key components, communication and visual perception, in one end-to-end framework. More s…

    Submitted 10 December, 2020; v1 submitted 6 February, 2020; originally announced February 2020.

  27. Multiple string LED driver with flexible and high performance PWM dimming control

    Authors: M. Tahan, T. Hu

    Abstract: The main objectives in driving multiple LED strings include achieving uniform current control and high performance PWM dimming for all strings. This work proposes a new multiple string LED driver to achieve not only current balance, but also flexible and wide range PWM dimming ratio for each string. A compact single-inductor multiple-output topology is adopted in the driver, accompanied by synchro…

    Submitted 31 January, 2020; originally announced February 2020.

    Comments: 14 pages, 28 figures

    Journal ref: IEEE Transactions on Power Electronics (Volume: 32, Issue: 12, Dec. 2017)

  28. arXiv:1910.11702  [pdf]

    q-bio.QM eess.SP

    Screening for REM Sleep Behaviour Disorder with Minimal Sensors

    Authors: Navin Cooray, Fernando Andreotti, Christine Lo, Mkael Symmonds, Michele T. M. Hu, Maarten De Vos

    Abstract: Rapid-Eye-Movement (REM) sleep behaviour disorder (RBD) is an early predictor of Parkinson's disease, dementia with Lewy bodies, and multiple system atrophy. This study investigates a minimal set of sensors to achieve effective screening for RBD in the population, integrating automated sleep staging (three state) followed by RBD detection without the need for cumbersome electroencephalogram (EEG)…

    Submitted 24 October, 2019; originally announced October 2019.

    Comments: 21 pages, 6 figures, and 6 tables. arXiv admin note: text overlap with arXiv:1811.04662

  29. arXiv:1909.12472  [pdf]

    eess.SP cs.CV cs.LG

    A Radio Signal Modulation Recognition Algorithm Based on Residual Networks and Attention Mechanisms

    Authors: Ruisen Luo, Tao Hu, Zuodong Tang, Chen Wang, Xiaofeng Gong, Haiyan Tu

    Abstract: To solve the problem of inaccurate recognition of types of communication signal modulation, an RNN recognition algorithm combining a residual block network with an attention mechanism is proposed. In this method, 10 kinds of communication signals with Gaussian white noise are generated from standard data sets, such as MASK, MPSK, MFSK, OFDM, 16QAM, AM and FM. Based on the original RNN neu…

    Submitted 26 September, 2019; originally announced September 2019.

  30. arXiv:1811.04662  [pdf]

    cs.LG eess.SP q-bio.NC stat.ML

    Detection of REM Sleep Behaviour Disorder by Automated Polysomnography Analysis

    Authors: Navin Cooray, Fernando Andreotti, Christine Lo, Mkael Symmonds, Michele T. M. Hu, Maarten De Vos

    Abstract: Evidence suggests Rapid-Eye-Movement (REM) Sleep Behaviour Disorder (RBD) is an early predictor of Parkinson's disease. This study proposes a fully-automated framework for RBD detection consisting of automated sleep staging followed by RBD identification. Analysis was assessed using a limited polysomnography montage from 53 participants with RBD and 53 age-matched healthy controls. Sleep stage cla…

    Submitted 12 November, 2018; originally announced November 2018.

    Comments: 20 pages, 3 figures