Improving Text-To-Audio Models with Synthetic Captions

Authors: Zhifeng Kong, Sang-gil Lee, Deepanway Ghosal, Navonil Majumder, Ambuj Mehrish, Rafael Valle, Soujanya Poria, Bryan Catanzaro

Abstract: It is an open challenge to obtain high-quality training data, especially captions, for text-to-audio models. Although prior methods have leveraged text-only language models to augment and improve captions, such methods have limitations related to scale and coherence between audio and captions. In this work, we propose an audio captioning pipeline that uses an audio language model to synthesize accurate and diverse captions for audio at scale. We leverage this pipeline to produce a dataset of synthetic captions for AudioSet, named AF-AudioSet, and then evaluate the benefit of pre-training text-to-audio models on these synthetic captions. Through systematic evaluations on AudioCaps and MusicCaps, we find that leveraging our pipeline and synthetic captions leads to significant improvements in audio generation quality, achieving a new state-of-the-art.

Submitted 8 July, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

arXiv:2404.07616 [pdf, other]

Audio Dialogues: Dialogues dataset for audio and music understanding

Authors: Arushi Goel, Zhifeng Kong, Rafael Valle, Bryan Catanzaro

Abstract: Existing datasets for audio understanding primarily focus on single-turn interactions (i.e. audio captioning, audio question answering) for describing audio in natural language, thus limiting understanding audio via interactive dialogue. To address this gap, we introduce Audio Dialogues: a multi-turn dialogue dataset containing 163.8k samples for general audio sounds and music. In addition to dialogues, Audio Dialogues also has question-answer pairs to understand and compare multiple input audios together. Audio Dialogues leverages a prompting-based approach and caption annotations from existing datasets to generate multi-turn dialogues using a Large Language Model (LLM). We evaluate existing audio-augmented large language models on our proposed dataset to demonstrate the complexity and applicability of Audio Dialogues. Our code for generating the dataset will be made publicly available. Detailed prompts and generated dialogues can be found on the demo website https://audiodialogues.github.io/.

Submitted 11 April, 2024; originally announced April 2024.

Comments: Demo website: https://audiodialogues.github.io/

arXiv:2402.08235 [pdf, other]

Color Image Denoising Using The Green Channel Prior

Authors: Zhaoming Kong, Xiaowei Yang

Abstract: Noise removal in the standard RGB (sRGB) space remains a challenging task, in that the noise statistics of real-world images can be different in R, G and B channels. In fact, the green channel usually has twice the sampling rate in raw data and a higher signal-to-noise ratio than the red and blue ones. However, the green channel prior (GCP) is often understated or ignored in color image denoising since many existing approaches mainly focus on modeling the relationship among image patches. In this paper, we propose a simple and effective one-step GCP-based image denoising (GCP-ID) method, which aims to exploit the GCP for denoising in the sRGB space by integrating it into the classic nonlocal transform domain denoising framework. Briefly, we first take advantage of the green channel to guide the search of similar patches, which improves the patch search quality and encourages sparsity in the transform domain. Then we reformulate RGB patches into RGGB arrays to explicitly characterize the density of green samples. The block circulant representation is utilized to capture the cross-channel correlation and the channel redundancy. Experiments on both synthetic and real-world datasets demonstrate the competitive performance of the proposed GCP-ID method for the color image and video denoising tasks. The code is available at github.com/ZhaomingKong/GCP-ID.

Submitted 13 February, 2024; originally announced February 2024.

arXiv:2402.01831 [pdf, other]

Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dialogue Abilities

Authors: Zhifeng Kong, Arushi Goel, Rohan Badlani, Wei Ping, Rafael Valle, Bryan Catanzaro

Abstract: Augmenting large language models (LLMs) to understand audio, including non-speech sounds and non-verbal speech, is critically important for diverse real-world applications of LLMs. In this paper, we propose Audio Flamingo, a novel audio language model with 1) strong audio understanding abilities, 2) the ability to quickly adapt to unseen tasks via in-context learning and retrieval, and 3) strong multi-turn dialogue abilities. We introduce a series of training techniques, architecture design, and data strategies to enhance our model with these abilities. Extensive evaluations across various audio understanding tasks confirm the efficacy of our method, setting new state-of-the-art benchmarks. Our demo website is https://audioflamingo.github.io/ and the code is open-sourced at https://github.com/NVIDIA/audio-flamingo.

Submitted 28 May, 2024; v1 submitted 2 February, 2024; originally announced February 2024.

Comments: ICML 2024

arXiv:2309.05975 [pdf, other]

doi 10.21437/Interspeech.2023-1287

CleanUNet 2: A Hybrid Speech Denoising Model on Waveform and Spectrogram

Authors: Zhifeng Kong, Wei Ping, Ambrish Dantrey, Bryan Catanzaro

Abstract: In this work, we present CleanUNet 2, a speech denoising model that combines the advantages of waveform and spectrogram denoisers and achieves the best of both worlds. CleanUNet 2 uses a two-stage framework inspired by popular speech synthesis methods that consist of a waveform model and a spectrogram model. Specifically, CleanUNet 2 builds upon CleanUNet, the state-of-the-art waveform denoiser, and further boosts its performance by taking predicted spectrograms from a spectrogram denoiser as the input. We demonstrate that CleanUNet 2 outperforms previous methods in terms of various objective and subjective evaluations.

Submitted 12 September, 2023; originally announced September 2023.

Comments: INTERSPEECH 2023

Journal ref: Proc. INTERSPEECH 2023, pages 790--794

arXiv:2304.08990 [pdf, other]

A Comparison of Image Denoising Methods

Authors: Zhaoming Kong, Fangxi Deng, Haomin Zhuang, Jun Yu, Lifang He, Xiaowei Yang

Abstract: The advancement of imaging devices and the countless images generated every day pose an increasingly high demand on image denoising, which still remains a challenging task in terms of both effectiveness and efficiency. To improve denoising quality, numerous denoising techniques and approaches have been proposed in the past decades, including different transforms, regularization terms, algebraic representations and especially advanced deep neural network (DNN) architectures. Despite their sophistication, many methods may fail to achieve desirable results for simultaneous noise removal and fine detail preservation. In this paper, to investigate the applicability of existing denoising techniques, we compare a variety of denoising methods on both synthetic and real-world datasets for different applications. We also introduce a new dataset for benchmarking, and the evaluations are performed from four different perspectives including quantitative metrics, visual effects, human ratings and computational cost. Our experiments demonstrate: (i) the effectiveness and efficiency of representative traditional denoisers for various denoising tasks, (ii) that a simple matrix-based algorithm may be able to produce similar results compared with its tensor counterparts, and (iii) the notable achievements of DNN models, which exhibit impressive generalization ability and show state-of-the-art performance on various datasets. In spite of the progress in recent years, we discuss shortcomings and possible extensions of existing techniques. Datasets, code and results are made publicly available and will be continuously updated at https://github.com/ZhaomingKong/Denoising-Comparison.

Submitted 9 May, 2023; v1 submitted 18 April, 2023; originally announced April 2023.

Comments: In this paper, we intend to collect and compare various denoising methods to investigate their effectiveness, efficiency, applicability and generalization ability with both synthetic and real-world experiments. arXiv admin note: substantial text overlap with arXiv:2011.03462

arXiv:2301.01732 [pdf, ps, other]

Explicit Abnormality Extraction for Unsupervised Motion Artifact Reduction in Magnetic Resonance Imaging

Authors: Yusheng Zhou, Hao Li, Jianan Liu, Zhengmin Kong, Tao Huang, Euijoon Ahn, Zhihan Lv, Jinman Kim, David Dagan Feng

Abstract: Motion artifacts compromise the quality of magnetic resonance imaging (MRI) and pose challenges to achieving diagnostic outcomes and image-guided therapies. In recent years, supervised deep learning approaches have emerged as successful solutions for motion artifact reduction (MAR). One disadvantage of these methods is their dependency on acquiring paired sets of motion artifact-corrupted (MA-corrupted) and motion artifact-free (MA-free) MR images for training purposes. Obtaining such image pairs is difficult and therefore limits the application of supervised training. In this paper, we propose a novel UNsupervised Abnormality Extraction Network (UNAEN) to alleviate this problem. Our network is capable of working with unpaired MA-corrupted and MA-free images. It converts the MA-corrupted images to MA-reduced images by extracting abnormalities from the MA-corrupted images using a proposed artifact extractor, which intercepts the residual artifact maps from the MA-corrupted MR images explicitly, and a reconstructor to restore the original input from the MA-reduced images. The performance of UNAEN was assessed by experimenting with various publicly available MRI datasets and comparing them with state-of-the-art methods. The quantitative evaluation demonstrates the superiority of UNAEN over alternative MAR methods and visually exhibits fewer residual artifacts. Our results substantiate the potential of UNAEN as a promising solution applicable in real-world clinical environments, with the capability to enhance diagnostic accuracy and facilitate image-guided therapies. Our codes are publicly available at https://github.com/YuSheng-Zhou/UNAEN.

Submitted 5 July, 2024; v1 submitted 4 January, 2023; originally announced January 2023.

arXiv:2202.07790 [pdf, other]

Speech Denoising in the Waveform Domain with Self-Attention

Authors: Zhifeng Kong, Wei Ping, Ambrish Dantrey, Bryan Catanzaro

Abstract: In this work, we present CleanUNet, a causal speech denoising model on the raw waveform. The proposed model is based on an encoder-decoder architecture combined with several self-attention blocks to refine its bottleneck representations, which is crucial to obtain good results. The model is optimized through a set of losses defined over both waveform and multi-resolution spectrograms. The proposed method outperforms the state-of-the-art models in terms of denoised speech quality from various objective and subjective evaluation metrics. We release our code and models at https://github.com/nvidia/cleanunet.

Submitted 6 July, 2022; v1 submitted 15 February, 2022; originally announced February 2022.

Comments: Published in ICASSP 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Listen to audio samples from CleanUNet at: https://cleanunet.github.io/

arXiv:2111.10803 [pdf, other]

Structure-Preserving Graph Kernel for Brain Network Classification

Authors: Jun Yu, Zhaoming Kong, Aditya Kendre, Hao Peng, Carl Yang, Lichao Sun, Alex Leow, Lifang He

Abstract: This paper presents a novel graph-based kernel learning approach for connectome analysis. Specifically, we demonstrate how to leverage the naturally available structure within the graph representation to encode prior knowledge in the kernel. We first propose a matrix factorization to directly extract structural features from natural symmetric graph representations of connectome data. We then use them to derive a structure-preserving graph kernel to be fed into the support vector machine. The proposed approach has the advantage of being clinically interpretable. Quantitative evaluations on challenging HIV disease classification (DTI- and fMRI-derived connectome data) and emotion recognition (EEG-derived connectome data) tasks demonstrate the superior performance of our proposed methods against the state-of-the-art. Results show that relevant EEG-connectome information is primarily encoded in the alpha band during the emotion regulation task.

Submitted 21 February, 2022; v1 submitted 21 November, 2021; originally announced November 2021.

arXiv:2011.03462 [pdf, other]

A Comprehensive Comparison of Multi-Dimensional Image Denoising Methods

Authors: Zhaoming Kong, Xiaowei Yang, Lifang He

Abstract: Filtering multi-dimensional images such as color images, color videos, multispectral images and magnetic resonance images is challenging in terms of both effectiveness and efficiency. Leveraging the nonlocal self-similarity (NLSS) characteristic of images and sparse representation in the transform domain, the block-matching and 3D filtering (BM3D) based methods show powerful denoising performance. Recently, numerous new approaches with different regularization terms, transforms and advanced deep neural network (DNN) architectures have been proposed to improve denoising quality. In this paper, we extensively compare over 60 methods on both synthetic and real-world datasets. We also introduce a new color image and video dataset for benchmarking, and our evaluations are performed from four different perspectives including quantitative metrics, visual effects, human ratings and computational cost. Comprehensive experiments demonstrate: (i) the effectiveness and efficiency of the BM3D family for various denoising tasks, (ii) that a simple matrix-based algorithm could produce similar results compared with its tensor counterparts, and (iii) that several DNN models trained with synthetic Gaussian noise show state-of-the-art performance on real-world color image and video datasets. Despite the progress in recent years, we discuss shortcomings and possible extensions of existing techniques. Datasets and codes for evaluation are made publicly available at https://github.com/ZhaomingKong/Denoising-Comparison.

Submitted 6 November, 2020; originally announced November 2020.

arXiv:2009.09761 [pdf, other]

DiffWave: A Versatile Diffusion Model for Audio Synthesis

Authors: Zhifeng Kong, Wei Ping, Jiaji Huang, Kexin Zhao, Bryan Catanzaro

Abstract: In this work, we propose DiffWave, a versatile diffusion probabilistic model for conditional and unconditional waveform generation. The model is non-autoregressive, and converts the white noise signal into a structured waveform through a Markov chain with a constant number of steps at synthesis. It is efficiently trained by optimizing a variant of the variational bound on the data likelihood. DiffWave produces high-fidelity audio in different waveform generation tasks, including neural vocoding conditioned on mel spectrogram, class-conditional generation, and unconditional generation. We demonstrate that DiffWave matches a strong WaveNet vocoder in terms of speech quality (MOS: 4.44 versus 4.43), while synthesizing orders of magnitude faster. In particular, it significantly outperforms autoregressive and GAN-based waveform models in the challenging unconditional generation task in terms of audio quality and sample diversity from various automatic and human evaluations.

Submitted 30 March, 2021; v1 submitted 21 September, 2020; originally announced September 2020.

Comments: ICLR 2021 (oral)

arXiv:1801.09289 [pdf, other]

Data-Driven Approximate Abstraction for Black-Box Piecewise Affine Systems

Authors: Gang Chen, Zhaodan Kong

Abstract: How to effectively and reliably guarantee the correct functioning of safety-critical cyber-physical systems in uncertain conditions is a challenging problem. This paper presents a data-driven algorithm to derive approximate abstractions for piecewise affine systems with unknown dynamics. It advocates a significant shift from the current paradigm of abstraction, which starts from a model with known dynamics. Given a black-box system with unknown dynamics and a linear temporal logic specification, the proposed algorithm is able to obtain an abstraction of the system with an arbitrarily small error and a bounded probability. The algorithm consists of three components: system identification, system abstraction, and active sampling. The effectiveness of the algorithm is demonstrated by a case study with a soft robot.

Submitted 30 January, 2018; v1 submitted 28 January, 2018; originally announced January 2018.

arXiv:1609.07409 [pdf, other]

Q-Learning for Robust Satisfaction of Signal Temporal Logic Specifications

Authors: Derya Aksaray, Austin Jones, Zhaodan Kong, Mac Schwager, Calin Belta

Abstract: This paper addresses the problem of learning optimal policies for satisfying signal temporal logic (STL) specifications by agents with unknown stochastic dynamics. The system is modeled as a Markov decision process, in which the states represent partitions of a continuous space and the transition probabilities are unknown. We formulate two synthesis problems where the desired STL specification is enforced by maximizing the probability of satisfaction, and the expected robustness degree, that is, a measure quantifying the quality of satisfaction. We discuss why Q-learning is not directly applicable to these problems: based on the quantitative semantics of STL, the probability of satisfaction and expected robustness degree are not in the standard objective form of Q-learning. To resolve this issue, we propose an approximation of STL synthesis problems that can be solved via Q-learning, and we derive some performance bounds for the policies obtained by the approximate approach. The performance of the proposed method is demonstrated via simulations.

Submitted 23 September, 2016; originally announced September 2016.

Comments: This paper is accepted to IEEE CDC 2016
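
For context, the "standard objective form" mentioned in the abstract above refers to the textbook tabular Q-learning rule, which assumes a reward that accumulates step by step. The sketch below shows only that textbook update, not the paper's STL approximation; the function name `q_update`, the toy state and action labels, and the values of `alpha` and `gamma` are assumptions chosen for illustration.

```python
from collections import defaultdict

def q_update(q, state, action, reward, next_state, actions, alpha=0.5, gamma=0.9):
    """Apply one textbook Q-learning update in place and return the new value."""
    # Greedy estimate of the value of the next state.
    best_next = max(q[(next_state, a)] for a in actions)
    # Temporal-difference target: immediate reward plus discounted future value.
    td_target = reward + gamma * best_next
    # Move the current estimate a fraction alpha toward the target.
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]

# Hypothetical toy transition: all Q-values start at zero.
q = defaultdict(float)
actions = ["left", "right"]
q_update(q, "s0", "right", 1.0, "s1", actions)
```

The update depends on a per-step reward summed over time, whereas STL robustness is defined through min and max over a whole trajectory, which is the mismatch the abstract points to.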

arXiv:1603.00814 [pdf, other]

Active Requirement Mining of Bounded-Time Temporal Properties of Cyber-Physical Systems

Authors: Gang Chen, Zachary Sabato, Zhaodan Kong

Abstract: This paper uses active learning to solve the problem of mining bounded-time signal temporal requirements of cyber-physical systems, or simply the requirement mining problem. By utilizing robustness degree, we formulate the requirement mining problem into two optimization problems, a parameter synthesis problem and a falsification problem. We then propose a new active learning algorithm called Gaussian Process Adaptive Confidence Bound (GP-ACB) to help solve the falsification problem. We show theoretically that the GP-ACB algorithm has a lower regret bound and thus a larger convergence rate than some existing active learning algorithms, such as GP-UCB. We finally illustrate and apply our requirement mining algorithm on two case studies, Ackley's function and a real-world automatic transmission model. The case studies show that our mining algorithm with GP-ACB outperforms others, such as those based on Nelder-Mead, by an average of 30% to 40%. Our results demonstrate that there is a principled and efficient way of extracting requirements for complex cyber-physical systems.

Submitted 2 March, 2016; originally announced March 2016.
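
As a point of reference for the first case study named in the abstract above, Ackley's function is a standard multimodal optimization benchmark with a global minimum of 0 at the origin. The sketch below uses the commonly cited constants a = 20, b = 0.2, c = 2π; the abstract does not state which constants or dimension the paper uses, so these defaults are assumptions.

```python
import math

def ackley(x, a=20.0, b=0.2, c=2.0 * math.pi):
    """Evaluate Ackley's function at a point x given as a sequence of floats."""
    d = len(x)
    sum_sq = sum(v * v for v in x)              # sum of squared coordinates
    sum_cos = sum(math.cos(c * v) for v in x)   # sum of cosine terms
    return (-a * math.exp(-b * math.sqrt(sum_sq / d))
            - math.exp(sum_cos / d) + a + math.e)

origin_value = ackley([0.0, 0.0])   # global minimum, zero up to rounding
away_value = ackley([1.0, 1.0])     # strictly positive away from the origin
```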

arXiv:1510.06460 [pdf, other]

Robust Satisfaction of Temporal Logic Specifications via Reinforcement Learning

Authors: Austin Jones, Derya Aksaray, Zhaodan Kong, Mac Schwager, Calin Belta

Abstract: We consider the problem of steering a system with unknown, stochastic dynamics to satisfy a rich, temporally layered task given as a signal temporal logic formula. We represent the system as a Markov decision process in which the states are built from a partition of the state space and the transition probabilities are unknown. We present provably convergent reinforcement learning algorithms to maximize the probability of satisfying a given formula and to maximize the average expected robustness, i.e., a measure of how strongly the formula is satisfied. We demonstrate via a pair of robot navigation simulation case studies that reinforcement learning with robustness maximization performs better than probability maximization in terms of both probability of satisfaction and expected robustness.

Submitted 21 October, 2015; originally announced October 2015.

Comments: 8 pages, 4 figures
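
To make "how strongly the formula is satisfied" concrete, the sketch below computes the usual quantitative (robustness) semantics of signal temporal logic for two simple formulas over a sampled signal: "always x > c" and "eventually x > c". It illustrates the general notion only, not the paper's algorithms; the function names and the sample signal are hypothetical.

```python
def rob_always_gt(signal, threshold):
    """Robustness of 'always x > threshold': the worst-case margin over time."""
    return min(x - threshold for x in signal)

def rob_eventually_gt(signal, threshold):
    """Robustness of 'eventually x > threshold': the best-case margin over time."""
    return max(x - threshold for x in signal)

# Hypothetical sampled signal.
signal = [2.0, 3.5, 1.5, 4.0]

always_margin = rob_always_gt(signal, 1.0)          # positive: satisfied
eventually_margin = rob_eventually_gt(signal, 5.0)  # negative: violated
```

A positive value means the formula holds with that much margin, and a negative value means it is violated by that much, so maximizing robustness favors trajectories that satisfy the specification with room to spare.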

arXiv:1403.5462 [pdf, ps, other]

Saliency Based Control in Random Feature Networks

Authors: John Baillieul, Zhaodan Kong

Abstract: The ability to rapidly focus attention and react to salient environmental features enables animals to move agilely through their habitats. To replicate this kind of high-performance control of movement in synthetic systems, we propose a new approach to feedback control that bases control actions on randomly perceived features. Connections will be made with recent work incorporating communication protocols into networked control systems. The concepts of random channel controllability and random channel observability for LTI control systems are introduced and studied.

Submitted 6 August, 2014; v1 submitted 21 March, 2014; originally announced March 2014.

Comments: 9 pages, 2 figures

arXiv:1311.4419 [pdf, other]

Perception and Steering Control in Paired Bat Flight

Authors: Zhaodan Kong, Kayhan Ozcimder, Nathan W. Fuller, John Baillieul

Abstract: Animals within groups need to coordinate their reactions to perceived environmental features and to each other in order to safely move from one point to another. This paper extends our previously published work on the flight patterns of Myotis velifer that have been observed in a habitat near Johnson City, Texas. Each evening, these bats emerge from a cave in sequences of small groups that typically contain no more than three or four individuals, and they thus provide ideal subjects for studying leader-follower behaviors. By analyzing the flight paths of a group of M. velifer, the data show that the flight behavior of a follower bat is influenced by the flight behavior of a leader bat in a way that is not well explained by existing pursuit laws, such as classical pursuit, constant bearing and motion camouflage. Thus we propose an alternative steering law based on virtual loom, a concept we introduce to capture the geometrical configuration of the leader-follower pair. It is shown that this law may be integrated with our previously proposed vision-enabled steering laws to synthesize trajectories, the statistics of which fit with those of the bats in our data set. The results suggest that bats use perceived information of both the environment and their neighbors for navigation.

Submitted 15 November, 2013; originally announced November 2013.

Comments: Submitted to the 19th World Congress of the International Federation of Automatic Control (IFAC)

arXiv:1303.3072 [pdf, other]

Optical Flow Sensing and the Inverse Perception Problem for Flying Bats

Authors: Zhaodan Kong, Kayhan Özcimder, Nathan Fuller, Alison Greco, Diane Theriault, Zheng Wu, Thomas Kunz, Margrit Betke, John Baillieul

Abstract: The movements of birds, bats, and other flying species are governed by complex sensorimotor systems that allow the animals to react to stationary environmental features as well as to wind disturbances, other animals in nearby airspace, and a wide variety of unexpected challenges. The paper and talk will describe research that analyzes the three-dimensional trajectories of bats flying in a habitat in Texas. The trajectories are computed with stereoscopic methods using data from synchronous thermal videos that were recorded with high temporal and spatial resolution from three viewpoints. Following our previously reported work, we examine the possibility that bat trajectories in this habitat are governed by optical flow sensing that interpolates periodic distance measurements from echolocation. Using an idealized geometry of bat eyes, we introduce the concept of time-to-transit, and recall some research that suggests that this quantity is computed by the animals' visual cortex. Several steering control laws based on time-to-transit are proposed for an idealized flight model, and it is shown that these can be used to replicate the observed flight of what we identify as typical bats. Although the vision-based motion control laws we propose and the protocols for switching between them are quite simple, some of the trajectories that have been synthesized are qualitatively bat-like. Examination of the control protocols that generate these trajectories suggests that bat motions are governed both by their reactions to a subset of key feature points as well as by their memories of where these feature points are located.

Submitted 12 March, 2013; originally announced March 2013.

Comments: 20 Pages, 7 figures

Showing 1–18 of 18 results for author: Kong, Z