subscribe to arXiv mailings

Real-time Timbre Remapping with Differentiable DSP

Authors: Jordie Shier, Charalampos Saitis, Andrew Robertson, Andrew McPherson

Abstract: Timbre is a primary mode of expression in diverse musical contexts. However, prevalent audio-driven synthesis methods predominantly rely on pitch and loudness envelopes, effectively flattening timbral expression from the input. Our approach draws on the concept of timbre analogies and investigates how timbral expression from an input signal can be mapped onto controls for a synthesizer. Leveraging… ▽ More Timbre is a primary mode of expression in diverse musical contexts. However, prevalent audio-driven synthesis methods predominantly rely on pitch and loudness envelopes, effectively flattening timbral expression from the input. Our approach draws on the concept of timbre analogies and investigates how timbral expression from an input signal can be mapped onto controls for a synthesizer. Leveraging differentiable digital signal processing, our method facilitates direct optimization of synthesizer parameters through a novel feature difference loss. This loss function, designed to learn relative timbral differences between musical events, prioritizes the subtleties of graded timbre modulations within phrases, allowing for meaningful translations in a timbre space. Using snare drum performances as a case study, where timbral expression is central, we demonstrate real-time timbre remapping from acoustic snare drums to a differentiable synthesizer modeled after the Roland TR-808. △ Less

Submitted 5 July, 2024; originally announced July 2024.

Comments: Accepted for publication at the 24th International Conference on New Interfaces for Musical Expression in Utrecht, Netherlands

arXiv:2310.04811 [pdf, other]

FM Tone Transfer with Envelope Learning

Authors: Franco Caspe, Andrew McPherson, Mark Sandler

Abstract: Tone Transfer is a novel deep-learning technique for interfacing a sound source with a synthesizer, transforming the timbre of audio excerpts while keeping their musical form content. Due to its good audio quality results and continuous controllability, it has been recently applied in several audio processing tools. Nevertheless, it still presents several shortcomings related to poor sound diversi… ▽ More Tone Transfer is a novel deep-learning technique for interfacing a sound source with a synthesizer, transforming the timbre of audio excerpts while keeping their musical form content. Due to its good audio quality results and continuous controllability, it has been recently applied in several audio processing tools. Nevertheless, it still presents several shortcomings related to poor sound diversity, and limited transient and dynamic rendering, which we believe hinder its possibilities of articulation and phrasing in a real-time performance context. In this work, we present a discussion on current Tone Transfer architectures for the task of controlling synthetic audio with musical instruments and discuss their challenges in allowing expressive performances. Next, we introduce Envelope Learning, a novel method for designing Tone Transfer architectures that map musical events using a training objective at the synthesis parameter level. Our technique can render note beginnings and endings accurately and for a variety of sounds; these are essential steps for improving musical articulation, phrasing, and sound diversity with Tone Transfer. Finally, we implement a VST plugin for real-time live use and discuss possibilities for improvement. △ Less

Submitted 7 October, 2023; originally announced October 2023.

Comments: Accepted to Audio Mostly 2023

arXiv:2309.06649 [pdf, other]

Differentiable Modelling of Percussive Audio with Transient and Spectral Synthesis

Authors: Jordie Shier, Franco Caspe, Andrew Robertson, Mark Sandler, Charalampos Saitis, Andrew McPherson

Abstract: Differentiable digital signal processing (DDSP) techniques, including methods for audio synthesis, have gained attention in recent years and lend themselves to interpretability in the parameter space. However, current differentiable synthesis methods have not explicitly sought to model the transient portion of signals, which is important for percussive sounds. In this work, we present a unified sy… ▽ More Differentiable digital signal processing (DDSP) techniques, including methods for audio synthesis, have gained attention in recent years and lend themselves to interpretability in the parameter space. However, current differentiable synthesis methods have not explicitly sought to model the transient portion of signals, which is important for percussive sounds. In this work, we present a unified synthesis framework aiming to address transient generation and percussive synthesis within a DDSP framework. To this end, we propose a model for percussive synthesis that builds on sinusoidal modeling synthesis and incorporates a modulated temporal convolutional network for transient generation. We use a modified sinusoidal peak picking algorithm to generate time-varying non-harmonic sinusoids and pair it with differentiable noise and transient encoders that are jointly trained to reconstruct drumset sounds. We compute a set of reconstruction metrics using a large dataset of acoustic and electronic percussion samples that show that our method leads to improved onset signal reconstruction for membranophone percussion instruments. △ Less

Submitted 12 September, 2023; originally announced September 2023.

Comments: To be published in The Proceedings of Forum Acusticum, Sep 2023, Turin, Italy

arXiv:2308.15422 [pdf, other]

A Review of Differentiable Digital Signal Processing for Music & Speech Synthesis

Authors: Ben Hayes, Jordie Shier, György Fazekas, Andrew McPherson, Charalampos Saitis

Abstract: The term "differentiable digital signal processing" describes a family of techniques in which loss function gradients are backpropagated through digital signal processors, facilitating their integration into neural networks. This article surveys the literature on differentiable audio signal processing, focusing on its use in music & speech synthesis. We catalogue applications to tasks including mu… ▽ More The term "differentiable digital signal processing" describes a family of techniques in which loss function gradients are backpropagated through digital signal processors, facilitating their integration into neural networks. This article surveys the literature on differentiable audio signal processing, focusing on its use in music & speech synthesis. We catalogue applications to tasks including music performance rendering, sound matching, and voice transformation, discussing the motivations for and implications of the use of this methodology. This is accompanied by an overview of digital signal processing operations that have been implemented differentiably. Finally, we highlight open challenges, including optimisation pathologies, robustness to real-world conditions, and design trade-offs, and discuss directions for future research. △ Less

Submitted 29 August, 2023; originally announced August 2023.

Comments: Under review for Frontiers in Signal Processing

arXiv:2307.07426 [pdf, other]

Real-time Percussive Technique Recognition and Embedding Learning for the Acoustic Guitar

Authors: Andrea Martelloni, Andrew P McPherson, Mathieu Barthet

Abstract: Real-time music information retrieval (RT-MIR) has much potential to augment the capabilities of traditional acoustic instruments. We develop RT-MIR techniques aimed at augmenting percussive fingerstyle, which blends acoustic guitar playing with guitar body percussion. We formulate several design objectives for RT-MIR systems for augmented instrument performance: (i) causal constraint, (ii) percep… ▽ More Real-time music information retrieval (RT-MIR) has much potential to augment the capabilities of traditional acoustic instruments. We develop RT-MIR techniques aimed at augmenting percussive fingerstyle, which blends acoustic guitar playing with guitar body percussion. We formulate several design objectives for RT-MIR systems for augmented instrument performance: (i) causal constraint, (ii) perceptually negligible action-to-sound latency, (iii) control intimacy support, (iv) synthesis control support. We present and evaluate real-time guitar body percussion recognition and embedding learning techniques based on convolutional neural networks (CNNs) and CNNs jointly trained with variational autoencoders (VAEs). We introduce a taxonomy of guitar body percussion based on hand part and location. We follow a cross-dataset evaluation approach by collecting three datasets labelled according to the taxonomy. The embedding quality of the models is assessed using KL-Divergence across distributions corresponding to different taxonomic classes. Results indicate that the networks are strong classifiers especially in a simplified 2-class recognition task, and the VAEs yield improved class separation compared to CNNs as evidenced by increased KL-Divergence across distributions. We argue that the VAE embedding quality could support control intimacy and rich interaction when the latent space's parameters are used to control an external synthesis engine. Further design challenges around generalisation to different datasets have been identified. △ Less

Submitted 13 July, 2023; originally announced July 2023.

Comments: Accepted at the 24th Int. Society for Music Information Retrieval Conf., Milan, Italy, 2023

arXiv:2306.11389 [pdf, other]

Pipeline for recording datasets and running neural networks on the Bela embedded hardware platform

Authors: Teresa Pelinski, Rodrigo Diaz, Adán L. Benito Temprano, Andrew McPherson

Abstract: Deploying deep learning models on embedded devices is an arduous task: oftentimes, there exist no platform-specific instructions, and compilation times can be considerably large due to the limited computational resources available on-device. Moreover, many music-making applications demand real-time inference. Embedded hardware platforms for audio, such as Bela, offer an entry point for beginners i… ▽ More Deploying deep learning models on embedded devices is an arduous task: oftentimes, there exist no platform-specific instructions, and compilation times can be considerably large due to the limited computational resources available on-device. Moreover, many music-making applications demand real-time inference. Embedded hardware platforms for audio, such as Bela, offer an entry point for beginners into physical audio computing; however, the need for cross-compilation environments and low-level software development tools for deploying embedded deep learning models imposes high entry barriers on non-expert users. We present a pipeline for deploying neural networks in the Bela embedded hardware platform. In our pipeline, we include a tool to record a multichannel dataset of sensor signals. Additionally, we provide a dockerised cross-compilation environment for faster compilation. With this pipeline, we aim to provide a template for programmers and makers to prototype and experiment with neural networks for real-time embedded musical applications. △ Less

Submitted 20 June, 2023; originally announced June 2023.

arXiv:2208.06169 [pdf, other]

DDX7: Differentiable FM Synthesis of Musical Instrument Sounds

Authors: Franco Caspe, Andrew McPherson, Mark Sandler

Abstract: FM Synthesis is a well-known algorithm used to generate complex timbre from a compact set of design primitives. Typically featuring a MIDI interface, it is usually impractical to control it from an audio source. On the other hand, Differentiable Digital Signal Processing (DDSP) has enabled nuanced audio rendering by Deep Neural Networks (DNNs) that learn to control differentiable synthesis layers… ▽ More FM Synthesis is a well-known algorithm used to generate complex timbre from a compact set of design primitives. Typically featuring a MIDI interface, it is usually impractical to control it from an audio source. On the other hand, Differentiable Digital Signal Processing (DDSP) has enabled nuanced audio rendering by Deep Neural Networks (DNNs) that learn to control differentiable synthesis layers from arbitrary sound inputs. The training process involves a corpus of audio for supervision, and spectral reconstruction loss functions. Such functions, while being great to match spectral amplitudes, present a lack of pitch direction which can hinder the joint optimization of the parameters of FM synthesizers. In this paper, we take steps towards enabling continuous control of a well-established FM synthesis architecture from an audio input. Firstly, we discuss a set of design constraints that ease spectral optimization of a differentiable FM synthesizer via a standard reconstruction loss. Next, we present Differentiable DX7 (DDX7), a lightweight architecture for neural FM resynthesis of musical instrument sounds in terms of a compact set of parameters. We train the model on instrument samples extracted from the URMP dataset, and quantitatively demonstrate its comparable audio quality against selected benchmarks. △ Less

Submitted 12 August, 2022; originally announced August 2022.

Comments: Accepted to ISMIR 2022. See online supplement at https://fcaspe.github.io/ddx7/

ACM Class: H.5.5; I.2.6

arXiv:2111.11524 [pdf, other]

doi 10.1109/ICRA46639.2022.9812175

Tenodesis Grasp Emulator: Kinematic Assessment of Wrist-Driven Orthotic Control

Authors: Erin Y. Chang, Raghid Mardini, Andrew I. W. McPherson, Yuri Gloumakov, Hannah S. Stuart

Abstract: Wrist-driven orthotics have been designed to assist people with C6-7 spinal cord injury, however, the kinematic constraint imposed by such a control strategy can impede mobility and lead to abnormal body motion. This study characterizes body compensation using the novel Tenodesis Grasp Emulator, an adaptor orthotic that allows for the investigation of tenodesis grasping in subjects with unimpaired… ▽ More Wrist-driven orthotics have been designed to assist people with C6-7 spinal cord injury, however, the kinematic constraint imposed by such a control strategy can impede mobility and lead to abnormal body motion. This study characterizes body compensation using the novel Tenodesis Grasp Emulator, an adaptor orthotic that allows for the investigation of tenodesis grasping in subjects with unimpaired hand function. Subjects perform a series of grasp-and-release tasks in order to compare normal (test control) and constrained wrist-driven modes, showing significant compensation as a result of the constraint. A motor-augmented mode is also compared against traditional wrist-driven operation, to explore the potential role of hybrid human-robot control. We find that both the passive wrist-driven and motor-augmented modes fulfill different roles throughout various tasks tested. Thus, we conclude that a flexible control scheme that can alter intervention based on the task at hand holds the potential to reduce compensation in future work. △ Less

Submitted 9 November, 2023; v1 submitted 22 November, 2021; originally announced November 2021.

Comments: 7 pages, 11 figures, submitted to International Conference on Robotics and Automation (ICRA) 2022. Video Supplement: https://youtu.be/NIgKg5R3Roc

arXiv:2101.06795 [pdf, other]

An embedded multichannel sound acquisition system for drone audition

Authors: Michael Clayton, Lin Wang, Andrew McPherson, Andrea Cavallaro

Abstract: Microphone array techniques can improve the acoustic sensing performance on drones, compared to the use of a single microphone. However, multichannel sound acquisition systems are not available in current commercial drone platforms. To encourage the research in drone audition, we present an embedded sound acquisition and recording system with eight microphones and a multichannel sound recorder mou… ▽ More Microphone array techniques can improve the acoustic sensing performance on drones, compared to the use of a single microphone. However, multichannel sound acquisition systems are not available in current commercial drone platforms. To encourage the research in drone audition, we present an embedded sound acquisition and recording system with eight microphones and a multichannel sound recorder mounted on a quadcopter. In addition to recording and storing locally the sound from multiple microphones simultaneously, the embedded system can connect wirelessly to a remote terminal to transfer audio files for further processing. This will be the first stage towards creating a fully embedded solution for drone audition. We present experimental results obtained by state-of-the-art drone audition algorithms applied to the sound recorded by the embedded system. △ Less

Submitted 17 January, 2021; originally announced January 2021.

arXiv:1712.06790 [pdf, other]

Build and Execution Environment (BEE): an Encapsulated Environment Enabling HPC Applications Running Everywhere

Authors: Jieyang Chen, Qiang Guan, Xin Liang, Louis James Vernon, Allen McPherson, Li-Ta Lo, Zizhong Chen, James Paul Ahrens

Abstract: Variations in High Performance Computing (HPC) system software configurations mean that applications are typically configured and built for specific HPC environments. Building applications can require a significant investment of time and effort for application users and requires application users to have additional technical knowledge. Linux container technologies such as Docker and Charliecloud b… ▽ More Variations in High Performance Computing (HPC) system software configurations mean that applications are typically configured and built for specific HPC environments. Building applications can require a significant investment of time and effort for application users and requires application users to have additional technical knowledge. Linux container technologies such as Docker and Charliecloud bring great benefits to the application development, build and deployment processes. While cloud platforms already widely support containers, HPC systems still have non-uniform support of container technologies. In this work, we propose a unified runtime framework -- Build and Execution Environment (BEE) across both HPC and cloud platforms that allows users to run their containerized HPC applications across all supported platforms without modification. We design four BEE backends for four different classes of HPC or cloud platform so that together they cover the majority of mainstream computing platforms for HPC users. Evaluations show that BEE provides an easy-to-use unified user interface, execution environment, and comparable performance. △ Less

Submitted 27 February, 2021; v1 submitted 19 December, 2017; originally announced December 2017.

arXiv:1504.03080 [pdf, other]

Joint Inference of Genome Structure and Content in Heterogeneous Tumour Samples

Authors: Andrew McPherson, Andrew Roth, Gavin Ha, Sohrab P. Shah, Cedric Chauve, S. Cenk Sahinalp

Abstract: For a genomically unstable cancer, a single tumour biopsy will often contain a mixture of competing tumour clones. These tumour clones frequently differ with respect to their genomic content (copy number of each gene) and structure (order of genes on each chromosome). Modern bulk genome sequencing mixes the signals of tumour clones and contaminating normal cells, complicating inference of genomic… ▽ More For a genomically unstable cancer, a single tumour biopsy will often contain a mixture of competing tumour clones. These tumour clones frequently differ with respect to their genomic content (copy number of each gene) and structure (order of genes on each chromosome). Modern bulk genome sequencing mixes the signals of tumour clones and contaminating normal cells, complicating inference of genomic content and structure. We propose a method to unmix tumour and contaminating normal signals and jointly predict genomic structure and content of each tumour clone. We use genome graphs to represent tumour clones, and model the likelihood of the observed reads given clones and mixing proportions. Our use of haplotype blocks allows us to accurately measure allele specific read counts, and infer allele specific copy number for each clone. The proposed method is a heuristic local search based on applying incremental, locally optimal modifications of the genome graphs. Using simulated data, we show that our method predicts copy counts and gene adjacencies with reasonable accuracy. △ Less

Submitted 24 April, 2015; v1 submitted 13 April, 2015; originally announced April 2015.

Comments: Presented at RECOMB 2015

Showing 1–11 of 11 results for author: McPherson, A