-
Real-time Timbre Remapping with Differentiable DSP
Authors:
Jordie Shier,
Charalampos Saitis,
Andrew Robertson,
Andrew McPherson
Abstract:
Timbre is a primary mode of expression in diverse musical contexts. However, prevalent audio-driven synthesis methods predominantly rely on pitch and loudness envelopes, effectively flattening timbral expression from the input. Our approach draws on the concept of timbre analogies and investigates how timbral expression from an input signal can be mapped onto controls for a synthesizer. Leveraging…
▽ More
Timbre is a primary mode of expression in diverse musical contexts. However, prevalent audio-driven synthesis methods predominantly rely on pitch and loudness envelopes, effectively flattening timbral expression from the input. Our approach draws on the concept of timbre analogies and investigates how timbral expression from an input signal can be mapped onto controls for a synthesizer. Leveraging differentiable digital signal processing, our method facilitates direct optimization of synthesizer parameters through a novel feature difference loss. This loss function, designed to learn relative timbral differences between musical events, prioritizes the subtleties of graded timbre modulations within phrases, allowing for meaningful translations in a timbre space. Using snare drum performances as a case study, where timbral expression is central, we demonstrate real-time timbre remapping from acoustic snare drums to a differentiable synthesizer modeled after the Roland TR-808.
△ Less
Submitted 5 July, 2024;
originally announced July 2024.
-
FM Tone Transfer with Envelope Learning
Authors:
Franco Caspe,
Andrew McPherson,
Mark Sandler
Abstract:
Tone Transfer is a novel deep-learning technique for interfacing a sound source with a synthesizer, transforming the timbre of audio excerpts while keeping their musical form content. Due to its good audio quality results and continuous controllability, it has been recently applied in several audio processing tools. Nevertheless, it still presents several shortcomings related to poor sound diversi…
▽ More
Tone Transfer is a novel deep-learning technique for interfacing a sound source with a synthesizer, transforming the timbre of audio excerpts while keeping their musical form content. Due to its good audio quality results and continuous controllability, it has been recently applied in several audio processing tools. Nevertheless, it still presents several shortcomings related to poor sound diversity, and limited transient and dynamic rendering, which we believe hinder its possibilities of articulation and phrasing in a real-time performance context.
In this work, we present a discussion on current Tone Transfer architectures for the task of controlling synthetic audio with musical instruments and discuss their challenges in allowing expressive performances. Next, we introduce Envelope Learning, a novel method for designing Tone Transfer architectures that map musical events using a training objective at the synthesis parameter level. Our technique can render note beginnings and endings accurately and for a variety of sounds; these are essential steps for improving musical articulation, phrasing, and sound diversity with Tone Transfer. Finally, we implement a VST plugin for real-time live use and discuss possibilities for improvement.
△ Less
Submitted 7 October, 2023;
originally announced October 2023.
-
Differentiable Modelling of Percussive Audio with Transient and Spectral Synthesis
Authors:
Jordie Shier,
Franco Caspe,
Andrew Robertson,
Mark Sandler,
Charalampos Saitis,
Andrew McPherson
Abstract:
Differentiable digital signal processing (DDSP) techniques, including methods for audio synthesis, have gained attention in recent years and lend themselves to interpretability in the parameter space. However, current differentiable synthesis methods have not explicitly sought to model the transient portion of signals, which is important for percussive sounds. In this work, we present a unified sy…
▽ More
Differentiable digital signal processing (DDSP) techniques, including methods for audio synthesis, have gained attention in recent years and lend themselves to interpretability in the parameter space. However, current differentiable synthesis methods have not explicitly sought to model the transient portion of signals, which is important for percussive sounds. In this work, we present a unified synthesis framework aiming to address transient generation and percussive synthesis within a DDSP framework. To this end, we propose a model for percussive synthesis that builds on sinusoidal modeling synthesis and incorporates a modulated temporal convolutional network for transient generation. We use a modified sinusoidal peak picking algorithm to generate time-varying non-harmonic sinusoids and pair it with differentiable noise and transient encoders that are jointly trained to reconstruct drumset sounds. We compute a set of reconstruction metrics using a large dataset of acoustic and electronic percussion samples that show that our method leads to improved onset signal reconstruction for membranophone percussion instruments.
△ Less
Submitted 12 September, 2023;
originally announced September 2023.
-
A Review of Differentiable Digital Signal Processing for Music & Speech Synthesis
Authors:
Ben Hayes,
Jordie Shier,
György Fazekas,
Andrew McPherson,
Charalampos Saitis
Abstract:
The term "differentiable digital signal processing" describes a family of techniques in which loss function gradients are backpropagated through digital signal processors, facilitating their integration into neural networks. This article surveys the literature on differentiable audio signal processing, focusing on its use in music & speech synthesis. We catalogue applications to tasks including mu…
▽ More
The term "differentiable digital signal processing" describes a family of techniques in which loss function gradients are backpropagated through digital signal processors, facilitating their integration into neural networks. This article surveys the literature on differentiable audio signal processing, focusing on its use in music & speech synthesis. We catalogue applications to tasks including music performance rendering, sound matching, and voice transformation, discussing the motivations for and implications of the use of this methodology. This is accompanied by an overview of digital signal processing operations that have been implemented differentiably. Finally, we highlight open challenges, including optimisation pathologies, robustness to real-world conditions, and design trade-offs, and discuss directions for future research.
△ Less
Submitted 29 August, 2023;
originally announced August 2023.
-
Real-time Percussive Technique Recognition and Embedding Learning for the Acoustic Guitar
Authors:
Andrea Martelloni,
Andrew P McPherson,
Mathieu Barthet
Abstract:
Real-time music information retrieval (RT-MIR) has much potential to augment the capabilities of traditional acoustic instruments. We develop RT-MIR techniques aimed at augmenting percussive fingerstyle, which blends acoustic guitar playing with guitar body percussion. We formulate several design objectives for RT-MIR systems for augmented instrument performance: (i) causal constraint, (ii) percep…
▽ More
Real-time music information retrieval (RT-MIR) has much potential to augment the capabilities of traditional acoustic instruments. We develop RT-MIR techniques aimed at augmenting percussive fingerstyle, which blends acoustic guitar playing with guitar body percussion. We formulate several design objectives for RT-MIR systems for augmented instrument performance: (i) causal constraint, (ii) perceptually negligible action-to-sound latency, (iii) control intimacy support, (iv) synthesis control support. We present and evaluate real-time guitar body percussion recognition and embedding learning techniques based on convolutional neural networks (CNNs) and CNNs jointly trained with variational autoencoders (VAEs). We introduce a taxonomy of guitar body percussion based on hand part and location. We follow a cross-dataset evaluation approach by collecting three datasets labelled according to the taxonomy. The embedding quality of the models is assessed using KL-Divergence across distributions corresponding to different taxonomic classes. Results indicate that the networks are strong classifiers especially in a simplified 2-class recognition task, and the VAEs yield improved class separation compared to CNNs as evidenced by increased KL-Divergence across distributions. We argue that the VAE embedding quality could support control intimacy and rich interaction when the latent space's parameters are used to control an external synthesis engine. Further design challenges around generalisation to different datasets have been identified.
△ Less
Submitted 13 July, 2023;
originally announced July 2023.
-
Pipeline for recording datasets and running neural networks on the Bela embedded hardware platform
Authors:
Teresa Pelinski,
Rodrigo Diaz,
Adán L. Benito Temprano,
Andrew McPherson
Abstract:
Deploying deep learning models on embedded devices is an arduous task: oftentimes, there exist no platform-specific instructions, and compilation times can be considerably large due to the limited computational resources available on-device. Moreover, many music-making applications demand real-time inference. Embedded hardware platforms for audio, such as Bela, offer an entry point for beginners i…
▽ More
Deploying deep learning models on embedded devices is an arduous task: oftentimes, there exist no platform-specific instructions, and compilation times can be considerably large due to the limited computational resources available on-device. Moreover, many music-making applications demand real-time inference. Embedded hardware platforms for audio, such as Bela, offer an entry point for beginners into physical audio computing; however, the need for cross-compilation environments and low-level software development tools for deploying embedded deep learning models imposes high entry barriers on non-expert users. We present a pipeline for deploying neural networks in the Bela embedded hardware platform. In our pipeline, we include a tool to record a multichannel dataset of sensor signals. Additionally, we provide a dockerised cross-compilation environment for faster compilation. With this pipeline, we aim to provide a template for programmers and makers to prototype and experiment with neural networks for real-time embedded musical applications.
△ Less
Submitted 20 June, 2023;
originally announced June 2023.
-
DDX7: Differentiable FM Synthesis of Musical Instrument Sounds
Authors:
Franco Caspe,
Andrew McPherson,
Mark Sandler
Abstract:
FM Synthesis is a well-known algorithm used to generate complex timbre from a compact set of design primitives. Typically featuring a MIDI interface, it is usually impractical to control it from an audio source. On the other hand, Differentiable Digital Signal Processing (DDSP) has enabled nuanced audio rendering by Deep Neural Networks (DNNs) that learn to control differentiable synthesis layers…
▽ More
FM Synthesis is a well-known algorithm used to generate complex timbre from a compact set of design primitives. Typically featuring a MIDI interface, it is usually impractical to control it from an audio source. On the other hand, Differentiable Digital Signal Processing (DDSP) has enabled nuanced audio rendering by Deep Neural Networks (DNNs) that learn to control differentiable synthesis layers from arbitrary sound inputs. The training process involves a corpus of audio for supervision, and spectral reconstruction loss functions. Such functions, while being great to match spectral amplitudes, present a lack of pitch direction which can hinder the joint optimization of the parameters of FM synthesizers. In this paper, we take steps towards enabling continuous control of a well-established FM synthesis architecture from an audio input. Firstly, we discuss a set of design constraints that ease spectral optimization of a differentiable FM synthesizer via a standard reconstruction loss. Next, we present Differentiable DX7 (DDX7), a lightweight architecture for neural FM resynthesis of musical instrument sounds in terms of a compact set of parameters. We train the model on instrument samples extracted from the URMP dataset, and quantitatively demonstrate its comparable audio quality against selected benchmarks.
△ Less
Submitted 12 August, 2022;
originally announced August 2022.
-
Tenodesis Grasp Emulator: Kinematic Assessment of Wrist-Driven Orthotic Control
Authors:
Erin Y. Chang,
Raghid Mardini,
Andrew I. W. McPherson,
Yuri Gloumakov,
Hannah S. Stuart
Abstract:
Wrist-driven orthotics have been designed to assist people with C6-7 spinal cord injury, however, the kinematic constraint imposed by such a control strategy can impede mobility and lead to abnormal body motion. This study characterizes body compensation using the novel Tenodesis Grasp Emulator, an adaptor orthotic that allows for the investigation of tenodesis grasping in subjects with unimpaired…
▽ More
Wrist-driven orthotics have been designed to assist people with C6-7 spinal cord injury, however, the kinematic constraint imposed by such a control strategy can impede mobility and lead to abnormal body motion. This study characterizes body compensation using the novel Tenodesis Grasp Emulator, an adaptor orthotic that allows for the investigation of tenodesis grasping in subjects with unimpaired hand function. Subjects perform a series of grasp-and-release tasks in order to compare normal (test control) and constrained wrist-driven modes, showing significant compensation as a result of the constraint. A motor-augmented mode is also compared against traditional wrist-driven operation, to explore the potential role of hybrid human-robot control. We find that both the passive wrist-driven and motor-augmented modes fulfill different roles throughout various tasks tested. Thus, we conclude that a flexible control scheme that can alter intervention based on the task at hand holds the potential to reduce compensation in future work.
△ Less
Submitted 9 November, 2023; v1 submitted 22 November, 2021;
originally announced November 2021.
-
An embedded multichannel sound acquisition system for drone audition
Authors:
Michael Clayton,
Lin Wang,
Andrew McPherson,
Andrea Cavallaro
Abstract:
Microphone array techniques can improve the acoustic sensing performance on drones, compared to the use of a single microphone. However, multichannel sound acquisition systems are not available in current commercial drone platforms. To encourage the research in drone audition, we present an embedded sound acquisition and recording system with eight microphones and a multichannel sound recorder mou…
▽ More
Microphone array techniques can improve the acoustic sensing performance on drones, compared to the use of a single microphone. However, multichannel sound acquisition systems are not available in current commercial drone platforms. To encourage the research in drone audition, we present an embedded sound acquisition and recording system with eight microphones and a multichannel sound recorder mounted on a quadcopter. In addition to recording and storing locally the sound from multiple microphones simultaneously, the embedded system can connect wirelessly to a remote terminal to transfer audio files for further processing. This will be the first stage towards creating a fully embedded solution for drone audition. We present experimental results obtained by state-of-the-art drone audition algorithms applied to the sound recorded by the embedded system.
△ Less
Submitted 17 January, 2021;
originally announced January 2021.
-
Build and Execution Environment (BEE): an Encapsulated Environment Enabling HPC Applications Running Everywhere
Authors:
Jieyang Chen,
Qiang Guan,
Xin Liang,
Louis James Vernon,
Allen McPherson,
Li-Ta Lo,
Zizhong Chen,
James Paul Ahrens
Abstract:
Variations in High Performance Computing (HPC) system software configurations mean that applications are typically configured and built for specific HPC environments. Building applications can require a significant investment of time and effort for application users and requires application users to have additional technical knowledge. Linux container technologies such as Docker and Charliecloud b…
▽ More
Variations in High Performance Computing (HPC) system software configurations mean that applications are typically configured and built for specific HPC environments. Building applications can require a significant investment of time and effort for application users and requires application users to have additional technical knowledge. Linux container technologies such as Docker and Charliecloud bring great benefits to the application development, build and deployment processes. While cloud platforms already widely support containers, HPC systems still have non-uniform support of container technologies. In this work, we propose a unified runtime framework -- Build and Execution Environment (BEE) across both HPC and cloud platforms that allows users to run their containerized HPC applications across all supported platforms without modification. We design four BEE backends for four different classes of HPC or cloud platform so that together they cover the majority of mainstream computing platforms for HPC users. Evaluations show that BEE provides an easy-to-use unified user interface, execution environment, and comparable performance.
△ Less
Submitted 27 February, 2021; v1 submitted 19 December, 2017;
originally announced December 2017.
-
Joint Inference of Genome Structure and Content in Heterogeneous Tumour Samples
Authors:
Andrew McPherson,
Andrew Roth,
Gavin Ha,
Sohrab P. Shah,
Cedric Chauve,
S. Cenk Sahinalp
Abstract:
For a genomically unstable cancer, a single tumour biopsy will often contain a mixture of competing tumour clones. These tumour clones frequently differ with respect to their genomic content (copy number of each gene) and structure (order of genes on each chromosome). Modern bulk genome sequencing mixes the signals of tumour clones and contaminating normal cells, complicating inference of genomic…
▽ More
For a genomically unstable cancer, a single tumour biopsy will often contain a mixture of competing tumour clones. These tumour clones frequently differ with respect to their genomic content (copy number of each gene) and structure (order of genes on each chromosome). Modern bulk genome sequencing mixes the signals of tumour clones and contaminating normal cells, complicating inference of genomic content and structure. We propose a method to unmix tumour and contaminating normal signals and jointly predict genomic structure and content of each tumour clone. We use genome graphs to represent tumour clones, and model the likelihood of the observed reads given clones and mixing proportions. Our use of haplotype blocks allows us to accurately measure allele specific read counts, and infer allele specific copy number for each clone. The proposed method is a heuristic local search based on applying incremental, locally optimal modifications of the genome graphs. Using simulated data, we show that our method predicts copy counts and gene adjacencies with reasonable accuracy.
△ Less
Submitted 24 April, 2015; v1 submitted 13 April, 2015;
originally announced April 2015.