I would like to produce a numeric list of amplitudes from an audio file. I should be able to:
- Specify the sampling rate (16kHz, 44.1kHz, etc)
- Specify the data type of the amplitude samples (8-bit integers, 32-bit floats, etc.)
- Easily parse the list so that I can import it into other tools, like Python's numpy (newline delimited, csv, etc)
- Conversely, I would also like a method to re-encode such a list into an arbitrary audio format.
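For the reverse direction, here is a minimal sketch of what I have in mind, using only Python's standard library. It assumes the list is a newline-delimited text file of signed 16-bit integers for a single (mono) channel. The function name text_to_wav and the file names are hypothetical, chosen only for illustration.

```python
import struct
import wave


def text_to_wav(txt_path, wav_path, sample_rate=16000):
    """Pack newline-delimited signed 16-bit samples into a mono WAV file.

    Assumption: every non-empty line holds one integer in [-32768, 32767].
    Returns the number of samples written.
    """
    with open(txt_path) as f:
        samples = [int(line) for line in f if line.strip()]

    # "<h" is a little-endian signed 16-bit integer, which is what WAV expects.
    frames = struct.pack("<%dh" % len(samples), *samples)

    with wave.open(wav_path, "wb") as w:
        w.setnchannels(1)            # mono (assumption)
        w.setsampwidth(2)            # 2 bytes = 16 bits per sample
        w.setframerate(sample_rate)  # e.g. 16000 for 16kHz
        w.writeframes(frames)
    return len(samples)
```

A WAV file produced this way could then presumably be handed to ffmpeg for conversion into another format, for example ffmpeg -i samples.wav samples.flac.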
I believe I have used ffmpeg to do this before, but I haven't been able to find the solution again. (Or maybe it was Audacity?)
I think I'm hot on the trail when I look at the set of codecs that my recent-ish ffmpeg supports (edited excerpt from ffmpeg -codecs):
DEA..S pcm_f64be PCM 64-bit floating point big-endian
DEA..S pcm_s24be PCM signed 24-bit big-endian
DEA..S pcm_s64be PCM signed 64-bit big-endian
DEA..S pcm_s8 PCM signed 8-bit
DEA..S pcm_u32be PCM unsigned 32-bit big-endian
DEA..S pcm_u8 PCM unsigned 8-bit
The "PCM" codecs above seem to describe exactly what I'm trying to do; I just need to know how to extract the samples in a parseable format.
All the commands that I've tried create files in some binary encoding that seems to require some kind of decoder to understand. Here's an example:
ffmpeg -i audio.wav -f u8 -c:a pcm_u8 -ar 16000 out.raw
ffmpeg completes this command without issue, but the output is indecipherable.
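My working assumption is that out.raw from the command above is headerless PCM, so each byte would be one unsigned 8-bit sample (0 to 255, with silence near 128). If that is right, a sketch like the following, using only Python's standard library, should turn it into a newline-delimited list. The function name raw_u8_to_text is hypothetical, and the sketch assumes a single channel. With more than one channel the samples would be interleaved, and adding -ac 1 to the ffmpeg command should downmix to mono.

```python
import array


def raw_u8_to_text(raw_path, txt_path):
    """Convert headerless unsigned 8-bit PCM into one sample per line.

    Assumption: the file has no header and holds a single channel.
    Returns the samples as a list of ints in the range 0..255.
    """
    samples = array.array("B")  # "B" = unsigned char, one byte per sample
    with open(raw_path, "rb") as f:
        samples.frombytes(f.read())

    with open(txt_path, "w") as f:
        for s in samples:
            f.write("%d\n" % s)
    return list(samples)
```

For other sample types the array type code would have to change (for example "h" for signed 16-bit), and byte order would start to matter. If numpy is available, numpy.fromfile("out.raw", dtype=numpy.uint8) should read the same file directly without the intermediate text step.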