
I would like to produce a numeric list of amplitudes from an audio file. I should be able to:

  • Specify the sampling rate (16kHz, 44.1kHz, etc)
  • Specify the data type of the amplitude samples (8 bit integers, 32 bit floats, etc)
  • Easily parse the list so that I can import it into other tools, like Python's numpy (newline delimited, csv, etc)
  • Conversely, I would also like a method to re-encode such a list into an arbitrary audio format.

I believe I have used ffmpeg to do this before, but haven't been able to find a solution. (Or maybe it was Audacity?)

I think I'm hot on the trail when I look at the set of codecs that my recent-ish ffmpeg supports (edited excerpt from ffmpeg -codecs):

 DEA..S pcm_f64be            PCM 64-bit floating point big-endian
 DEA..S pcm_s24be            PCM signed 24-bit big-endian
 DEA..S pcm_s64be            PCM signed 64-bit big-endian
 DEA..S pcm_s8               PCM signed 8-bit
 DEA..S pcm_u32be            PCM unsigned 32-bit big-endian
 DEA..S pcm_u8               PCM unsigned 8-bit

The above "PCM" method seems to describe exactly what I'm trying to do, but I just need to know how to extract the samples in a parseable format.

All the commands that I've tried create files in some binary encoding that seem to require some kind of decoder to understand. Here's an example:

ffmpeg -i audio.wav -f u8 -c:a pcm_u8 -ar 16000 out.raw

ffmpeg completes this command without issue, but the output is indecipherable.

1 Answer


All formats require some kind of parser/decoder; however, the parser needed for the PCM format in your example is actually even simpler than the one needed for CSV. -f u8 is a very straightforward format: "PCM" involves no compression, and in your case it is literally 1 byte per sample.

This means that various built-in Python and/or Numpy functions can be used to read it. With u8 (1 byte per sample) you don't need anything extra: reading the file already gives you a bytes object whose elements are the unsigned integer sample values:

Note: All examples are for Python 3.

with open("out_pcm_u8.raw", "rb") as fh:
    samples = list(fh.read())

With formats like u32be, you can use the struct or array modules, as well as numpy.frombuffer(). All the necessary information is already in the format's name; use help(struct) to find the matching type code (> for big-endian, I for u32, i for s32). For example:

import struct

with open("out_pcm_u32be.raw", "rb") as fh:
    buf = fh.read()
    samples = [t[0] for t in struct.iter_unpack(">I", buf)]

The same can be done with NumPy:

import numpy

dt = numpy.dtype(">u4")
with open("out_pcm_u32be.raw", "rb") as fh:
    buf = fh.read()
    samples = numpy.frombuffer(buf, dtype=dt)
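
The same one-liner handles float PCM just as easily. A minimal sketch, assuming 32-bit little-endian float samples (what ffmpeg's pcm_f32le codec with -f f32le would produce), with a synthetic buffer standing in for the file read:

```python
import numpy

# Sketch for 32-bit little-endian float PCM ("<f4"), e.g. from a
# hypothetical `ffmpeg -i audio.wav -f f32le -c:a pcm_f32le out.raw`.
# A synthetic buffer stands in for fh.read() here:
raw = numpy.array([0.0, 0.5, -0.5, 1.0], dtype="<f4").tobytes()
samples = numpy.frombuffer(raw, dtype=numpy.dtype("<f4"))
```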

For completeness, the expanded version of the earlier struct example:

import struct

samples = []
with open("out_pcm_u32be.raw", "rb") as fh:
    while True:
        buf = fh.read(32 // 8)  # one 4-byte sample at a time
        if not buf:
            break
        (samp,) = struct.unpack(">I", buf)
        samples.append(samp)
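
As for the reverse direction asked about in the question, the raw format works the same way: dump the sample bytes to a file and let ffmpeg wrap them in whatever container you need. A minimal sketch with u8 mono samples and hypothetical filenames:

```python
import numpy

# Write samples back out as raw u8 PCM (hypothetical filename), then
# re-encode with e.g.:
#   ffmpeg -f u8 -ar 16000 -ac 1 -i back.raw back.wav
samples = numpy.array([0, 64, 128, 192, 255], dtype=numpy.uint8)
with open("back.raw", "wb") as fh:
    fh.write(samples.tobytes())
```

Note that for raw input you must tell ffmpeg the sample format, rate, and channel count on the command line, since the file itself carries no header.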
  • @t-mart: It just occurred to me that Numpy has its own binary data loader, numpy.frombuffer(), which could be used for the same purpose. – Commented Sep 12, 2019 at 9:02
  • As an improvement, from things I just read in the struct package: 1. The returned values from struct functions are "a tuple even if it contains exactly one item", so list.extend() might be better than append to avoid nested data. 2. "Creating a Struct object once and calling its methods is more efficient than calling the struct functions with the same format since the format string only needs to be compiled once." Since we're repeating this call for a (large?) file, that seems like a good optimization. 3. Also, Struct objects expose a size property, which is simpler than 32 // 8. – t-mart, Sep 12, 2019 at 9:10
  • Yes, the struct-using code could be greatly improved for correctness and efficiency; it was mostly meant to demonstrate that it's still a very simple parser (literally a list of samples). The best option seems to be either array.frombuffer() (which I forgot) or numpy.frombuffer() (which I didn't know previously), either of which should completely avoid creating objects for every sample. – Commented Sep 12, 2019 at 9:15
  • array.frombuffer() unfortunately only uses the native endianness of the current machine. I suppose you could fix this at ffmpeg-time to match your own. stackoverflow.com/a/23320951/235992 Numpy types (docs.scipy.org/doc/numpy/user/basics.types.html) also seem to lack specification of endianness; however, you could do something like docs.scipy.org/doc/numpy/user/… – t-mart, Sep 12, 2019 at 9:29
  • Yes, but it still allows you to do if sys.byteorder != "big": arr.byteswap(), which is a bit more annoying than having the module take care of it, but ultimately performs the same thing. (Hopefully.) – Commented Sep 12, 2019 at 9:40
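
To make the byteswap approach from the comments above concrete, here is a sketch using the array module with synthetic data; it assumes the common CPython case where type code "I" is 4 bytes:

```python
import array
import struct
import sys

# Synthetic big-endian u32 data standing in for a file read:
buf = struct.pack(">4I", 1, 2, 3, 4)

arr = array.array("I")      # native-endian unsigned int
assert arr.itemsize == 4    # "I" is 4 bytes on typical CPython builds
arr.frombytes(buf)
if sys.byteorder != "big":
    arr.byteswap()          # input was big-endian; convert to native order
samples = list(arr)
```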
