All formats require some kind of parser/decoder; however, the parser needed for the PCM format in your example is actually even simpler than the one needed for CSV. The -f u8 format is very straightforward: "PCM" involves no compression, and in your case it is literally 1 byte per sample.
This means that various built-in Python and/or NumPy functions can be used to read it. With u8 (1 byte per sample), you don't need anything extra, as Python already gives you a bytes object whose items are unsigned integers (0 to 255):
Note: All examples are for Python 3.
```python
with open("out_pcm_u8.raw", "rb") as fh:
    samples = list(fh.read())
```
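A quick way to check this without a real file is a sketch like the following, which uses io.BytesIO as a stand-in for the opened file; the three byte values are made up for illustration:

```python
import io

# Three made-up u8 samples: minimum, midpoint and maximum.
fake_file = io.BytesIO(bytes([0, 128, 255]))

data = fake_file.read()    # a bytes object, just like fh.read() on a real file
print(type(data))          # <class 'bytes'>
print(data[1])             # 128: indexing a bytes object already gives an int
samples = list(data)
print(samples)             # [0, 128, 255]
```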
With formats like u32be, you can use the 'struct' or 'array' modules, as well as numpy.frombuffer(). All the necessary information is already in the format's name; you just use help(struct) to find the matching type code ('>' for big-endian, 'I' for u32, 'i' for s32). For example:
```python
import struct

with open("out_pcm_u32be.raw", "rb") as fh:
    buf = fh.read()

# iter_unpack yields 1-tuples; len(buf) must be a multiple of 4 bytes
samples = [t[0] for t in struct.iter_unpack(">I", buf)]
```
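To make the "everything is in the name" point concrete, here is a minimal sketch of a helper that turns a format name into a struct format string. The helper struct_format() and its lookup table are hypothetical (not part of any library) and only cover the common 8/16/32-bit integer and 32/64-bit float formats; 24-bit formats such as s24be have no struct type code and would need manual decoding. The sample bytes show how the same data decodes differently as u32 and s32:

```python
import struct

# Hypothetical lookup table: sample type -> struct type code (standard sizes).
_TYPE_CODES = {
    "s8": "b", "u8": "B",
    "s16": "h", "u16": "H",
    "s32": "i", "u32": "I",
    "f32": "f", "f64": "d",
}

def struct_format(name):
    """Return e.g. '>I' for 'u32be', '<h' for 's16le', '=B' for 'u8'."""
    if name.endswith("be"):
        byte_order, base = ">", name[:-2]
    elif name.endswith("le"):
        byte_order, base = "<", name[:-2]
    else:
        # 8-bit formats have no byte order; '=' still forces standard sizes.
        byte_order, base = "=", name
    return byte_order + _TYPE_CODES[base]

buf = b"\x00\x00\x00\x01\xff\xff\xff\xff"   # two 32-bit big-endian samples
print([t[0] for t in struct.iter_unpack(struct_format("u32be"), buf)])  # [1, 4294967295]
print([t[0] for t in struct.iter_unpack(struct_format("s32be"), buf)])  # [1, -1]
```

To read a file, you would pass struct_format("u32be") in place of the literal ">I" in the example above.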
The same using NumPy:

```python
import numpy

dt = numpy.dtype(">u4")   # big-endian ('>') unsigned ('u') 4-byte integer
with open("out_pcm_u32be.raw", "rb") as fh:
    buf = fh.read()
samples = numpy.frombuffer(buf, dtype=dt)
```
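The 'array' module mentioned above works too, but its type codes have no byte-order prefix and use native C sizes, so a little more care is needed. This sketch (the function name read_u32be_with_array is made up for illustration) picks whichever of 'I' or 'L' is 4 bytes wide on the current platform and byte-swaps on little-endian machines:

```python
import array
import sys

def read_u32be_with_array(data):
    """Decode big-endian unsigned 32-bit samples using the array module."""
    # array has no byte-order prefix; pick a native type code that is 4 bytes wide.
    code = next(c for c in "IL" if array.array(c).itemsize == 4)
    samples = array.array(code)
    samples.frombytes(data)    # raises ValueError if len(data) is not a multiple of 4
    if sys.byteorder == "little":
        samples.byteswap()     # convert big-endian file data to native order
    return samples

buf = b"\x00\x00\x00\x01\xff\xff\xff\xff"
print(list(read_u32be_with_array(buf)))   # [1, 4294967295]
```

With a real file you would call it as read_u32be_with_array(fh.read()) inside the same with open(...) block.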
For completeness, here is the expanded version of the earlier struct example:
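```python
import struct

samples = []
with open("out_pcm_u32be.raw", "rb") as fh:
    while True:
        buf = fh.read(32 // 8)   # one u32 sample = 4 bytes
        if buf:
            # unpack returns a tuple; a truncated last sample raises struct.error
            (samp,) = struct.unpack(">I", buf)
            samples.append(samp)
        else:
            break   # end of file
```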
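If the files are large or might end in a partial sample, a generator variant of the same loop may be handy. This is only a sketch: iter_samples() is a hypothetical name, it precompiles the format once with struct.Struct, and it raises ValueError on a truncated final sample rather than letting struct.unpack raise struct.error:

```python
import io
import struct

def iter_samples(fh, fmt=">I"):
    """Yield one decoded sample at a time from a binary file object (hypothetical helper)."""
    record = struct.Struct(fmt)          # compile the format once
    while True:
        buf = fh.read(record.size)
        if not buf:
            return                       # clean end of file
        if len(buf) < record.size:
            raise ValueError(f"truncated sample: {len(buf)} of {record.size} bytes")
        yield record.unpack(buf)[0]

fake_file = io.BytesIO(b"\x00\x00\x00\x01\xff\xff\xff\xff")
print(list(iter_samples(fake_file)))     # [1, 4294967295]
```

Because it is a generator, samples are decoded lazily, so you can process a long recording without first building the whole list in memory.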