Opus is generally considered the best low-bitrate codec available, and it has no problem with an 8 kHz input sample rate. The resulting Opus stream can still be decoded to whatever sample rate is convenient for the decoder. (Like other lossy codecs, it compresses by quantizing frequency bands after a time-to-frequency transform; its CELT layer uses an MDCT, while its SILK speech layer is based on linear prediction instead. Some other codecs apparently only want to decode to the same sample rate as the input.) As other answers point out, you can get FFmpeg to resample the input before giving it to the codec, but you don't need that for Opus.
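As a rough illustration of why 8 kHz input is fine: the libopus API (opus_encoder_create / opus_decoder_create) accepts sample rates of 8, 12, 16, 24 and 48 kHz, and a decoder can be created at any of those regardless of the rate the encoder used. The helper below is a hypothetical sketch (needs_resample_for_opus is not part of FFmpeg or libopus) that just checks whether a source rate is one of those.

```python
# Sample rates accepted by the libopus encoder/decoder API.
OPUS_RATES_HZ = (8000, 12000, 16000, 24000, 48000)


def needs_resample_for_opus(input_rate_hz: int) -> bool:
    """Hypothetical helper: True if libopus can't take this rate directly."""
    return input_rate_hz not in OPUS_RATES_HZ


if __name__ == "__main__":
    for rate in (8000, 11025, 44100, 48000):
        verdict = "needs resampling" if needs_resample_for_opus(rate) else "OK as-is"
        print(f"{rate} Hz: {verdict}")
```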
Try this for 24 kbit/s Opus (input.wav and output.opus are placeholder file names):
ffmpeg -i input.wav -c:a libopus -b:a 24k -frame_duration 120 output.opus
Perhaps worth trying: -application voip to tune for "improved speech intelligibility" instead of the default "audio" profile.
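If you're scripting this over many files, a small sketch that builds the same ffmpeg argument list (including -application voip) might look like the following. The function name opus_speech_cmd and its defaults are hypothetical; the flags are the libopus encoder options discussed here, and actually running the command assumes an FFmpeg built with libopus.

```python
import shlex


def opus_speech_cmd(src: str, dst: str, bitrate: str = "24k",
                    application: str = "voip", frame_ms: int = 120) -> list[str]:
    """Hypothetical helper: ffmpeg arguments for low-bitrate libopus speech."""
    return [
        "ffmpeg", "-i", src,
        "-c:a", "libopus",
        "-b:a", bitrate,                   # target bitrate, e.g. 24 kbit/s
        "-application", application,       # "voip" tunes for speech intelligibility
        "-frame_duration", str(frame_ms),  # ms of audio per packet; 120 is the max
        dst,
    ]


if __name__ == "__main__":
    cmd = opus_speech_cmd("input.wav", "output.opus")  # placeholder file names
    print(shlex.join(cmd))
    # To run it: import subprocess; subprocess.run(cmd, check=True)
```

Keeping the command as a list rather than one string avoids shell-quoting problems with odd file names when you pass it to subprocess.run.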
Setting -frame_duration to its highest value (120 ms) reduces overhead, I think. You don't care about encoder/decoder latency because you just have files, not real-time two-way voice chat. So you can let it buffer 120 ms of audio and pack multiple CELT or SILK frames into each packet, reducing the redundancy of frame headers.
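The overhead argument is just arithmetic on packets per second. In the sketch below, header_bytes is an assumed fixed per-packet cost chosen for illustration, not a measured figure for Opus or Ogg; the point is only that the overhead scales with 1000 / frame_ms.

```python
def packets_per_second(frame_ms: float) -> float:
    """How many packets are needed per second of audio at this frame duration."""
    return 1000.0 / frame_ms


def overhead_bps(frame_ms: float, header_bytes: int) -> float:
    """Bits/s spent on per-packet headers, for an ASSUMED header size in bytes."""
    return header_bytes * 8 * 1000.0 / frame_ms


if __name__ == "__main__":
    for ms in (20, 60, 120):  # 20 ms is the libopus default frame duration
        print(f"{ms:>3} ms: {packets_per_second(ms):.2f} packets/s, "
              f"{overhead_bps(ms, header_bytes=3):.0f} bit/s if each header were 3 bytes")
```

At 24 kbit/s, a few hundred bits per second saved on headers is a small but non-trivial fraction of the budget.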
The best available Opus encoder is the free and open-source libopus (https://opus-codec.org), so FFmpeg can just use it, unlike with AAC, where the best encoders are closed-source.
Opus has special modes for very low-bitrate speech (like 16 kb/s): it detects speech and can even switch over to a speech-specific encoder (SILK) at low bitrates.
Opus's low-bitrate coding tools are similar to what HE-AACv2 can do; see the Wikipedia article.
From the question: "But when I tried it, compared to the original, the file size increased ..."
Part of the point of lossy compression is that you can choose the output bitrate, trading it off against quality. Most codecs accept -b:a 32k, for example, to choose an audio bitrate of 32 kbit/s.
(For video, you can also trade off CPU time spent encoding, e.g. -preset veryslow vs. -preset medium. But compressing audio is cheap enough that most codecs don't have a lot of options for spending more CPU time to improve the bitrate vs. quality tradeoff.)
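Because the bitrate you pick directly sets the file size, it's worth doing the back-of-the-envelope calculation before encoding. This sketch ignores container overhead, so real files come out slightly larger; encoded_size_bytes is a hypothetical name.

```python
def encoded_size_bytes(bitrate_bps: int, duration_s: float) -> float:
    """Approximate payload size: bits per second times seconds, over 8 bits/byte."""
    return bitrate_bps * duration_s / 8


if __name__ == "__main__":
    one_minute = 60
    for kbps in (16, 24, 32, 64):
        size_kb = encoded_size_bytes(kbps * 1000, one_minute) / 1000
        print(f"{kbps:>2} kbit/s for 1 min -> about {size_kb:.0f} kB")
```

If the encoder's default bitrate is above your source's bitrate, the output will be bigger than the input no matter how good the codec is.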
Mono 8-bit 8 kHz PCM has a bitrate of 8 bits × 8000 samples/s = 64 kbit/s, so you're aiming for lower than that; otherwise you might as well keep your original files. PCM is just raw samples, so its bitrate is simply the product of sample rate and sample width (times the channel count), like the audio equivalent of a .bmp bitmap image. That's highly inefficient, and it's the reason better codecs were invented. (And as you know from listening, saving bitrate with PCM comes at a massive cost to quality and frequency range, because bitrate is tied 1:1 to sample rate. That's not the case when you quantize in the frequency domain with a lossy codec.)
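The PCM arithmetic as a sketch, comparing the 64 kbit/s source against a few candidate lossy bitrates (the candidates are just examples, and pcm_bitrate_bps is a hypothetical name):

```python
def pcm_bitrate_bps(sample_rate_hz: int, bits_per_sample: int, channels: int = 1) -> int:
    """Raw PCM bitrate: every sample of every channel is stored uncompressed."""
    return sample_rate_hz * bits_per_sample * channels


if __name__ == "__main__":
    source = pcm_bitrate_bps(8000, 8)         # mono 8-bit 8 kHz
    print(f"source: {source} bit/s")
    for target in (32_000, 24_000, 16_000):   # example lossy bitrates
        print(f"{target // 1000} kbit/s is {source / target:.1f}x smaller than the source")
```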
Also from the question: "and some high frequencies were attenuated. So, worse than -c:a copy"
FFmpeg's native AAC encoder (-c:a aac) used to be pretty bad, and you were using an old FFmpeg. https://trac.ffmpeg.org/wiki/Encode/HighQualityAudio says that as of 2017, aac is sometimes better than libfdk_aac for AAC-LC (the Low Complexity profile, used at higher bitrates). It doesn't mention HE-AAC, though, and that's what you want for low-bitrate AAC.
libfdk_aac used to be the best open-source AAC encoder available, and maybe still is for HE-AAC. AFAIK, neither of them is as good as the best non-free AAC encoders, though.
For low-bitrate AAC, you really want HE-AAC, which adds more coding tools (https://en.wikipedia.org/wiki/High-Efficiency_Advanced_Audio_Coding). I'm not sure if -c:a aac can do that.
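If you want to try HE-AAC anyway, libfdk_aac selects it with -profile:a aac_he. This is a sketch assuming an FFmpeg built with --enable-libfdk-aac, which is non-free, so many distro builds lack it. aac_he_v2 adds parametric stereo, which doesn't help a mono source. I haven't verified whether fdk-aac accepts 8 kHz input for HE-AAC; the optional resample_hz parameter (hypothetical, like the function name) is there in case it rejects it.

```python
import shlex
from typing import Optional


def he_aac_cmd(src: str, dst: str, bitrate: str = "24k",
               resample_hz: Optional[int] = None) -> list[str]:
    """Hypothetical helper: ffmpeg arguments for HE-AAC (v1) via libfdk_aac."""
    cmd = ["ffmpeg", "-i", src]
    if resample_hz is not None:
        # Assumption: only needed if the encoder rejects the source sample rate.
        cmd += ["-ar", str(resample_hz)]
    cmd += [
        "-c:a", "libfdk_aac",
        "-profile:a", "aac_he",  # HE-AAC v1: AAC-LC core plus spectral band replication
        "-b:a", bitrate,
        dst,
    ]
    return cmd


if __name__ == "__main__":
    print(shlex.join(he_aac_cmd("input.wav", "output.m4a")))
    print(shlex.join(he_aac_cmd("input.wav", "output.m4a", resample_hz=16000)))
```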
https://trac.ffmpeg.org/wiki/Encode/HighQualityAudio lists some recommended settings and ranges of useful bitrates for various encoders.
But you probably want Opus, or possibly AMR-NB (narrowband) for very low bitrates like 4-5 kbit/s (its lowest mode is 4.75 kbit/s). I don't know how old the quality vs. bitrate plot in the Wikipedia article on Opus is, but it shows AMR-NB at higher quality than Opus below 8 kb/s.
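AMR-NB only has eight fixed modes (4.75 to 12.2 kbit/s) and only handles 8 kHz mono, so a requested bitrate has to map onto one of those modes. The sketch below snaps a target to the nearest mode and builds an ffmpeg command for the libopencore_amrnb encoder. It assumes an FFmpeg built with libopencore-amrnb support, and the helper names are hypothetical.

```python
import shlex

# The eight AMR-NB codec modes, in bit/s.
AMR_NB_MODES_BPS = (4750, 5150, 5900, 6700, 7400, 7950, 10200, 12200)


def nearest_amr_nb_mode(target_bps: int) -> int:
    """Pick the AMR-NB mode closest to the requested bitrate."""
    return min(AMR_NB_MODES_BPS, key=lambda mode: abs(mode - target_bps))


def amr_nb_cmd(src: str, dst: str, target_bps: int) -> list[str]:
    """Hypothetical helper: ffmpeg arguments for AMR-NB via libopencore_amrnb."""
    return [
        "ffmpeg", "-i", src,
        "-ar", "8000", "-ac", "1",  # AMR-NB is narrowband: 8 kHz, mono only
        "-c:a", "libopencore_amrnb",
        "-b:a", str(nearest_amr_nb_mode(target_bps)),
        dst,
    ]


if __name__ == "__main__":
    print(shlex.join(amr_nb_cmd("input.wav", "output.amr", 4000)))
```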
With that few bits, you might be able to understand speech but it won't sound nice. It's just a question of which codec is least horrible.