The LAME Technical FAQ has some relevant information. Here are some excerpts:
Why does LAME add silence to the beginning of each song?
This is because of several factors:
Decoder delay at start of file:
All decoders I have tested introduce a delay of 528 samples. That is,
after decoding an mp3 file, the output will have 528 samples of 0's
appended to the front. This is because the standard MDCT/filterbank
routines used by the ISO have a 528 sample delay.
Furthermore, because of the overlapped nature of MDCT frames, the
first half of the first granule (1 granule=576 samples) doesn't have a
previous frame to overlap with, resulting in attenuation of the first
N samples.
Encoder delay at start of file:
ISO based encoders (BladeEnc, 8hz-mp3, etc) use a MDCT/filterbank
routine similar to the one used in decoding, and thus also introduce
their own 528 sample delay. A .wav file encoded & decoded will have a
1056 sample delay (1056 samples will be appended to the beginning).
Starting with LAME 3.55, we have a new MDCT/filterbank routine written
by Takehiro Tominaga with a 48 sample delay.
Refer to the LAME Technical FAQ for additional related answers and more in-depth information.
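To put the quoted figures in perspective, here is a minimal sketch that adds up the delays and converts them to milliseconds. The sample counts come from the excerpt above; the 44.1 kHz sample rate and the names in the code are my own assumptions for illustration.

```python
# Sample counts taken from the FAQ excerpt above.
DECODER_DELAY = 528       # delay introduced by decoders
ISO_ENCODER_DELAY = 528   # delay introduced by ISO-based encoders
LAME_MDCT_DELAY = 48      # LAME 3.55+ MDCT/filterbank routine


def delay_ms(samples, sample_rate=44100):
    """Convert a delay in samples to milliseconds.

    sample_rate defaults to 44100 Hz, which is an assumption; use the
    actual rate of your files.
    """
    return samples * 1000.0 / sample_rate


# Encode + decode round trip with an ISO-based encoder.
iso_round_trip = ISO_ENCODER_DELAY + DECODER_DELAY

if __name__ == "__main__":
    print(f"ISO round trip: {iso_round_trip} samples, "
          f"{delay_ms(iso_round_trip):.2f} ms")
    print(f"Decoder only:   {DECODER_DELAY} samples, "
          f"{delay_ms(DECODER_DELAY):.2f} ms")
    print(f"LAME MDCT:      {LAME_MDCT_DELAY} samples, "
          f"{delay_ms(LAME_MDCT_DELAY):.2f} ms")
```

A gap of a couple of dozen milliseconds per file is short, but it is audible when tracks are meant to play back to back.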
Some options that aren't great but might do the job:
Do you have to use MP3? Can you re-output your files to a different format such as PCM ("wav")? Note that simply re-encoding the existing MP3 files to another format will preserve the delay.
You can use the afade audio filter to add a fade out/fade in per section, or the atrim audio filter to possibly make the gaps less abrupt. However, filtering requires re-encoding.
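As a rough sketch of the filter option, the helper below builds an ffmpeg command line that optionally trims leading samples with atrim and applies a short fade-in with afade. The function name, the file names, the 1056-sample trim, and the 0.05 s fade are all placeholder assumptions; the right values depend on how your files were encoded, and your decoder may already skip some or all of the delay.

```python
import shlex


def build_ffmpeg_cmd(src, dst, trim_samples=0, fade_in=0.0):
    """Return an ffmpeg argument list (hypothetical helper, not part of ffmpeg).

    trim_samples: number of leading samples to drop via atrim.
    fade_in:      fade-in duration in seconds applied via afade.
    """
    filters = []
    if trim_samples > 0:
        # Drop the leading samples, then reset timestamps to start at zero.
        filters.append(f"atrim=start_sample={trim_samples}")
        filters.append("asetpts=PTS-STARTPTS")
    if fade_in > 0:
        # Fade in from the start of the (possibly trimmed) audio.
        filters.append(f"afade=t=in:st=0:d={fade_in}")

    cmd = ["ffmpeg", "-i", src]
    if filters:
        cmd += ["-af", ",".join(filters)]
    cmd.append(dst)
    return cmd


if __name__ == "__main__":
    # Placeholder file names; this only prints the command, it does not run it.
    cmd = build_ffmpeg_cmd("input.mp3", "output.wav",
                           trim_samples=1056, fade_in=0.05)
    print(shlex.join(cmd))
```

Because the audio passes through a filter chain, the output is re-encoded, so writing to a lossless format avoids a second lossy generation.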