I am trying to extract audio snippets using command-line tools. I consistently get unexpected results, and I believe this is due to how the audio files were created/encoded.
Note: I realise there are other ways to share the content; I'm doing it this way for users who are either not very computer-literate or geo-blocked from the original content.
Problem Description / Reproduction steps:
I start off by using yt-dlp to download a podcast, such as this one, with this command:
yt-dlp -x --audio-format mp3 -o GQT_2012-10-14.mp3 https://www.bbc.co.uk/programmes/b01n6vnh
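Because `--audio-format mp3` makes yt-dlp convert whatever it downloads (unless it is already MP3), it may help to know what the BBC actually served before any conversion happened. A sketch assuming a recent yt-dlp: `-F` lists the available formats, `-k` keeps the downloaded file after post-processing, and the `%(ext)s` output template stops the original and the MP3 from sharing a name. The function name `fetch_gqt` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: show the available formats, then download and convert
# to MP3 while keeping the original audio file for comparison.
fetch_gqt() {
  local url=$1 name=$2
  # List every format yt-dlp can see for this URL.
  yt-dlp -F "$url"
  # -x extracts audio, -k keeps the downloaded file after conversion,
  # and %(ext)s gives the original and the MP3 different extensions.
  yt-dlp -x --audio-format mp3 -k -o "${name}.%(ext)s" "$url"
}
```

For example: `fetch_gqt https://www.bbc.co.uk/programmes/b01n6vnh GQT_2012-10-14`. If the kept original is not an MP3, it can be checked at 20:48 as a second reference.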
The file is downloaded and plays correctly. I would like to extract a snippet that starts at 20:48 and lasts 03:58, so it finishes at 24:46.
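A quick check of the snippet bounds with Bash arithmetic: 20:48 plus 03:58 is 24:46, and the observed start of 20:28 is exactly 20 seconds before the intended one. The helper names `to_seconds` and `to_hms` are illustrative and not part of any tool.

```bash
#!/usr/bin/env bash
# Convert [HH:]MM:SS to whole seconds.
to_seconds() {
  local -a parts
  local total=0 p
  IFS=: read -r -a parts <<< "$1"
  for p in "${parts[@]}"; do
    # 10# forces base 10 so fields like "08" are not read as octal.
    total=$(( total * 60 + 10#$p ))
  done
  echo "$total"
}

# Convert whole seconds back to HH:MM:SS.
to_hms() {
  local s=$1
  printf '%02d:%02d:%02d\n' $(( s / 3600 )) $(( s % 3600 / 60 )) $(( s % 60 ))
}

start=$(to_seconds 20:48)             # 1248 s
length=$(to_seconds 03:58)            # 238 s
end=$(to_hms $(( start + length )))   # intended end of the snippet
observed=$(to_hms $(( start - 20 )))  # where the snippets actually start
echo "start=${start}s length=${length}s end=${end} observed_start=${observed}"
```

Running it prints `start=1248s length=238s end=00:24:46 observed_start=00:20:28`.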
I tried this first using FFmpeg (version 4.2.7-0ubuntu0.1 on Ubuntu 20.04), with this command:
ffmpeg -i "/home/user/GQT_2012-10-14.mp3" -ss 00:20:48 -t 00:03:58 GQT_2012-10-12_Snippet1.mp3
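In the command above, `-ss` comes after `-i`, so FFmpeg decodes from the beginning of the file and discards audio until 20:48, counting decoded samples rather than estimating a byte offset. A sketch for comparing that with input seeking (`-ss` before `-i`), where the MP3 demuxer first jumps to an estimated position; if the two results start at different points, that would suggest the file's own seeking information is unreliable. The function name `compare_seek_modes` and the output suffixes are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: cut the same range twice, once with output seeking
# and once with input seeking, so the two results can be compared by ear.
compare_seek_modes() {
  local in=$1 start=$2 length=$3 stem=$4
  # Output seeking: -ss after -i, decode from the start and discard until $start.
  ffmpeg -hide_banner -nostdin -y -i "$in" -ss "$start" -t "$length" \
    -map 0:a -c:a libmp3lame -q:a 2 "${stem}_output_seek.mp3"
  # Input seeking: -ss before -i, the demuxer jumps close to $start first.
  ffmpeg -hide_banner -nostdin -y -ss "$start" -i "$in" -t "$length" \
    -map 0:a -c:a libmp3lame -q:a 2 "${stem}_input_seek.mp3"
}
```

For example: `compare_seek_modes /home/user/GQT_2012-10-14.mp3 00:20:48 00:03:58 GQT_Snippet1`. `-map 0:a` keeps only the audio stream, in case the MP3 carries embedded cover art.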
This generates a file that is 3 minutes 58 seconds long, but its start corresponds to 20:28 in the original file.
Then I tried using mp3splt (version 2.6.2 on the same OS; I am aware that this is an old version) with this command:
mp3splt "/home/user/GQT_2012-10-14.mp3" -o GQT_2012-10-12_Snippet1 20.48.00 24.46.00
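mp3splt has a frame mode, `-f`, in which it walks through every MP3 frame instead of guessing positions from the bitrate; its documentation recommends this for VBR files. Re-running the split with `-f` is a cheap way to see whether the result changes. A sketch; the function name `split_frame_mode` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: split in mp3splt frame mode (-f), which counts MP3
# frames instead of estimating byte positions from the bitrate.
split_frame_mode() {
  local in=$1 start=$2 end=$3 name=$4
  # Times use mp3splt's minutes.seconds.hundredths format, e.g. 20.48.00.
  mp3splt -f -o "$name" "$in" "$start" "$end"
}
```

For example: `split_frame_mode /home/user/GQT_2012-10-14.mp3 20.48.00 24.46.00 GQT_2012-10-14_Snippet1_frames`.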
This generates the same output: a file of the correct length, but starting 20 seconds earlier than expected.
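One way to test whether the MP3 itself, or the way positions in it are being read, is at fault (sketched here as a diagnostic, not a known fix): decode the whole file to uncompressed WAV, where time maps directly to a sample offset, then check by ear whether 20:48 in the WAV sounds the same as 20:48 in the MP3. The function name `decode_to_wav` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: decode an MP3 to 16-bit PCM WAV. In PCM audio every
# second occupies the same number of bytes, so positions can be computed exactly.
decode_to_wav() {
  local in=$1 out=$2
  ffmpeg -hide_banner -nostdin -y -i "$in" -map 0:a -c:a pcm_s16le "$out"
}
```

For example: `decode_to_wav /home/user/GQT_2012-10-14.mp3 GQT_2012-10-14.wav`. The WAV will be much larger than the MP3.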
Since both command-line tools give the same result, the issue presumably lies with the input file. I tried to inspect it using ffprobe. Within the output, I saw this:
Duration: 00:43:00.09, start: 0.025057, bitrate: 141 kb/s
I interpret this as meaning the file is "tagged" as starting 25 milliseconds in, certainly not 20 seconds.
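Assuming a 44.1 kHz sample rate, `start: 0.025057` is 1105 samples (0.025057 × 44100 ≈ 1105), which matches the usual LAME encoder-plus-decoder delay of 576 + 529 samples, so it is unlikely to explain a 20-second shift. A more useful check may be whether the reported duration is read from the file or guessed: libavformat logs "Estimating duration from bitrate, this may be inaccurate" when it has to guess, and a full decode gives the true length to compare against. A sketch; the function name `probe_timing` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: compare what the container reports with what a full
# decode actually produces.
probe_timing() {
  local in=$1
  # Container-level start time, duration and average bitrate.
  ffprobe -v error -show_entries format=start_time,duration,bit_rate \
    -of default=noprint_wrappers=1 "$in"
  # libavformat logs this warning when it has to guess the duration.
  if ffprobe -hide_banner "$in" 2>&1 | grep -q 'Estimating duration from bitrate'; then
    echo "warning: duration was estimated from the bitrate"
  fi
  # Decode everything and keep the last progress line; its time= field is
  # the decoded length.
  ffmpeg -hide_banner -nostdin -i "$in" -map 0:a -f null - 2>&1 \
    | tr '\r' '\n' | grep 'time=' | tail -n 1
}
```

If the decoded `time=` differs noticeably from the `Duration: 00:43:00.09` that ffprobe reports, positions derived from the header are probably not trustworthy either.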
I tried to reset this to zero anyway, using variations of this answer, but I wasn't successful.
I'm looking to understand the root cause of the offset in the extracted snippets, and to correct it.
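Resetting the start to zero would only shift timestamps by about 25 ms, so it is unlikely to fix a 20-second offset. If the suspicion is that a player estimates positions in a variable-bitrate file from its average bitrate (an assumption, not confirmed here), re-encoding to constant bitrate is one way to test it: in a CBR file byte offset and time are proportional, so such estimates should then agree with the decoded timeline. A sketch; the function name `reencode_cbr` and the 128 kb/s default are arbitrary choices.

```bash
#!/usr/bin/env bash
# Hypothetical helper: re-encode to constant bitrate so that a player which
# converts between byte offsets and times gets consistent answers.
reencode_cbr() {
  local in=$1 out=$2 rate=${3:-128k}
  # With libmp3lame, -b:a (without -q:a) selects CBR at the given bitrate.
  ffmpeg -hide_banner -nostdin -y -i "$in" -map 0:a -c:a libmp3lame -b:a "$rate" "$out"
}
```

For example: `reencode_cbr /home/user/GQT_2012-10-14.mp3 GQT_2012-10-14_cbr.mp3`, then check 20:48 in the new file in the same player.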