
I am trying to correctly grab video and audio data from a Hikvision IP camera.

Everything works like a charm when doing so for H.264 + MP2, for example.

When trying to grab raw audio as PCM s16le, the smile goes off my face.

Here is how I grab my camera (you can try it yourself; it is open to the world):

ffmpeg -re -acodec pcm_s16le -ac 1 -rtsp_transport tcp -i rtsp://superuser:[email protected]:10554 -vcodec copy -acodec libfdk_aac -vbr 5 test.ts

The command works and packs the RTSP stream into a TS file.

However, the durations of audio and video differ. For example, when I record for 21 seconds, I get 21 seconds of audio but only 15 seconds of video.

The audio is stretched and its pitch is lowered. I have spent several days reading the FFmpeg documentation and have applied various options such as async, changing the sample rate, and so on, with no luck.
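
Roughly, the variants looked like this (a sketch of what I tried, not exact transcripts; the aresample/async filter and the output sample rate are the two knobs mentioned above):

ffmpeg -re -acodec pcm_s16le -ac 1 -rtsp_transport tcp -i rtsp://superuser:[email protected]:10554
       -vcodec copy -af aresample=async=1 -acodec libfdk_aac -vbr 5 test.ts

ffmpeg -re -acodec pcm_s16le -ac 1 -rtsp_transport tcp -i rtsp://superuser:[email protected]:10554
       -vcodec copy -ar 22050 -acodec libfdk_aac -vbr 5 test.ts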

I hope Mulvya or other FFmpeg experts can advise a fix to get things done correctly.

C:\Users\User>d:/ffmpeg/bin/ffmpeg -y -re -acodec pcm_s16le -rtsp_transport tcp -i rtsp://superuser:[email protected]:10554 -vcodec copy -acodec aac -b:a 96k d:/ffmpeg/hik_aac.ts
ffmpeg version N-83410-gb1e2192 Copyright (c) 2000-2017 the FFmpeg developers
built with gcc 5.4.0 (GCC)
configuration: --enable-gpl --enable-version3 --enable-cuda --enable-cuvid --enable-d3d11va --enable-dxva2 --enable-libmfx --enable-nvenc --enable-avisynth --enable-bzlib --enable-fontconfig --enable-frei0r --enable-gnutls --enable-iconv --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libfreetype --enable-libgme --enable-libgsm --enable-libilbc --enable-libmodplug --enable-libmp3lame --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenh264 --enable-libopenjpeg --enable-libopus --enable-librtmp --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libtheora --enable-libtwolame --enable-libvidstab --enable-libvo-amrwbenc --enable-libvorbis --enable-libvpx --enable-libwavpack --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxavs --enable-libxvid --enable-libzimg --enable-lzma --enable-decklink --enable-zlib
libavutil      55. 46.100 / 55. 46.100
libavcodec     57. 75.100 / 57. 75.100
libavformat    57. 66.101 / 57. 66.101
libavdevice    57.  2.100 / 57.  2.100
libavfilter     6. 72.100 /  6. 72.100
libswscale      4.  3.101 /  4.  3.101
libswresample   2.  4.100 /  2.  4.100
libpostproc    54.  2.100 / 54.  2.100
Guessed Channel Layout for Input Stream #0.1 : mono
Input #0, rtsp, from 'rtsp://superuser:[email protected]:10554':
Metadata:
title           : Media Presentation
Duration: N/A, start: 0.000000, bitrate: N/A
Stream #0:0: Video: h264 (Main), yuv420p(progressive), 1920x1080, 16 fps, 25 tbr, 90k tbn, 32.01 tbc
Stream #0:1: Audio: pcm_s16le, 16000 Hz, mono, s16, 256 kb/s
Output #0, mpegts, to 'd:/ffmpeg/hik_aac.ts':
Metadata:
title           : Media Presentation
encoder         : Lavf57.66.101
Stream #0:0: Video: h264 (Main), yuv420p(progressive), 1920x1080, q=2-31, 16 fps, 25 tbr, 90k tbn, 90k tbc
Stream #0:1: Audio: aac (LC), 16000 Hz, mono, fltp, 96 kb/s
Metadata:
  encoder         : Lavc57.75.100 aac
Stream mapping:
Stream #0:0 -> #0:0 (copy)
Stream #0:1 -> #0:1 (pcm_s16le (native) -> aac (native))
Press [q] to stop, [?] for help
[mpegts @ 00000000032cf020] Non-monotonous DTS in output stream 0:0; previous: 33976, current: 7200; changing to 33977. This may result in incorrect timestamps in the output file.
[mpegts @ 00000000032cf020] Non-monotonous DTS in output stream 0:0; previous: 33977, current: 14400; changing to 33978. This may result in incorrect timestamps in the output file.
[mpegts @ 00000000032cf020] Non-monotonous DTS in output stream 0:0; previous: 33978, current: 18000; changing to 33979. This may result in incorrect timestamps in the output file.
[mpegts @ 00000000032cf020] Non-monotonous DTS in output stream 0:0; previous: 33979, current: 25200; changing to 33980. This may result in incorrect timestamps in the output file.
[mpegts @ 00000000032cf020] Non-monotonous DTS in output stream 0:0; previous: 33980, current: 28800; changing to 33981. This may result in incorrect timestamps in the output file.
frame=   85 fps= 11 q=-1.0 Lsize=    1357kB time=00:00:07.42 bitrate=1497.1kbits/s speed=0.997x
video:1196kB audio:51kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 8.805858%
[aac @ 00000000030a0a00] Qavg: 63342.980
Exiting normally, received signal 2.
  • Need to see full log.
    – Gyan
    Commented Aug 21, 2017 at 16:43
  • Thank you for your attention. You are free to try if needed (open to the world): ffmpeg -y -re -acodec pcm_s16le -rtsp_transport tcp -i rtsp://superuser:[email protected]:10554 -vcodec copy -acodec aac -b:a 96k
    – Max Ridman
    Commented Aug 21, 2017 at 18:43
  • As far as I can tell the audio stream from the camera itself is already like that (if you disable video with -vn, it's too slow). Could it be that it uses the wrong indicated sample rate? Can you change the parameters of that webcam's encoding?
    – slhck
    Commented Aug 21, 2017 at 20:10
  • I tried that some days ago, with the same result. But when playing the stream in the web interface of the IP camera, it works OK. FFmpeg has options to stretch/squeeze the audio stream based on video timestamps, but that does not seem to work here.
    – Max Ridman
    Commented Aug 22, 2017 at 5:15
  • And the bad thing: I can NOT specify the sample rate for the raw audio input. That would be useful because some devices report a wrong header (i.e. the header says the sample rate is 16 kHz but it is really 22.05 kHz) and you can't do anything about that (see the sketch after these comments).
    – Max Ridman
    Commented Aug 22, 2017 at 5:17
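
For comparison, when raw PCM comes from a plain file rather than RTSP, the s16le demuxer does let you force the rate at input, along these lines (a sketch; hik_audio.raw is a hypothetical dump). With an RTSP input the rate is taken from the stream's SDP, which is why that route is closed here:

ffmpeg -f s16le -ar 22050 -ac 1 -i hik_audio.raw -acodec aac -b:a 96k test_audio.ts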

1 Answer


As per the comments, since the actual sampling rate appears to be 22.05 kHz, we can conform the audio to that rate.

Use

ffmpeg -y -re -acodec pcm_s16le -rtsp_transport tcp -i rtsp://URL
       -vcodec copy -af asetrate=22050 -acodec aac -b:a 96k test.mp4

The asetrate filter does not resample the audio; it simply resets the declared sample rate, so the same samples are played back at the new rate, which corrects the speed and pitch.
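
If a different output rate is needed, asetrate can be chained with aresample, which does perform a real resample (a sketch; the 44100 value is illustrative, adjust as needed):

ffmpeg -y -re -acodec pcm_s16le -rtsp_transport tcp -i rtsp://URL
       -vcodec copy -af asetrate=22050,aresample=44100 -acodec aac -b:a 96k test.mp4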

  • Thank you, Mulvya, for this great suggestion! It has resolved the issue, but I have a small drift of ~300 ms between audio and video; can I sync them based on the video stream?
    – Max Ridman
    Commented Aug 22, 2017 at 7:49
  • First try: add -vsync 0 to the command. If that doesn't fix it, after you save the capture, run a 2nd command: ffmpeg -i test.mp4 -itsoffset -0.300 -i test.mp4 -c copy -map 0:v -map 1:a test2.mp4. This shifts the captured audio 300 ms earlier. Adjust value as needed.
    – Gyan
    Commented Aug 22, 2017 at 8:13
