I would like to clarify that I understand how to use -ss, -t, -to, stream copy, and the difference between stream copying and reencoding/transcoding.

What I don't understand is how seeking/cutting/splitting (these words can be used interchangeably, right?) works with regard to keyframes.

The FFmpeg doc says that, for the -ss option :

When used as an input option (before -i), seeks in this input file to position. Note that in most formats it is not possible to seek exactly, so ffmpeg will seek to the closest seek point before position. When transcoding and -accurate_seek is enabled (the default), this extra segment between the seek point and position will be decoded and discarded. When doing stream copy or when -noaccurate_seek is used, it will be preserved.

When used as an output option (before an output url), decodes but discards input until the timestamps reach position.

The FFmpeg Wiki says that :

Input seeking

Specify -ss before -i:

ffmpeg -ss 00:23:00 -i "Mononoke.Hime.mkv" -frames:v 1 "out1.jpg"

The demo produces 1 image frame at 23 min from the beginning of the movie. The input will be parsed by keyframe, which is very fast.

As of FFmpeg 2.1, when transcoding with ffmpeg (i.e. not stream copying): -ss is also "frame-accurate" even as input option. Previous behavior (seek only to nearest preceding keyframe, despite inaccuracy) can be restored with -noaccurate_seek.

Output seeking

Specify -ss after -i:

ffmpeg -i "Mononoke.Hime.mkv" -ss 23:00 -frames:v 1 "out2.jpg"

The demo also produces 1 image frame precisely at 23 min from the beginning of the movie.

Here, the input is decoded (and discarded) until it reaches the position indicated by -ss. This will be done relatively slow, frame-by-frame.

Seeking while codec copy

Using -ss with -c copy alike may not be accurate: since ffmpeg may only split on I-frame (keyframe independently decodable) alike. Though it may, if applicable: auto-adjust the stream's start time to negative to compensate.

E.g. (with typical video) requested timestamp 157 s; but no keyframe until 159 s: It shall include ~ 2 s audio (no video) at the start, and start from the 1st keyframe.

Since it is not clear which parts of the text are outdated, and the writing style is sometimes quite bad, I am not sure how reliable this wiki page is.

I did some tests, and saw that when I use stream copy :

  • the duration of the resulting video is the exact same as when I don't use stream copy
  • the end of the resulting video is the exact same as when I don't use stream copy
  • the start of the resulting video is, audio-wise, the exact same as when I don't use stream copy ; however, video-wise :
    • if I used -ss as an input option : the first seconds are pixelated
    • if I used -ss as an output option : the first seconds are a still image

The commands I used :

ffmpeg -ss POSITION -t DURATION -i INPUT -map 0 [+ -c copy] OUTPUT
ffmpeg -ss POSITION -i INPUT -t DURATION -map 0 [+ -c copy] OUTPUT
ffmpeg -i INPUT -ss POSITION -t DURATION -map 0 [+ -c copy] OUTPUT
ffmpeg -ss POSITION -to POSITION -i INPUT -map 0 [+ -c copy] OUTPUT
ffmpeg -ss POSITION -i INPUT -to POSITION -map 0 [+ -c copy] OUTPUT
ffmpeg -i INPUT -ss POSITION -to POSITION -map 0 [+ -c copy] OUTPUT

Here are my questions :

  1. The doc says: "in most formats it is not possible to seek exactly, so ffmpeg will seek to the closest seek point before position". Here, does "seek point" mean "keyframe" (or something keyframe-like), or something else?

  2. When -ss is used as an input option, the input is parsed keyframe by keyframe ; while when -ss is used as an output option, the input is decoded (whether we do stream copy or not) and parsed frame by frame. Is that right ?

  3. The doc says : "ffmpeg will seek to the closest seek point before position. When transcoding and -accurate_seek is enabled (the default), this extra segment between the seek point and position will be decoded and discarded. When doing stream copy or when -noaccurate_seek is used, it will be preserved.". However, the wiki says, about using -ss with stream copy : "E.g. (with typical video) requested timestamp 157 s; but no keyframe until 159 s: It shall include ~ 2 s audio (no video) at the start, and start from the 1st keyframe.". Doesn't this contradict the doc about the preservation of the extra segment between the seek point and position ? and also, about the claim that ffmpeg seeks to the closest seek point BEFORE position ?
    Same thing with my tests, I saw that doing stream copy doesn't change the starting point nor the duration of the resulting video, but it makes the beginning pixelated or still. Doesn't this also contradict the doc ?
    I would have thought that if ffmpeg was seeking to the closest seek point before position, and the extra segment between the seek point and position was preserved when doing stream copy, then the video I obtained with stream copy should start with a keyframe, and shouldn't be pixelated or still at its start (and shouldn't start at the exact same timestamp as when I don't use stream copy).
    From what I understand, the pixelated/still beginning when using stream copy comes from the fact that I chose a starting point (for -ss) that was not a keyframe, which means the first seconds can't be decoded properly ; but ffmpeg didn't seem to preserve anything compared to not using stream copy.

  4. The wiki says : "Using -ss with -c copy alike may not be accurate: since ffmpeg may only split on I-frame (keyframe independently decodable) alike". However, as in my question above, from my tests and from my understanding, ffmpeg can accurately split/seek/cut anywhere : it's just that, with stream copy, the start of the video will generally be pixelated or still. Is that right ?

  5. With stream copy, regarding the first seconds of the resulting video, why are they a still image when I use -ss as an output option, instead of being pixelated like when I use -ss as an input option ?

  6. The doc says: "ffmpeg will seek to the closest seek point before position. When transcoding and -accurate_seek is enabled (the default), this extra segment between the seek point and position will be decoded and discarded. When doing stream copy or when -noaccurate_seek is used, it will be preserved.". However, I have read elsewhere that "-noaccurate_seek option uses nearest keyframe and is about a brazillian times faster". How can that option be so much faster, when the doc suggests that all it does is preserve the extra segment between the seek point and position?

  7. I also did some tests with -noaccurate_seek. Compared to a basic command without stream copy, if I add -noaccurate_seek, the start of the resulting video is exactly the same; however, the end of the video is truncated by a few seconds. Why is that? If there were a difference, shouldn't it be at the start of the video? I never had such truncation at the end when I tested with stream copy alone (without -noaccurate_seek).

1 Answer

does "seek point" mean "keyframe"

Generally. Some formats (like MP4) contain a list of seek points for each stream. Others don't, but ffmpeg will try to parse packet headers and assemble an index of keyframes. Still others have no usable metadata at all, so ffmpeg will fail to seek.
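To see where the seek points actually are in a given file, one option is to list the video keyframe timestamps with ffprobe and pick the last one at or before the requested position. This is only a sketch: list_keyframes and nearest_keyframe_before are helper names made up for this example, and it assumes the video is the first video stream (v:0) and that ffprobe marks keyframe packets with a K in the flags field.

```shell
# Hypothetical helper: print the timestamp (in seconds) of every keyframe
# packet in the first video stream of the file given as $1.
list_keyframes() {
  ffprobe -v error -select_streams v:0 \
    -show_entries packet=pts_time,flags -of csv=p=0 "$1" |
    awk -F, '$2 ~ /K/ { print $1 }'
}

# Hypothetical helper: read timestamps on stdin and print the largest one
# that is <= the position given as $1 (exit status 1 if there is none).
nearest_keyframe_before() {
  awk -v pos="$1" '
    $1 + 0 <= pos + 0 && (!found || $1 + 0 > best) { best = $1 + 0; found = 1 }
    END { if (!found) exit 1; print best }
  '
}
```

Usage would be something like list_keyframes INPUT | nearest_keyframe_before 157. If the printed value is well below 157, that is the "closest seek point before position" the doc is talking about.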

when -ss is used as an output option, the input is decoded (whether we do stream copy or not) and parsed frame by frame. Is that right ?

No. Output -ss X acts as a gate: it lets through only elements with a timestamp of X or later. Whether those elements are decoded frames or original packets depends on the codec option set for that stream.
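As a rough mental model of that gate (not ffmpeg's actual code), picture a filter that sees one timestamp per element and drops everything earlier than X, without caring whether an element is a decoded frame or a copied packet. ss_gate is a hypothetical name used only for this illustration.

```shell
# Toy model of output -ss: read one timestamp (seconds) per line on stdin
# and pass through only those at or after the start position given as $1.
ss_gate() {
  awk -v start="$1" '$1 + 0 >= start + 0'
}

# Elements at 155, 156, 157, 158 and 159 s; gate at 157 s.
printf '%s\n' 155 156 157 158 159 | ss_gate 157
```

Only 157, 158 and 159 come out the other side. The gate itself does no decoding; whether decoding happens upstream is decided by the codec option.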

the claim that ffmpeg seeks to the closest seek point BEFORE position ?

This depends on the seek capabilities of the input format. For MP4, ffmpeg will supply packets from the keyframe before 157s. For a format like MPEG-TS (with no retrograde seek), video will start from 159s.

With stream copy, regarding the first seconds of the resulting video, why are they a still image when I use -ss as an output option

As a consequence of the answer to #2, the video stream starts at 159s but the audio starts at 157s, so most players will show the first available frame until video playback is in sync with the audio.
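One way to check this on an output file is to compare the timestamps of the first video packet and the first audio packet. The sketch below assumes one video and one audio stream (v:0 and a:0). first_pts and start_gap are hypothetical helper names, and with B-frames the first packet in file order is not guaranteed to carry the lowest timestamp, so treat the result as approximate.

```shell
# Hypothetical helper: print the pts (seconds) of the first packet of the
# stream selected by $1 (e.g. v:0 or a:0) in file $2.
# It lets ffprobe start listing packets and keeps only the first line.
first_pts() {
  ffprobe -v error -select_streams "$1" \
    -show_entries packet=pts_time -of csv=p=0 "$2" | head -n 1
}

# Hypothetical helper: video start minus audio start, in seconds.
start_gap() {
  awk -v v="$1" -v a="$2" 'BEGIN { printf "%.3f\n", v - a }'
}

# Usage (OUTPUT is the stream-copied file):
#   start_gap "$(first_pts v:0 OUTPUT)" "$(first_pts a:0 OUTPUT)"
```

A positive gap of about 2 seconds would match the 157s/159s example: audio is already playing while the player waits for the first video frame.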
