I am using ffmpeg to extract 3 frames per second with this command:

ffmpeg -i input.flv -f image2 -vf fps=fps=3 out%d.png

If I set the fps value, how does ffmpeg choose which 3 frames to take from each second? Is it random, or does it take the first 3 frames of that second? Any help?

  • Without knowing much about how it works, I'd guess it spaces them out evenly, so if you have 30 frames and want 3, it would take frames 1, 15, and 30. Taking the first 3 wouldn't make sense for creating a video.
  • Yeah, that's what I thought. But since I am going to use these frames for my work, I need to be sure whether it extracts frames evenly within each second, or what the exact criteria are.
  • Why not just turn all the frames into images? Then you can pick the specific ones you want and turn the images back into a video. Here's how
  • Experiment: generate some PNGs at both frame rates and see where they align. Or dig into the code to find out specifically how these decisions are made. As already stated, the PNGs are chosen programmatically, so it's not going to be random.
ffmpeg does this by rescaling timestamp values from the input time base (i.e. the frame duration as a fraction, e.g. 24 fps would become 1/24) to the output time base.

First, the output time base is set based on the requested FPS:

link->time_base = av_inv_q(s->framerate);

When filtering, the number of output frames is calculated from the timestamp offset of the incoming frame, rescaled from the input time base to the output time base, so basically offset × input time base / output time base, minus the frames already output. Note that buf->pts - s->first_pts is expressed in input time base units, not in seconds; it equals a number of frames only when the input time base is 1/FPS.

/* number of output frames */
delta = av_rescale_q_rnd(buf->pts - s->first_pts, inlink->time_base,
                         outlink->time_base, s->rounding) - s->frames_out ;

So, for example, if the input time base is 1/24 ≈ 0.042 (24 fps), the output time base is 1/3 ≈ 0.33 (3 fps), and you have 12 frames of input in the buffer, you get 12 × 0.042 / 0.33 ≈ 1.5 frames, which is rounded to the nearest integer, 2, so two frames are to be generated. If you have 24 frames, you get, of course, three frames. For 35 frames in the input buffer, you get four output frames.

If that delta is smaller than 1, the frames in the buffer can be dropped because no frame is needed in this time range. If, on the other hand, the delta is 1 or larger, it is the number of frames that need to be output for the input buffer.
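
The delta calculation above can be reproduced outside ffmpeg as a rough sketch. The function names below are made up for illustration, the input time base is assumed to be exactly 1/24 (so a PTS offset counts frames), and the rounding is assumed to be round-to-nearest with ties rounded up, which is how I understand the filter's default; check s->rounding in your build if it matters.

```python
from fractions import Fraction
from math import floor


def rescale_near(value, src_tb, dst_tb):
    """Rescale a non-negative timestamp from src_tb units to dst_tb units.

    Rounds to the nearest integer, ties upward (assumed default rounding).
    """
    exact = Fraction(value) * src_tb / dst_tb
    return floor(exact + Fraction(1, 2))


def output_frame_delta(pts_offset, in_tb, out_tb, frames_out=0):
    """Number of output frames still owed once this input offset is reached."""
    return rescale_near(pts_offset, in_tb, out_tb) - frames_out


# Assumed time bases: 24 fps input, 3 fps output.
IN_TB = Fraction(1, 24)
OUT_TB = Fraction(1, 3)

for offset in (3, 12, 24, 35):
    delta = output_frame_delta(offset, IN_TB, OUT_TB)
    action = "drop" if delta < 1 else f"output {delta} frame(s)"
    print(f"offset {offset:2d}: delta = {delta} -> {action}")
```

Using exact fractions instead of 0.042 and 0.33 shows that the 12-frame case lands on exactly 1.5, so the result depends on how ties are rounded.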

For new frames, the PTS value is scaled based on the input and output time bases:

buf_out->pts = av_rescale_q(s->first_pts, inlink->time_base,
                                outlink->time_base) + s->frames_out;
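
As a hedged illustration of that PTS assignment, the sketch below computes the timestamps of the first few output frames. The helper names are hypothetical, and the values (first PTS of 0, 24 fps in, 3 fps out) are assumptions chosen to match the example above, not something read from a real file.

```python
from fractions import Fraction
from math import floor


def rescale_near(value, src_tb, dst_tb):
    """Rescale a non-negative timestamp between time bases, rounding to nearest."""
    exact = Fraction(value) * src_tb / dst_tb
    return floor(exact + Fraction(1, 2))


def output_pts(first_pts, in_tb, out_tb, frames_out):
    """PTS (in output time base units) of the next output frame."""
    return rescale_near(first_pts, in_tb, out_tb) + frames_out


# Assumed values for illustration only.
IN_TB = Fraction(1, 24)
OUT_TB = Fraction(1, 3)
FIRST_PTS = 0

# Multiply each PTS by the output time base to get a time in seconds.
times = [output_pts(FIRST_PTS, IN_TB, OUT_TB, n) * OUT_TB for n in range(4)]
print([str(t) for t in times])  # ['0', '1/3', '2/3', '1']
```

In this model the output timestamps are always evenly spaced at 1/fps, starting from the rescaled first input PTS, whatever the spacing of the input frames.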

In practice, this means you'll have to look at the PTS of your input video, do the calculations of how many frames per second your output can have, then spread those out equally by dropping frames as needed. If you want to be super precise I'd recommend debugging the source code with a few test videos you have.

I'm afraid I can't come up with a more practical solution than the answer I posted here recently, in which I explain how to show the PTS of each frame in a video whose framerate was changed:

ffmpeg -i input.mp4 -t 10 -filter:v "fps=fps=25, showinfo" -f null - 2>&1 | grep pts_time | awk '{print $6}' | cut -d: -f2

Each printed timestamp belongs to one output frame and corresponds to a PTS time in the input.
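
If the awk field number does not line up on your system, the same extraction can be sketched in Python by matching the pts_time: field directly. The sample lines below are shortened, hypothetical stand-ins rather than real showinfo output; the only thing assumed about the real log is that each frame line contains pts_time: followed by a number.

```python
import re

# Matches "pts_time:" followed by an integer or decimal number.
PTS_TIME_RE = re.compile(r"pts_time:\s*([0-9]+(?:\.[0-9]+)?)")


def extract_pts_times(log_text):
    """Return every pts_time value found in the log text, in order, as floats."""
    return [float(value) for value in PTS_TIME_RE.findall(log_text)]


# Hypothetical, abbreviated log lines used only to exercise the parser.
sample_log = """\
[showinfo] n:0 pts:0 pts_time:0
[showinfo] n:1 pts:1 pts_time:0.04
[showinfo] n:2 pts:2 pts_time:0.08
"""

for t in extract_pts_times(sample_log):
    print(t)
```

To use it on real output, you could redirect ffmpeg's stderr to a file and pass the file contents to extract_pts_times.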

