Depending on the video, the codec can store a main frame that is essentially a compressed image in a lossy format, similar to a JPEG. This is an i-frame (intra-coded frame), which contains actual image data.
JPEG already achieves somewhere in the order of 10:1 compression with minimal loss of quality, and the intra coding in newer video codecs is probably at least as good.
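To put that ratio in perspective, here is a quick back-of-the-envelope sketch (the frame size and the 10:1 figure are illustrative, not measured):

```python
# Rough arithmetic behind a ~10:1 still-image compression ratio.
# Sizes are illustrative: an uncompressed 8-bit RGB 1080p frame
# versus a plausible JPEG of the same frame.
width, height = 1920, 1080
raw_bytes = width * height * 3      # 3 bytes (R, G, B) per pixel
jpeg_bytes = raw_bytes // 10        # assuming roughly 10:1 compression
print(raw_bytes, jpeg_bytes)        # 6220800 622080
```

So a single uncompressed 1080p frame is around 6 MB, and video has 25 or more of those every second, which is why the further tricks below matter so much.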
If the next few frames contain a lot of very similar data but a small amount of movement, you can simply have a bit of data that says "move these areas of the image by x pixels" and then only compress the actual new image data that wasn't already on screen. This is a p-frame, or a predicted frame based on a previous one.
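A toy sketch of that idea, using hypothetical one-dimensional "frames" of pixel values (not any real codec's algorithm): instead of storing a block's pixels, store an offset into the previous frame plus the small per-pixel differences.

```python
# Toy p-frame-style motion compensation on 1-D "frames" of pixel values.
# Encoder: predict the current block from a shifted region of the
# previous frame, keep only the offset and the residual differences.
def encode_block(prev_frame, cur_block, start, offset):
    predicted = prev_frame[start + offset:start + offset + len(cur_block)]
    return [c - p for c, p in zip(cur_block, predicted)]  # residual

def decode_block(prev_frame, residual, start, offset):
    predicted = prev_frame[start + offset:start + offset + len(residual)]
    return [p + r for p, r in zip(predicted, residual)]

prev = [10, 20, 30, 40, 50, 60]
cur = [20, 30, 41, 50]                    # mostly prev shifted by one pixel
residual = encode_block(prev, cur, 0, 1)  # -> [0, 0, 1, 0]: tiny payload
assert decode_block(prev, residual, 0, 1) == cur
```

The residual is mostly zeros, which compresses to almost nothing, so the block costs the offset plus a handful of small numbers instead of full pixel data.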
If you have a series of successive p-frames after an i-frame, you could essentially wipe out an entire series of frames: you would be reducing 8 full frames of data down to 1 single compressed frame, plus a small amount of data describing transforms and mathematical calculations.
At 25 frames per second, if you have an i-frame every eighth frame, you can reduce the amount of data very significantly: potentially down to somewhere around 1/8th of the original, plus whatever new parts of the image appear between i-frames.
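A back-of-the-envelope version of that saving, with made-up frame sizes (100 KB per compressed i-frame, 5 KB per p-frame; real sizes vary wildly with content):

```python
# Hypothetical sizes: a compressed i-frame of 100 KB, and p-frames that
# each carry only ~5 KB of motion data and new image detail.
I_KB, P_KB, GOP = 100, 5, 8             # one i-frame every eighth frame

all_i_frames = GOP * I_KB               # every frame coded as an i-frame
one_i_rest_p = I_KB + (GOP - 1) * P_KB  # 1 i-frame + 7 p-frames
ratio = one_i_rest_p / all_i_frames
print(all_i_frames, one_i_rest_p, round(ratio, 3))  # 800 135 0.169
```

That 0.169 is roughly the 1/8th mentioned above, plus the extra p-frame data.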
Then there are b-frames, bi-directional predicted frames. These can look back at previous reference frames, but they can also look forward to the next ones. If you know that there is new data in the next full image, you can use that data to encode even less in the current b-frame and rely on an i-frame even further ahead.
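One consequence of looking forward is that the decoder must receive a reference frame before the b-frames that depend on it, so decode order differs from display order. A simplified sketch for a single hypothetical I-B-B-P group (real codecs use more general reordering rules):

```python
# Display order of a tiny hypothetical group: I0 B1 B2 P3.
# The two b-frames reference both I0 (behind) and P3 (ahead), so the
# encoder must emit P3 before B1 and B2 for the decoder to work.
display_order = ["I0", "B1", "B2", "P3"]

def decode_order(frames):
    refs = [f for f in frames if not f.startswith("B")]  # I/P first
    bs = [f for f in frames if f.startswith("B")]        # then the b-frames
    return refs + bs

print(decode_order(display_order))  # ['I0', 'P3', 'B1', 'B2']
```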
These predicted frames can massively reduce the amount of actual encoded data, but at the cost of the increased processing power required to encode and decode the video. You need a lot of processing power to look backwards and forwards across the series of images, figure out all the similarities and differences, and apply transforms, blends, blurs, motions and so on.
It also costs the decoder, because you need to buffer at least two reference frames in order to re-create all the frames in between.
By offloading a lot of data into equations detailing transforms and motions you can reduce the bitrate far, far below what you would expect, and achieve compression ratios far higher, especially for video without much in the way of motion or changes between frames.
The video you linked shows some artefacts of highly compressed i-frames and has very little in the way of motion, so it could easily achieve very high compression ratios.
You can get more information on the basics of the process at Wikipedia: Video compression picture types
mediainfo calculates it as 0.133 Bits/(Pixel*Frame) (mediaarea.net/en/MediaInfo - open source free software). YUV 4:2:0 is standard for video outside of the high-bitrate formats used while recording and editing, or cinema (i.e. any .mp4 you find on the internet, except from video nerds demoing different formats; e.g. ffmpeg with x264 can encode and decode H.264 with the High 4:2:2 or High 4:4:4 profiles: en.wikipedia.org/wiki/Advanced_Video_Coding#Profiles).
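That Bits/(Pixel*Frame) figure is just the bitrate divided by the pixels delivered per second. A quick check with hypothetical numbers, chosen to land near 0.133 (roughly 6.9 Mb/s at 1080p25; the linked video's actual bitrate and resolution may differ):

```python
# Bits/(Pixel*Frame) = bitrate / (width * height * framerate).
# The bitrate and resolution below are hypothetical examples.
def bits_per_pixel_frame(bitrate_bps, width, height, fps):
    return bitrate_bps / (width * height * fps)

print(round(bits_per_pixel_frame(6_900_000, 1920, 1080, 25), 3))  # 0.133
```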