
I'm dealing with a large archive of satellite images of the Earth, each one taken 15 minutes apart over the same area, so they are quite similar to each other. Two consecutive frames look like this: [image: two consecutive satellite frames of the same area]

Video codecs do very well compressing multiple similar images. However, these images are too large for video (10848x10848), and using a video encoder would discard the metadata of the images, so extracting and restoring the metadata would be cumbersome even if I got a video encoder to work with such large frames.

To run some tests I've reduced the 96 images of one day to 1080x1080 pixels, totaling 40.1 MB, and tried different compression methods with the following results:

  1. zip: 39.8 MB
  2. rar: 39.8 MB
  3. 7z : 39.6 MB
  4. tar.bz2: 39.7 MB
  5. zpaq v7.14: 38.3 MB
  6. fp8 v2: 32.5 MB
  7. paq8pxd v45: 30.9 MB

The last three are supposed to take much better advantage of the context, and indeed they work better than traditional compression, but the result is still pretty poor compared with MP4 video, which can take it down to 15 MB or even less while preserving image quality.

However, none of the algorithms used by those compression utilities seem to exploit the similarity between the images the way video compression does. In fact, using packJPG, which compresses each image separately, the whole set gets down to 32.9 MB, quite close to fp8 and paq8pxd but without taking any advantage of the similarities between images (because each image is compressed individually).

In another experiment, I calculated in Matlab the difference between the two images above, and it looks like this:

[image: difference between the two consecutive frames]

Compressing both original images (219.5 + 217.0 = 436.5 kB in total) with fp8 gets them down to 350.0 kB (80%), but compressing one of them plus the difference image (saved as a JPG of the same quality, taking 122.5 kB) results in a file of 270.8 kB (62%), so again (as revealed by the MP4 and packJPG comparison) fp8 doesn't seem to take much advantage of the similarities. Even compressed with rar, one image plus the difference does better than fp8 on the two original images: rar gets the pair down to 333.6 kB (76%).
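(For what it's worth, this one-image-plus-difference test can be reproduced from the command line, e.g. with ImageMagick; the filenames and quality setting below are illustrative:)

convert image2.jpg image1.jpg -compose MinusSrc -composite -quality 90 difference-2-1.jpg
rar a base-plus-delta.rar image1.jpg difference-2-1.jpg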

I guess there must be a good compression solution for this problem, as I can envision many applications. Besides my particular case, many professional photographers end up with lots of similar shots due to sequential shooting, time-lapse sequences, etc., all cases that would benefit from such compression.

Also, I don't require lossless compression, at least not for the image data (the metadata must be preserved).

So... is there a compression method that exploits the similarities between the images being compressed?

The two images of the above test can be downloaded here, and the 96 images of the first test here.

  • More feedback from the people who put the question on hold would be appreciated. I feel the question is general enough and can be answered without pointing to a specific product, but to a method, algorithm or technique. Commented Apr 9, 2018 at 17:09
  • Peanut gallery (I didn't vote to close), but Is there a compression utility that take advantage of the similarities between images better than zpaq and fp8? and Is there a updated/maintained version of the fp8 utility? are likely the offending lines. Contrast that with e.g. Is there a compression *method, algorithm or technique* that take advantage of the similarities between images better than zpaq and fp8? The focus is arguably much different. Asking for software is probably redundant anyway, since specific software (if applicable) will almost certainly be mentioned in any answer given. Commented Apr 9, 2018 at 22:45
  • I agree. And done. Good luck. =) Commented Apr 10, 2018 at 0:35
  • "Too big for video"? Not sure I agree with this. Some codecs have very high or unlimited max resolutions. You're not trying to build a watchable video, just compress some static images. Could you encode the metadata as subtitles or other data? Commented Apr 11, 2018 at 10:07
  • To add to the list of applications, I would need this to store original frames of a time-lapse project that will get additional parts in the future. The current 10,000 4K JPG images take 25 GB of space, whereas an MP4 composed of them takes only 85 MB. Commented Oct 15, 2019 at 12:39

2 Answers


I don't know of specific software that does this, but there is some research on the subject. For example, see the articles "Compressing Sets of Similar Images" by Samy Ait-Aoudia, Abdelhalim Gabis and Amina Naimi, and "Compressing sets of similar images using hybrid compression model" by Jiann-Der Lee, Shu-Yen Wan, Chemg-Min Ma and Rui-Feng Wu.

On a more practical level, you could extend your subtraction technique, for example by writing a script that uses ImageMagick to compute the difference between consecutive images, saving the result as a JPEG (or a compressed PNG if you want it lossless). You'll get one base image and a set of compressed "delta" images that should be much smaller. To compute the difference using ImageMagick:

convert image2.png image1.png -compose MinusSrc -composite -depth 24 -define png:compression-filter=2 -define png:compression-level=9 -define png:compression-strategy=1 difference-2-1.png

To reconstruct the second image by adding the difference back:

convert image1.png difference-2-1.png -compose Plus -composite image2-reconstructed.png

(You can do the same using jpg instead and save a lot of space).
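A minimal sketch of such a script over a whole sequence (assuming bash and frames named so that they sort in time order; the frame-*.png pattern and output names are illustrative):

#!/bin/bash
# Keep the first frame as the base, then store only the difference
# between each frame and the one before it.
frames=(frame-*.png)
cp "${frames[0]}" base.png
for ((i = 1; i < ${#frames[@]}; i++)); do
  convert "${frames[i]}" "${frames[i-1]}" -compose MinusSrc -composite -depth 24 -define png:compression-level=9 "delta-$i.png"
done
# To rebuild frame N, start from base.png and add the deltas back in order, e.g.:
# convert base.png delta-1.png -compose Plus -composite frame-1-rebuilt.png
# convert frame-1-rebuilt.png delta-2.png -compose Plus -composite frame-2-rebuilt.png

As the comment below points out, this simple Minus/Plus scheme clamps values that fall outside the valid pixel range, so the round trip is not exactly lossless for frames with large differences.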

  • It seems that this doesn't manage overflows. I have some similar images with very different colors, and thus I have some artefacts in the re-computed images, either by using Minus/Plus or Subtract/Add. Commented Oct 10, 2018 at 10:10

In the hope that other people looking to compress similar images/PNGs find their way here via searches:

I am not sure how well the use case I worked on would apply to the OP's photographs, as the link no longer works. My use case was similar but not the same: I was looking to compress computer program screenshots that are very similar to each other, so potentially much more compressible than just zipping up the PNG files. I could find no solution through searching, so I came up with my own, and ended up with an insane 4.4% compression ratio (as opposed to 96% through naively compressing the PNGs):

My dataset was 300 PNG files at 1920x1080, with a raw size of 431.8 MB, which compressed down to just 417.4 MB with the best settings I could find for bz2, 7z and similar tools. My understanding is that the source files were not ideally compressed at the PNG level, as various PNG minimizer tools managed to reduce the size from about 1.4 MB to 900 kB per file.

My thinking was that the compression tools couldn't take advantage of the redundancy because the data was already compressed, and small changes in the raw pixel data lead to vastly different compressed files. So I decompressed the files to raw TIFF using ffmpeg, with settings which to my understanding do not result in any data loss:

for FILE in screenshot-2024*; do ffmpeg -loglevel error -i "$FILE" -vframes 1 -compression_algo raw -pix_fmt rgb24 "$FILE.tiff"; done

This increased the individual file sizes from 1.4 MB to about 6 MB, but compression with 7z/LZMA2 produced an archive of an insanely low 19,175,127 bytes, meaning compression down to just 4.4% of the original size.

Converting the .tiff files back to .png can be done with:

for FILE in screenshot-2024*.tiff; do ffmpeg -loglevel error -i "$FILE" "$FILE.png"; done

The duplicated file extensions can of course be cleaned up, but this way the command will not overwrite your original sources while testing.

The settings we used for compression were as follows, with the solid block size setting seemingly having the single biggest effect on the output size (a roughly equivalent 7z command line is sketched after the list):

  • Compression level: 9/Ultra
  • Compression method: LZMA2
  • Dictionary size: 512 MB
  • Word size: 256
  • Solid Block size: 512 MB
  • Memory usage for Compressing: 12 GB
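For reference, a sketch of a roughly equivalent 7z command line (the switch values mirror the settings above; the archive and file names are illustrative):

# LZMA2, level 9, 512 MB dictionary, word size 256, 512 MB solid blocks
7z a -t7z -m0=lzma2 -mx=9 -md=512m -mfb=256 -ms=512m screenshots.7z screenshot-2024*.tiff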

Since for us the goal was long-term storage, the additional hoops and compression time were not a big factor; your mileage may vary, of course.
