Why can't I make this simple PDF → raster image → PDF round trip stable in file size?
$ # Get original file (8KB, just extracted from my scanner).
$ curl "https://nextcloud.mbb.cnrs.fr/nextcloud/s/Rgd4qgmt5mGdifR/download?file=scan.pdf" -o scan.pdf
$ du scan.pdf
8 scan.pdf
$ # Extract image from the file.
$ pdfimages -list scan.pdf
page num type width height color comp bpc enc interp object ID x-ppi y-ppi size ratio
--------------------------------------------------------------------------------------------
1 0 image 1653 2338 gray 1 8 jpeg no 6 0 200 200 7012B 0.2%
$ pdfimages -all scan.pdf extract
$ du extract-000.jpg
56 extract-000.jpg # Much bigger than the 7012B stream listed inside the PDF.
$ # Convert back into a pdf.
$ magick extract-000.jpg back.pdf
$ du back.pdf
32 back.pdf # Much larger than the original.
$ pdfimages -list back.pdf
page num type width height color comp bpc enc interp object ID x-ppi y-ppi size ratio
--------------------------------------------------------------------------------------------
1 0 image 1653 2338 gray 1 8 jpeg no 8 0 200 200 28.2K 0.7%
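As a sanity check on the two listings above, both `ratio` values can be reproduced with plain shell arithmetic. This assumes that `ratio` is the embedded stream size divided by the uncompressed size (width × height × components × bpc / 8) and that `K` means 1024 bytes. The numbers are copied by hand from the listings; nothing here reads the PDFs themselves.

```shell
#!/bin/sh
# Image geometry from `pdfimages -list` (identical in scan.pdf and back.pdf).
width=1653
height=2338
comp=1   # gray: one component
bpc=8    # bits per component

# Uncompressed size in bytes: width * height * comp * bpc / 8
raw=$(( width * height * comp * bpc / 8 ))

# Embedded JPEG stream sizes from the "size" column.
orig_stream=7012                                        # scan.pdf: 7012B
back_stream=$(awk 'BEGIN { printf "%d", 28.2 * 1024 }') # back.pdf: 28.2K (assumed KiB, truncated to bytes)

# Stream size as a percentage of the uncompressed size, one decimal place.
ratio() {
  awk -v s="$1" -v r="$raw" 'BEGIN { printf "%.1f%%", 100 * s / r }'
}

echo "uncompressed:   $raw bytes"
echo "scan.pdf ratio: $(ratio "$orig_stream")"
echo "back.pdf ratio: $(ratio "$back_stream")"
awk -v a="$back_stream" -v b="$orig_stream" \
  'BEGIN { printf "back/scan stream size: %.1fx\n", a / b }'
```

It prints 3864714 bytes, 0.2% and 0.7% (matching the listings) and a factor of about 4.1 between the two streams. Since the 28.2K image stream accounts for most of `back.pdf`'s 32KB, the growth is in the JPEG data itself rather than in PDF overhead.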
What is happening here? What determines the file size of each of these three files?
scan.pdf
: the original, straight from the scanner (8KB).

extract-000.jpg
: extracted with a plain `pdfimages` command (56KB).

back.pdf
: converted back with a plain `magick` command (32KB).
Can I control them? Can I make `back.pdf` the same size as `scan.pdf` without losing image quality?
(My ultimate goal is to crop the image before converting it back to PDF. This question arose because my cropping attempts surprisingly made the result larger instead of smaller.)
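One caveat on the sizes quoted in this question: they come from `du`, which (with GNU coreutils defaults) reports allocated disk space in 1 KiB units, rounded up to whole filesystem blocks. The values 8, 56 and 32 are all multiples of 4, which is consistent with 4 KiB blocks, so small differences can be hidden. A minimal sketch for getting exact byte counts with `wc -c` (GNU `du -b` or `ls -l` would also work); the helper name `size_bytes` is only illustrative:

```shell
#!/bin/sh
# Print the exact size of a file in bytes. wc -c counts bytes; tr strips the
# leading padding that some wc implementations (e.g. BSD) add to the number.
size_bytes() {
  wc -c < "$1" | tr -d ' '
}

# Compare the three files from this question, skipping any that are missing.
for f in scan.pdf extract-000.jpg back.pdf; do
  if [ -f "$f" ]; then
    printf '%-16s %8s bytes\n' "$f" "$(size_bytes "$f")"
  fi
done
```

With exact counts, the 7012B stream reported by `pdfimages -list` can be compared directly with the files on disk.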