
I have a pile of PDF files that were scanned long ago and are already searchable (i.e., they went through OCR).

However, the brightness and contrast settings were not optimal.

Is it possible to reduce the bits per pixel of the existing files to a reasonably low level in order to save storage space (apply color-curve transformations, posterize, or even binarize to black and white, as in GIMP or other image-manipulation programs)?
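To illustrate what these operations do to the pixel data, here is a minimal pure-Python sketch applied to individual 8-bit grayscale values (0 = black, 255 = white). The function names `levels`, `posterize`, `binarize` and `bits_per_pixel` are hypothetical, and the sample values are made up. A real workflow would apply such a mapping to every pixel with an image tool and then put the images back into the PDF, which this sketch does not do.

```python
def levels(value: int, black: int, white: int) -> int:
    """Linear contrast stretch: map [black, white] onto [0, 255], clamping outside."""
    if white <= black:
        raise ValueError("white point must be above black point")
    scaled = (value - black) * 255 / (white - black)
    return max(0, min(255, round(scaled)))


def posterize(value: int, n_levels: int) -> int:
    """Snap an 8-bit value to the nearest of n_levels evenly spaced gray levels."""
    if n_levels < 2:
        raise ValueError("need at least 2 levels")
    step = 255 / (n_levels - 1)
    return round(round(value / step) * step)


def binarize(value: int, threshold: int = 128) -> int:
    """Pure black (0) below the threshold, pure white (255) at or above it."""
    return 255 if value >= threshold else 0


def bits_per_pixel(n_levels: int) -> int:
    """Smallest number of bits that can index n_levels distinct gray levels."""
    return (n_levels - 1).bit_length()


if __name__ == "__main__":
    row = [30, 90, 150, 210]  # made-up gray values from a washed-out scan
    stretched = [levels(v, black=40, white=200) for v in row]
    print(stretched)                              # contrast-stretched values
    print([posterize(v, 4) for v in stretched])   # 4 gray levels, 2 bits/pixel
    print([binarize(v) for v in stretched])       # black and white, 1 bit/pixel
```

Posterizing to 2 levels is essentially binarizing at mid-gray; keeping 4 gray levels needs 2 bits per pixel instead of 8.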

The files were scanned at 600 dpi and are already searchable, i.e., in addition to the scanned image there is a text layer. The scan resolution was probably chosen that high to obtain better OCR results, but it makes the files excessively large. I think a scan at 200 dpi would have given good visual quality with much lower storage requirements. I want to keep the OCR-generated text layer with its good OCR quality. What are the proper commands?
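To put rough numbers on the 600 dpi vs 200 dpi question, the sketch below computes the uncompressed bitmap size of one page. It assumes a US Letter page (8.5 x 11 in) and ignores compression (JPEG, CCITT, JBIG2 and so on), so real PDF sizes will be much smaller; only the ratios are meaningful. The function name is hypothetical.

```python
import math


def raw_image_bytes(width_in: float, height_in: float, dpi: int, bits_per_pixel: int) -> int:
    """Uncompressed bitmap size in bytes; each row is padded to a whole byte."""
    width_px = math.ceil(width_in * dpi)
    height_px = math.ceil(height_in * dpi)
    bytes_per_row = (width_px * bits_per_pixel + 7) // 8
    return bytes_per_row * height_px


if __name__ == "__main__":
    # Assumed page size: US Letter, 8.5 x 11 inches.
    for dpi, bpp in [(600, 8), (200, 8), (600, 1), (200, 1)]:
        size = raw_image_bytes(8.5, 11, dpi, bpp)
        print(f"{dpi} dpi, {bpp}-bit: {size:,} bytes")
```

For the four cases this gives 33,660,000, 3,740,000, 4,210,800 and 468,600 bytes: going from 600 to 200 dpi divides the pixel count by 3^2 = 9, and going from 8 bits to 1 bit per pixel divides the size by roughly another 8.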

  • Try ps2pdf -dPDFSETTINGS=/ebook in.pdf out.pdf to reduce the resolution to 150 dpi. It is part of Ghostscript.
    – meuh
    Commented Sep 15, 2022 at 18:17
  • I knew about the /ebook option. It cut the file size to a bit less than half. I would expect additional storage savings from limiting the color space (and possibly improving the contrast in the same step, switching from gray on a shaded background to black on white), i.e., using color transformations as in GIMP's contrast curve, or even binarization. Commented Sep 16, 2022 at 21:03
  • Try this answer from the sister site Ask Ubuntu. There are other answers there too. If one works for you, please post it here as an answer.
    – meuh
    Commented Sep 17, 2022 at 7:27
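Building on the /ebook suggestion, Ghostscript's pdfwrite device also accepts explicit image-downsampling resolutions and a grayscale conversion strategy. The Python sketch below only assembles such a command line; the function names are hypothetical, 200 dpi is an assumed target, and the options should be checked against the Ghostscript documentation for the installed version. As far as I know, pdfwrite rewrites invisible OCR text as text rather than rasterizing it, so the text layer should normally survive, but it is worth confirming by searching the output file. These options only downsample and convert to gray; they do not binarize.

```python
from __future__ import annotations

import shlex
import subprocess


def gs_downsample_args(src: str, dst: str, dpi: int = 200, grayscale: bool = True) -> list[str]:
    """Build a Ghostscript pdfwrite command that downsamples the images in src."""
    args = [
        "gs",
        "-sDEVICE=pdfwrite",
        "-dPDFSETTINGS=/ebook",  # start from the /ebook preset
        "-dNOPAUSE", "-dBATCH", "-dQUIET",
        # Override the preset's target image resolution (assumed: dpi)
        "-dDownsampleColorImages=true", f"-dColorImageResolution={dpi}",
        "-dDownsampleGrayImages=true", f"-dGrayImageResolution={dpi}",
    ]
    if grayscale:
        # Convert color content to gray: one channel instead of three
        args += ["-sColorConversionStrategy=Gray", "-dProcessColorModel=/DeviceGray"]
    args += [f"-sOutputFile={dst}", src]
    return args


def run_gs(src: str, dst: str, dpi: int = 200) -> None:
    """Actually invoke Ghostscript (requires gs on PATH)."""
    subprocess.run(gs_downsample_args(src, dst, dpi), check=True)


if __name__ == "__main__":
    # Print the command instead of running it
    print(shlex.join(gs_downsample_args("in.pdf", "out.pdf")))
```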
