I have a large scanned PDF with an OCR text layer. I was able to cut its size in half with Ghostscript (win64) using this command (as recommended in this answer):

gswin64 -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dPDFSETTINGS=/screen -dNOPAUSE -dQUIET -dBATCH -sOutputFile=output.pdf input.pdf

But the file is still not small enough.

I could also convert the PDF with Calibre or with pdftotext from xpdf, but then I lose the layout.

Is there a way to keep the OCR text at its exact position on each page while removing the scanned image?

1 Answer


Have you tried ocrmypdf with JBIG2?

ocrmypdf --optimize 3 --jbig2-lossy in.pdf out.pdf
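If the goal is to drop the scanned image entirely rather than compress it, Ghostscript's -dFILTERIMAGE switch may be worth a try. Here is a minimal sketch, assuming a reasonably recent Ghostscript (I believe the switch arrived around 9.23) and a POSIX shell. The strip_images function and the GS variable are my own illustrative names, not part of Ghostscript.

```shell
# Hypothetical helper: rewrite a PDF without its bitmap images.
# -dFILTERIMAGE asks Ghostscript to skip images in the input; text is kept.
# GS is an assumed override for the executable name (e.g. gswin64c on Windows).
strip_images() {
  "${GS:-gs}" -o "$2" -sDEVICE=pdfwrite -dFILTERIMAGE "$1"
}
```

Call it as strip_images input.pdf text-only.pdf. In a Windows command prompt the equivalent would be the same switches passed directly to gswin64c. Note that OCR text layers are usually drawn in invisible rendering mode, so the pages will probably look blank afterwards even though the text is still there to search, select and copy at its original position.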
