Reducing a heavy scanned PDF (keeping only the OCR text and removing the scanned image)

Question

I have a heavy scanned PDF with an OCR text layer. I was able to cut its size in half with Ghostscript (win64), using this command (as recommended in this answer):

gswin64 -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dPDFSETTINGS=/screen -dNOPAUSE -dQUIET -dBATCH -sOutputFile=output.pdf input.pdf

But it is still not light enough.

I could also convert the PDF with Calibre or with pdftotext from Xpdf, but then I lose the layout.

Is there a way to extract the OCR text, keeping the exact position of each piece of text on each page, while removing the scanned image?

André Levy · Accepted Answer · 2019-07-05 12:23:25Z

Have you tried ocrmypdf with JBIG2?

ocrmypdf --optimize 3 --jbig2-lossy in.pdf out.pdf
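For batch use, the command above could be wrapped in a small script. What follows is only a minimal sketch, not part of the original answer: the helper names (`build_ocrmypdf_cmd`, `size_ratio`, `compress`) are hypothetical, and the `skip_text` option is an assumption, added because the input already carries an OCR layer and OCRmyPDF may refuse to process such pages unless told to skip them. Which flags are accepted depends on the installed OCRmyPDF version, so check `ocrmypdf --help` first.

```python
import os
import shutil
import subprocess


def build_ocrmypdf_cmd(src, dst, optimize=3, jbig2_lossy=True, skip_text=False):
    """Build the argument list for the ocrmypdf call shown in the answer."""
    cmd = ["ocrmypdf", "--optimize", str(optimize)]
    if jbig2_lossy:
        # Same lossy JBIG2 switch as in the answer; availability depends on
        # the installed OCRmyPDF version.
        cmd.append("--jbig2-lossy")
    if skip_text:
        # Assumption: leave pages that already have a text layer un-OCRed.
        cmd.append("--skip-text")
    cmd += [src, dst]
    return cmd


def size_ratio(src, dst):
    """Return the output size as a fraction of the input size."""
    return os.path.getsize(dst) / os.path.getsize(src)


def compress(src, dst, **options):
    """Run ocrmypdf on src, write dst, and report how much smaller dst is."""
    if shutil.which("ocrmypdf") is None:
        raise RuntimeError("ocrmypdf was not found on PATH")
    # check=True raises CalledProcessError if ocrmypdf exits with an error.
    subprocess.run(build_ocrmypdf_cmd(src, dst, **options), check=True)
    return size_ratio(src, dst)
```

Calling `compress("in.pdf", "out.pdf")` would run the same command as above and return the size of out.pdf relative to in.pdf, which makes it easy to compare against the Ghostscript result.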
