I have a large number of TIFF files coming out of ScanTailor. Is there a way to OCR them with Tesseract while keeping the OCR data separate from the images, then compress the images, and finally combine the OCR data with the compressed images?

The point is that I don't want to compress before I OCR, and the tools for compressing PDFs afterward while preserving the OCR layer are not great.
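One possible pipeline, sketched in Python under assumptions that go beyond the question: Tesseract 4.0 or later, whose `textonly_pdf` config variable emits a PDF containing only the invisible OCR text; img2pdf, which embeds JPEG and JPEG 2000 files without re-encoding them; and qpdf 8.4 or later for `--underlay`. The helper names (`write_list_file`, `ocr_command`, `combine_commands`, `run`) are hypothetical, and the compression step is left to whatever tool you prefer.

```python
"""Sketch: OCR lossless TIFFs first, compress later, then merge text and images.

Assumes tesseract (4.0+), img2pdf and qpdf (8.4+) are on PATH.
Helper names are hypothetical, chosen for this illustration.
"""
import subprocess
from pathlib import Path


def write_list_file(tifs: list[Path], list_file: Path) -> None:
    # Tesseract accepts a text file listing one image path per line
    # and produces a single multi-page output from it.
    list_file.write_text("\n".join(str(t) for t in tifs) + "\n")


def ocr_command(list_file: Path, text_base: Path) -> list[str]:
    # Stage 1 (before compression): textonly_pdf=1 drops the page image,
    # leaving a PDF with only the invisible text layer.
    # Tesseract appends ".pdf" to text_base.
    return ["tesseract", str(list_file), str(text_base),
            "-c", "textonly_pdf=1", "pdf"]


def combine_commands(compressed: list[Path], text_pdf: Path,
                     out_pdf: Path, images_pdf: Path) -> list[list[str]]:
    # Stage 2 (after compression): wrap the compressed images in a PDF,
    # then lay the text-only PDF underneath it page by page.
    return [
        ["img2pdf", *map(str, compressed), "-o", str(images_pdf)],
        ["qpdf", str(images_pdf), "--underlay", str(text_pdf), "--",
         str(out_pdf)],
    ]


def run(cmd: list[str]) -> None:
    # Raise CalledProcessError if the external tool fails.
    subprocess.run(cmd, check=True)
```

To use it, call `write_list_file` and `run(ocr_command(...))` on the uncompressed TIFFs, compress the images, and then `run` each command from `combine_commands`. This relies on both PDFs having the same page order and matching page sizes: Tesseract sizes each text page from the TIFF's resolution and img2pdf from the compressed image's, so if compression downsamples, the DPI metadata has to be scaled to match.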
