
Possible Duplicate:
How to extract text with OCR from a PDF on Linux?

I have a few documents in English and Hebrew that I scanned in and converted to PDF format.

Is there some free or cheap utility that can process a scanned PDF and do OCR, at least in English, preferably also in Hebrew?

Thanks!

  • A few similar questions: superuser.com/questions/28426/… superuser.com/questions/64124/… superuser.com/questions/97470/…
    – heavyd
    Commented Feb 16, 2010 at 16:47
  • The author of this question did not specify that he is running Linux. The so-called possible duplicate question is too localized, and may not apply at all to the author of this question.
    – eleven81
    Commented Feb 16, 2010 at 17:03
  • @eleven81 - Correct, I was asking for Windows.
    – Shaul Behr
    Commented Jul 4, 2010 at 8:34
  • Not only is this not a duplicate, it's still unanswered. All 3 answers only yield extracted text, not a PDF with selectable text.
    – cregox
    Commented Jun 28, 2013 at 16:05

3 Answers


I found a list of free OCR software for Windows.

  1. FreeOCR
  2. Tesseract
  3. WeOcr Tesseract Web Interface
  4. GOCR
  5. Windows GUI for GOCR
  6. OCR Desktop
  7. Simple OCR
  8. TopOCR

However, these programs need an image as input, not a PDF, so you first have to convert each page of the PDF to an image, for example with a PDF-to-JPG converter.
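As an illustration of that conversion step, here is a minimal Python sketch that drives Ghostscript, a free PDF renderer, to write one JPEG per page. Ghostscript is not named in this answer, so treat that choice as an assumption, and the function names `ghostscript_executable`, `pdf_to_jpeg_command` and `pdf_to_jpegs` are hypothetical. It assumes Ghostscript is installed and on the PATH; on Windows the console executable is typically `gswin64c` (or `gswin32c`), on Linux/macOS it is `gs`.

```python
import shutil
import subprocess
from pathlib import Path


def ghostscript_executable() -> str:
    """Return the first Ghostscript command name found on the PATH.

    Assumption: Windows installs usually provide gswin64c/gswin32c,
    other systems provide gs.
    """
    for name in ("gswin64c", "gswin32c", "gs"):
        if shutil.which(name):
            return name
    return "gs"  # fall back; subprocess will raise if it is really missing


def pdf_to_jpeg_command(pdf: Path, out_dir: Path, dpi: int = 300,
                        gs: str = "gs") -> list[str]:
    """Build a Ghostscript command that renders every page of `pdf` to JPEG."""
    pattern = out_dir / f"{pdf.stem}-page-%03d.jpg"  # %03d -> 001, 002, ...
    return [
        gs,
        "-dSAFER", "-dBATCH", "-dNOPAUSE",  # non-interactive, sandboxed run
        "-sDEVICE=jpeg",                    # 24-bit colour JPEG output
        f"-r{dpi}",                         # render resolution in dpi
        "-dJPEGQ=95",                       # high JPEG quality to limit artifacts
        f"-sOutputFile={pattern}",
        str(pdf),
    ]


def pdf_to_jpegs(pdf: Path, out_dir: Path, dpi: int = 300) -> list[Path]:
    """Render `pdf` into `out_dir` and return the page images in page order."""
    out_dir.mkdir(parents=True, exist_ok=True)
    cmd = pdf_to_jpeg_command(pdf, out_dir, dpi, ghostscript_executable())
    subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return sorted(out_dir.glob(f"{pdf.stem}-page-*.jpg"))
```

For example, `pdf_to_jpegs(Path("scan.pdf"), Path("pages"))` would write `pages/scan-page-001.jpg`, `pages/scan-page-002.jpg`, and so on, which the OCR programs above can then read. JPEG is lossy; if your OCR tool accepts PNG or TIFF, a lossless device such as `-sDEVICE=png16m` may give slightly cleaner input.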


I found an interesting idea that lets Google do all the work of OCR'ing the PDF files for you.


Personally, I would use Ghostscript (the engine behind Ghostview) to render the pages to images, then Tesseract to convert the images to text. This is a completely free, open-source, cross-platform solution that has given me very good results on plain text. I don't use it for complex documents with tables and the like, but for plain text you can't beat the price.
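A minimal sketch of that Ghostscript-then-Tesseract pipeline in Python, assuming both `gs` (or `gswin64c` on Windows) and `tesseract` are on the PATH and that the Hebrew language data (`heb.traineddata`) is installed alongside English. The helper names `render_command`, `ocr_command` and `pdf_to_text` are hypothetical; the command-line options themselves are standard Ghostscript and Tesseract options.

```python
import subprocess
import tempfile
from pathlib import Path


def render_command(pdf: Path, out_dir: Path, dpi: int = 300,
                   gs: str = "gs") -> list[str]:
    """Ghostscript: render every page of `pdf` to a grayscale PNG (lossless)."""
    return [
        gs, "-dSAFER", "-dBATCH", "-dNOPAUSE",
        "-sDEVICE=pnggray",                           # 8-bit grayscale PNG
        f"-r{dpi}",                                   # ~300 dpi suits OCR
        f"-sOutputFile={out_dir / 'page-%03d.png'}",  # page-001.png, ...
        str(pdf),
    ]


def ocr_command(image: Path, out_base: Path,
                langs: tuple[str, ...] = ("eng", "heb"),
                searchable_pdf: bool = False) -> list[str]:
    """Tesseract: OCR one image into out_base.txt (or out_base.pdf)."""
    cmd = ["tesseract", str(image), str(out_base), "-l", "+".join(langs)]
    if searchable_pdf:
        cmd.append("pdf")  # 'pdf' config: page image plus an invisible text layer
    return cmd


def pdf_to_text(pdf: str, langs: tuple[str, ...] = ("eng", "heb"),
                dpi: int = 300, gs: str = "gs") -> str:
    """Render `pdf` with Ghostscript, OCR each page with Tesseract, return the text."""
    with tempfile.TemporaryDirectory() as tmp:
        tmp_dir = Path(tmp)
        subprocess.run(render_command(Path(pdf), tmp_dir, dpi, gs), check=True)
        pages = []
        for png in sorted(tmp_dir.glob("page-*.png")):  # zero-padded, so in order
            base = png.with_suffix("")  # Tesseract appends ".txt" itself
            subprocess.run(ocr_command(png, base, langs), check=True)
            pages.append(base.with_suffix(".txt").read_text(encoding="utf-8"))
    return "\n".join(pages)
```

`print(pdf_to_text("scan.pdf"))` would print the recognized text; on Windows pass `gs="gswin64c"`. Passing `searchable_pdf=True` to `ocr_command` makes Tesseract write a PDF with a selectable text layer instead of a `.txt` file; merging those per-page PDFs into one document is outside this sketch.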
