OCR with non-language text

Question

I am interested in using OCR to recognize text from a document that doesn't contain words. Rather, it is a document with a long string of "random" printed characters. I have been trying to use tesseract to scan the text, but it seems to be looking for words. Is there a way to tell tesseract to just do plain character recognition?

The old Presto! PageManager that came with my scanner did not spell-check by default (on Windows); it has a spell checker, but it runs after OCR. I wonder whether you could remove the dictionary from whatever software is doing the auto-correction, so that it cannot correct anything. OCR does not look at whole words by default, except maybe for alignment.
– Psycogeek
Commented Aug 28, 2013 at 17:04

nguyenq · Accepted Answer · 2013-10-08 01:17:23Z

4

Yes, you can disable the dictionaries by defining a configuration file containing:

load_system_dawg F
load_freq_dawg F

and passing that file's name as the last argument on the tesseract command line, after the input image and the output base name.
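For anyone scripting this, here is a minimal sketch of the approach in Python. It writes the two settings to a file and builds the matching tesseract invocation. The names nodict, scan.png and result are placeholders, not anything from the question. The sketch assumes tesseract is on the PATH and that it accepts a path to a local config file; if your build does not, copying the file into the tessdata/configs directory should work instead.

```python
import subprocess
from pathlib import Path

# The two settings from the answer: do not load the system word list or
# the frequent-word list, so recognition is not pulled toward real words.
CONFIG_TEXT = "load_system_dawg F\nload_freq_dawg F\n"


def write_config(path):
    """Write the dictionary-disabling config file and return its path."""
    path = Path(path)
    path.write_text(CONFIG_TEXT)
    return path


def build_command(image, output_base, config):
    """Build the argument list: tesseract <image> <outputbase> <configfile>."""
    return ["tesseract", str(image), str(output_base), str(config)]


def run_ocr(image, output_base, config):
    """Run tesseract (assumed to be installed) and return the recognized text."""
    subprocess.run(build_command(image, output_base, config), check=True)
    # tesseract appends .txt to the output base name for plain-text output.
    return Path(str(output_base) + ".txt").read_text()
```

Typical use would be run_ocr("scan.png", "result", write_config("nodict")). Keeping the command construction in its own function makes it easy to print and check the exact command line before running it.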


This does appear to do what I wanted. Sadly, the results aren't much better for the text that I was working with, but it does answer the question. Thanks!
– Daniel
Commented Oct 8, 2013 at 17:46


Martin Monperrus · Accepted Answer · 2020-04-25 10:30:20Z

1

Tesseract does not work well because it expects words and natural language.

For your use case, I've had success with gocr.

I can decode 15k of random characters with 100% accuracy, see https://www.monperrus.net/martin/store-data-paper
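As a rough illustration of how gocr might be driven from a script, here is a small Python sketch. It is not the pipeline from the linked page. It assumes gocr is installed and on the PATH, that the scan is in a format gocr reads directly (scan.pnm is a placeholder name), and that the alphabet is known in advance so it can be passed with gocr's -C option; the charset value is purely an example.

```python
import subprocess


def build_gocr_command(image, charset=None):
    """Build a gocr argument list; -i names the input image."""
    cmd = ["gocr", "-i", str(image)]
    if charset:
        # -C limits recognition to the given characters (ranges such as
        # 0-9 or A-Z are allowed), which helps when the alphabet is known.
        cmd += ["-C", charset]
    return cmd


def run_gocr(image, charset=None):
    """Run gocr (assumed to be installed) and return the text it prints."""
    result = subprocess.run(
        build_gocr_command(image, charset),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

For example, run_gocr("scan.pnm", charset="0-9A-F") would restrict output to hexadecimal digits. If the alphabet is not known, leave charset unset.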


Your post assumes that you printed the text yourself and can influence the alphabet used. There is no such assumption in the question.
– Máté Juhász
Commented Apr 25, 2020 at 11:32

