Designing an open source OCR engine specifically for rendered text (screenshots)

Question

My current personal project is to automatically grab screenshots from a game, OCR the text, and count the occurrences of given words.
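
For the counting step, a minimal sketch, assuming the OCR stage already produces plain text. The function name `count_words` and the sample string are made up for illustration:

```python
import re
from collections import Counter

def count_words(text, targets):
    """Count case-insensitive whole-word occurrences of each target word."""
    # Tokenize into lowercase words so "Gold" and "gold" count together,
    # while "golden" stays a separate word.
    words = Counter(re.findall(r"[a-z0-9']+", text.lower()))
    return {target: words[target.lower()] for target in targets}

# Hypothetical OCR output from one screenshot.
ocr_output = "You found gold! Gold added to inventory. The golden key is missing."
print(count_words(ocr_output, ["gold", "key", "sword"]))
# {'gold': 2, 'key': 1, 'sword': 0}
```

Matching whole tokens rather than substrings keeps "golden" from inflating the count for "gold".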

Having spent all evening looking around at different OCR solutions, I've come to realize that the majority of OCR packages out there are designed for scanned text. If there are any packages that can read screen text reliably, they're well outside this hobbyist's budget.

I've been reading through some other questions, and the closest I found was one about OCR engines designed for screen-reading.

It seems to me that reading rendered text should be much easier than reading printed and scanned text. Lines are always straight, and any given letter will always appear with the exact same pixel representation (mostly, anyway). Also, why not use the actual font file (if you have it) as a cheat sheet for recognizing characters? We might actually reach 100% accuracy with a system like this.

Assuming you have the font file for a cheat sheet and your source image is perfectly square and has no noise, how would you go about recognizing characters from the screen?
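
One way to picture the "cheat sheet" idea is exact template matching. The sketch below is only an illustration under strong assumptions: the image is already binarized, it contains a single text line with no anti-aliasing, and the glyph templates (hand-written 3x5 bitmaps here, standing in for glyphs rendered from the real font file) match the on-screen pixels exactly. The names `TEMPLATES`, `matches_at` and `read_line` are hypothetical.

```python
# Hypothetical glyph templates. In practice these would be rendered from the
# font file at the game's exact size; here they are hand-written 3x5 bitmaps.
TEMPLATES = {
    "I": ["###", ".#.", ".#.", ".#.", "###"],
    "L": ["#..", "#..", "#..", "#..", "###"],
    "T": ["###", ".#.", ".#.", ".#.", ".#."],
}

def matches_at(image, template, top, left):
    """Return True if every template pixel equals the image pixel at that offset."""
    for dy, row in enumerate(template):
        for dx, pixel in enumerate(row):
            if image[top + dy][left + dx] != pixel:
                return False
    return True

def read_line(image, templates):
    """Scan one text line left to right, emitting each glyph that matches exactly."""
    height, width = len(image), len(image[0])
    text, x = [], 0
    while x < width:
        for char, template in templates.items():
            t_h, t_w = len(template), len(template[0])
            fits = t_h <= height and x + t_w <= width
            if fits and matches_at(image, template, 0, x):
                text.append(char)
                x += t_w  # jump past the glyph just recognized
                break
        else:
            x += 1  # no glyph starts here (spacing or unknown pixels)
    return "".join(text)

# A tiny binarized "screenshot": L, T and I separated by one blank column.
screen = [
    "#...###.###",
    "#....#...#.",
    "#....#...#.",
    "#....#...#.",
    "###..#..###",
]
print(read_line(screen, TEMPLATES))  # LTI
```

Real screenshots would need extra handling for anti-aliasing, kerning, and background clutter, so exact equality is the optimistic case described in the question rather than a general solution.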

(Problems I can foresee are UI lines and images that could confuse any crude attempt at pixel-guessing.)

If you already know of a free/open-source OCR package designed for screen-reading, please let me know. I kind of doubt that's going to show up though, as no other askers seem to have gotten a lead either.

A Python interface is preferred, but beggars can't be choosers.

EDIT:
To clarify, I'm looking for design suggestions for an OCR solution that is specifically designed to read text from screenshots. Popular tools like Tesseract (mentioned in the question I linked) are hard to use at best because they are not designed for this kind of source file.

My boss once coined a term I like for this -- Obvious Character Recognition. — MK., Commented Dec 27, 2010 at 5:50
Hah! I like that term, especially because it applies. It's a shame that it collides with the other acronym, or I'd use it for this. — Hovis Biddle, Commented Dec 27, 2010 at 5:54
Hi @Hovis, did you get this working? Do you have a link to your open source project? — Fernando Freitas Alves, Commented Apr 11, 2014 at 17:23

Hovis Biddle · Accepted Answer · 2010-12-31 21:04:44Z

So I've been thinking about it and I feel that the best approach will be to count the number of pixels in each blob/glyph/character. This should really cut down on the number of tests I need to do to differentiate between glyphs.
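
A rough sketch of that pixel-counting idea, assuming a binarized image in which `#` marks text pixels. The lookup table `CANDIDATES_BY_COUNT` is hypothetical and would have to be built from the exact font, size and weight in use:

```python
from collections import deque

def blob_pixel_counts(image):
    """Return (leftmost_column, pixel_count) per connected blob, left to right."""
    height, width = len(image), len(image[0])
    seen = [[False] * width for _ in range(height)]
    blobs = []
    for y in range(height):
        for x in range(width):
            if image[y][x] != "#" or seen[y][x]:
                continue
            # Breadth-first flood fill over 8-connected neighbours.
            queue, count, left = deque([(y, x)]), 0, x
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                count += 1
                left = min(left, cx)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        inside = 0 <= ny < height and 0 <= nx < width
                        if inside and image[ny][nx] == "#" and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            blobs.append((left, count))
    return sorted(blobs)

# Hypothetical table for a 3x5 pixel font: pixel count -> glyphs with that count.
CANDIDATES_BY_COUNT = {7: ["L", "T"], 9: ["I"]}

screen = [
    "#...###.###",
    "#....#...#.",
    "#....#...#.",
    "#....#...#.",
    "###..#..###",
]
blobs = blob_pixel_counts(screen)
print(blobs)  # [(0, 7), (4, 7), (8, 9)]
print([CANDIDATES_BY_COUNT.get(count, []) for _, count in blobs])
# [['L', 'T'], ['L', 'T'], ['I']]
```

The count only narrows the candidates: "L" and "T" share a pixel count here, so a second test (such as a pixel-by-pixel comparison) is still needed, and glyphs made of several blobs, like "i", would need their parts grouped first.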

Regrettably, I'll have to be very specific about fonts. The software will only be able to recognize text rendered at the right DPI, in the right font face and weight, and so on.

It isn't ideal, and I'd still like to see someone who knows more about this stuff design OCR for rendered text; but it will work for my limited case.

Michael Aaron Safyan · Accepted Answer · 2010-12-27 05:35:34Z

If your goal is to count occurrences of certain events in a game, OCR is really not the right way to go about it. That said, if you are determined to use OCR, then Tesseract is a well-known open-source package for performing optical character recognition. I'm not really sure what you are getting at with respect to scanned vs. rendered text, but Tesseract will probably do as good a job as any open-source package that is available. OCR is still a tricky art, so I wouldn't expect 100% accuracy.

answered Dec 27, 2010 at 5:35

I've been trying to use Tesseract all morning, and it's a no-go. It suffers from the same problem of being designed for "large" scanned text (i.e., high-DPI but probably messy text).
– Hovis Biddle
Commented Dec 27, 2010 at 5:51

Dustin Wyatt · Accepted Answer · 2019-04-07 02:57:25Z

This isn't exactly what you want, but you may want to look at Sikuli.

edited Apr 7, 2019 at 2:57

answered Dec 28, 2010 at 22:06

Hmm, this looks really cool. It really isn't what I'm after but I'll probably end up playing with it. Thanks!
– Hovis Biddle
Commented Dec 29, 2010 at 19:24
For future reference, that bit.ly link actually goes to sikuli.org. Why use a link shortener in the first place?
– SilverWolf
Commented Apr 6, 2019 at 21:58
@SilverWolf Who knows, I wrote this answer almost a decade ago!
– Dustin Wyatt
Commented Apr 7, 2019 at 2:56
