
I'm writing an OCR application to read characters from a screenshot image. Currently, I'm focusing only on digits. I'm partially basing my approach on this blog post: http://blog.damiles.com/2008/11/basic-ocr-in-opencv/.

I can successfully extract each individual character using some clever thresholding. Where things get a bit tricky is matching the characters. Even with a fixed font face and size, variables such as background color and kerning cause the same digit to appear in slightly different shapes. For example, the image below is segmented into 3 parts:

  1. Top: a target digit that I successfully extracted from a screenshot
  2. Middle: the template: a digit from my training set
  3. Bottom: the error (absolute difference) between the top and middle images

The parts have all been scaled (the distance between the two green horizontal lines represents one pixel).

[Image: the extracted target digit (top), the template (middle), and their absolute difference (bottom)]

You can see that despite both the top and middle images clearly representing a 2, the error between them is quite high. This causes false positives when matching other digits -- for example, it's not hard to see how a well-placed 7 can match the target digit in the image above better than the middle image can.

Currently, I'm handling this by having a heap of training images for each digit, and matching the target digit against those images, one-by-one. I tried taking the average image of the training set, but that doesn't resolve the problem (false positives on other digits).
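
For reference, here's roughly what that matching loop looks like (sketched with numpy arrays for clarity; `templates_by_digit` is a stand-in name for my per-digit training sets):

    import numpy as np

    def abs_diff_score(target, template):
        # Sum of per-pixel absolute differences between two equally
        # sized grayscale images; lower means a better match.
        # Cast to int so subtracting uint8 arrays cannot wrap around.
        return np.abs(target.astype(int) - template.astype(int)).sum()

    def best_match(target, templates_by_digit):
        # Match the target against every training image of every
        # digit, one by one, and keep the digit with the lowest error.
        return min(templates_by_digit,
                   key=lambda d: min(abs_diff_score(target, t)
                                     for t in templates_by_digit[d]))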

I'm a bit reluctant to perform matching using a shifted template (it'd be essentially the same as what I'm doing now). Is there a better way to compare the two images than simple absolute difference? I was thinking of maybe something like the EMD (earth mover's distance, http://en.wikipedia.org/wiki/Earth_mover's_distance) in 2D: basically, I need a comparison method that isn't as sensitive to global shifting and small local changes (pixels next to a white pixel becoming white, or pixels next to a black pixel becoming black), but is sensitive to global changes (black pixels that are nowhere near white pixels becoming white, and vice versa).
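
To make that property concrete: even something as crude as blurring both images before differencing has roughly the right behavior (one-pixel shifts become cheap, structural differences stay expensive). A toy sketch, using the newer cv2 interface just for brevity:

    import cv2
    import numpy as np

    def tolerant_score(target, template, ksize=3):
        # Blurring spreads each white pixel into its neighborhood, so
        # a pixel turning white next to an existing white pixel costs
        # little, while white appearing far from any existing white
        # still costs a lot.
        a = cv2.GaussianBlur(target, (ksize, ksize), 0).astype(int)
        b = cv2.GaussianBlur(template, (ksize, ksize), 0).astype(int)
        return np.abs(a - b).sum()

I'm not sure whether something this crude or a true EMD is the right tool, though.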

Can anybody suggest a more effective matching method than absolute difference?

I'm doing all this in OpenCV using the C-style Python wrappers (import cv).

3 Answers


I would look into using Haar cascades. I've used them for face detection/head tracking, and it seems like you could build up a pretty good set of cascades with enough '2's, '3's, '4's, and so on.

http://alereimondo.no-ip.org/OpenCV/34

http://en.wikipedia.org/wiki/Haar-like_features
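
Once you've trained one cascade per digit (with opencv_traincascade), detection is just one scan per cascade. A rough, untested sketch from Python (the XML file names here are placeholders for whatever your training run produces):

    import cv2

    # One pre-trained cascade per digit.
    cascades = {d: cv2.CascadeClassifier('digit_%d.xml' % d)
                for d in range(10)}

    def find_digits(gray_image):
        # Each cascade scans the whole image in one call; every hit is
        # an (x, y, w, h) rectangle where that digit was detected.
        hits = []
        for digit, cascade in cascades.items():
            for (x, y, w, h) in cascade.detectMultiScale(gray_image):
                hits.append((digit, x, y, w, h))
        return hits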

  • Thanks! I'll definitely have a look at Haar cascades. How efficient do you think it will be compared to simple image subtraction? I expect it to be slower. However, if it's 5 times slower but replaces 10 image checks for the same effectiveness, then it would definitely be worth it.
    – mpenkov
    Commented Jan 2, 2012 at 5:03
  • You'll have to generate cascades, which is a pretty time-consuming process (but also stupidly parallelizable). It also requires a bunch of input data (I would use the numbers in every font you have on a desktop).
    – rsaxvc
    Commented Jan 2, 2012 at 5:17
  • Slower than subtraction, but you can search an image for all instances of a certain cascade at once.
    – rsaxvc
    Commented Jan 2, 2012 at 5:19

OCR on noisy images is not easy, so simple approaches do not work well.

So, I would recommend using HOG to extract features and an SVM to classify them. HOG seems to be one of the most powerful ways to describe shapes.

The whole processing pipeline is implemented in OpenCV; however, I do not know the function names in the Python wrappers. You should be able to train with the latest haartraining.cpp - it actually supports more than Haar features - HOG and LBP as well.

And I think the latest code (from trunk) is much improved over the official release (2.3.1).

HOG usually needs just a fraction of the training data used by other recognition methods; however, if you want to classify shapes that are partially occluded (or missing), you should make sure your training set includes some such shapes.
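
As I said, I don't know the Python function names offhand, but the newer cv2 interface exposes a HOG descriptor directly, and you can feed it into any SVM - scikit-learn's, for example. A rough, untested sketch, assuming fixed 20x20 digit patches and your own `train_images`/`train_labels`:

    import cv2
    import numpy as np
    from sklearn import svm

    # Window/block/cell sizes are illustrative, chosen to tile a
    # 20x20 patch; tune them for your actual character size.
    hog = cv2.HOGDescriptor((20, 20), (10, 10), (5, 5), (10, 10), 9)

    def describe(img):
        # hog.compute returns a column vector; flatten it for the SVM.
        return hog.compute(img).ravel()

    X = np.array([describe(img) for img in train_images])
    clf = svm.SVC()
    clf.fit(X, train_labels)

    def classify(img):
        return clf.predict([describe(img)])[0]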

  • I wouldn't call the images noisy per se, but I see where you're coming from. I'll have a look at HOG. Thanks.
    – mpenkov
    Commented Jan 2, 2012 at 7:53

I can tell you from my experience and from reading several papers on character classification that a good way to start is by reading about Principal Component Analysis (PCA), Fisher's Linear Discriminant Analysis (LDA), and Support Vector Machines (SVMs). These are classification methods that are extremely useful for OCR, and it turns out that OpenCV already includes excellent implementations of PCA and SVMs. I haven't seen any OpenCV code examples for OCR, but you can use a modified version of face classification to perform character classification. An excellent resource for face recognition code for OpenCV is this website.

Another Python library I recommend is "scikits.learn". It is very easy to send cvArrays to scikits.learn and run machine learning algorithms on your data. A basic example for OCR using SVM is here.

Another more complicated example using manifold learning for handwritten character recognition is here.
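
In miniature, that basic SVM example boils down to something like this (the module is imported as sklearn in recent versions):

    from sklearn import datasets, svm

    # 8x8 digit images, flattened into 64-dimensional vectors and
    # classified with an SVM: train on one half, test on the other.
    digits = datasets.load_digits()
    n = len(digits.images)
    data = digits.images.reshape((n, -1))

    clf = svm.SVC(gamma=0.001)
    clf.fit(data[:n // 2], digits.target[:n // 2])
    predicted = clf.predict(data[n // 2:])

    # Fraction of the held-out half classified correctly.
    print((predicted == digits.target[n // 2:]).mean())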
