
So I'm trying to create a program that can recognize which number is shown in an image and print that integer to the console. (I'm using Python 3.)

For example, the program should recognize that the following image (an actual image the program has to check) is the number 2:

[image: number 2]

I've tried simply comparing it against another image containing a 2 with cv2.matchTemplate(), but the blue pixels' RGB values are slightly different in each image, and the image can also be a bit larger or smaller. For example, the following image:

[image: number 2]

It also has to distinguish it from all the other blue number images (0-9), for example the following one:

[image: number 5]

I've tried multiple match-template code snippets and made a folder with number 0-9 images as templates, but each time almost every template is "found" in the number that needs to be recognized. For example, the number 5 gets detected in an image that shows a 2. And when it doesn't match all of them, it matches the wrong one(s).

The ones I've tried:

But as I said before, they all run into those problems.

I've also tried measuring what percentage of each image is blue, but those percentages were too close together to tell the numbers apart.

Does anyone have a solution? Is using cv2.matchTemplate() a mistake, and is there a much simpler option? (I don't mind using a library, because this is part of a bigger piece of code, but I'd prefer to write it myself rather than rely on libraries.)

  • I think this is too broad/vague, and probably a poor fit for Stack Overflow.
    – AMC
    Commented Jan 17, 2020 at 22:42

3 Answers

4

Instead of using Template Matching, a better approach is to use Pytesseract OCR to read the number with image_to_string(). But before performing OCR, you need to preprocess the image. For optimal OCR performance, the preprocessed image should have the desired text/numbers/characters in black with the background in white. A simple preprocessing pipeline is to convert the image to grayscale, apply Otsu's threshold to obtain a binary image, then invert the image. Here's a visualization of the preprocessing steps:

Input image -> Grayscale -> Otsu's threshold -> Inverted image ready for OCR


Result from Pytesseract OCR

2

Here are the results with the other images:

[second image: input -> grayscale -> Otsu's threshold -> inverted]

2

[third image: input -> grayscale -> Otsu's threshold -> inverted]

5

We use the --psm 6 configuration option to assume a single uniform block of text. See here for more configuration options.

Code

import cv2
import pytesseract

pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"

# Load image, grayscale, Otsu's threshold, then invert
image = cv2.imread('1.png')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]
invert = 255 - thresh

# Perform OCR with Pytesseract
data = pytesseract.image_to_string(invert, lang='eng', config='--psm 6')
print(data)

cv2.imshow('thresh', thresh)
cv2.imshow('invert', invert)
cv2.waitKey()

Note: If you insist on using Template Matching, you need template matching that handles scale variation (multi-scale matching). Take a look at how to isolate everything inside of a contour, scale it, and test the similarity to an image? and Python OpenCV line detection to detect X symbol in image for some examples. If you know for certain that your images are blue, then another approach would be to use color thresholding with cv2.inRange() to obtain a binary mask, then apply OCR on that mask.
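For illustration, here is a minimal sketch of that color-thresholding idea. The HSV bounds below are hypothetical and would need tuning for the actual shade of blue in your images; it also assumes pytesseract is already configured as in the code above.

import cv2
import numpy as np
import pytesseract

image = cv2.imread('1.png')

# Threshold on color instead of intensity: keep only "blue" pixels.
# These HSV bounds are a guess for a generic blue and must be tuned.
hsv = cv2.cvtColor(image, cv2.COLOR_BGR2HSV)
lower_blue = np.array([90, 50, 50])
upper_blue = np.array([130, 255, 255])
mask = cv2.inRange(hsv, lower_blue, upper_blue)   # digit pixels become white

# Invert so the digit is black on a white background for OCR
invert = 255 - mask
data = pytesseract.image_to_string(invert, lang='eng', config='--psm 6')
print(data)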

  • So I've tried your method, but there are still some numbers it won't pick up. For example, it mistakes this 5 for a -: imgur.com/a/a1f5PSs and these are the thresholded and inverted images: imgur.com/a/S4QwAg7. Is there a way to make it recognize that?
    – kaci
    Commented Jan 17, 2020 at 22:08
  • I'm getting 5 printed to the console with that input image. Are you loading the right image?
    – nathancy
    Commented Jan 17, 2020 at 22:28
  • I'm sure I'm loading the right image. Are you using the exact code as in your answer, or could it be that something in the picture changed when I uploaded it to imgur.com? (I'm using the exact code in your answer except for the cv2.imshow() and cv2.waitKey(), but that shouldn't matter.) EDIT: I just downloaded the imgur image and it still gave me the -.
    – kaci
    Commented Jan 17, 2020 at 22:37
  • Yes, I'm using the exact code. I simply downloaded the image (it's a .png image) and get 5 every time. The cv2.imshow() and cv2.waitKey() are only for display purposes and shouldn't affect the console output. I'm using Windows 10, Python: 3.7.4, NumPy: 1.14.5, OpenCV: 4.1.0, pytesseract: 0.2.7
    – nathancy
    Commented Jan 17, 2020 at 22:41
  • I'm also using Windows 10 and Python 3.7.4, but I'm using numpy==1.18.1, opencv-python==4.1.2.30, pytesseract==0.3.1, which are different from yours
    – kaci
    Commented Jan 17, 2020 at 22:44
2

Given the lovely regular input, I expect that all you need is a simple comparison against templates. Since you neglected to supply your code and output, it's hard to tell what might have gone wrong.

Very simply ...

  • Rescale your input to the size of your templates.
  • Calculate any straightforward matching evaluation of the input against each of the 10 templates. A simple matching count should suffice: how many pixels match between the two images.
  • The template with the highest score is the identification.

You might also want to set a lower threshold for declaring a match, perhaps based on how well that template matches each of the other templates: any identification has to clearly exceed the match between two different templates.
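A minimal sketch of this approach, assuming a hypothetical folder layout of templates/0.png through templates/9.png that are all the same size:

import cv2
import numpy as np

# Load and binarize the ten digit templates (assumed layout: templates/0.png ... templates/9.png)
templates = {}
for digit in range(10):
    t = cv2.imread(f'templates/{digit}.png', cv2.IMREAD_GRAYSCALE)
    templates[digit] = cv2.threshold(t, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]

def identify(path):
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    h, w = templates[0].shape
    img = cv2.resize(img, (w, h))  # rescale the input to the template size
    img = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]
    # Count matching pixels against each template; the highest count wins
    scores = {d: int(np.count_nonzero(img == t)) for d, t in templates.items()}
    return max(scores, key=scores.get)

print(identify('1.png'))

Binarizing both the input and the templates first sidesteps the small color differences between images that broke the exact matchTemplate comparison.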

0

If you don't have access to an OCR engine, just know you can build your own OCR system via a KNN classifier. In this case, the implementation should not be very difficult, since you are only classifying digits. OpenCV provides a very straightforward implementation of KNN.

The classifier is trained using features calculated from samples from known instances of classes. In this case, you have 10 classes (if you are working with digits 0 - 9), so you can prepare a "template" with your digits, extract some features, train the classifier and use it to classify new instances.

All of this can be done in OpenCV without extra libraries, and KNN (for this kind of application) has a more than acceptable accuracy rate.
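A minimal sketch of that idea with OpenCV's built-in KNN, using raw pixels of resized digit images as the feature vector. The training folder layout (digits/0/*.png ... digits/9/*.png) and the 20x20 feature size are assumptions for illustration:

import glob
import cv2
import numpy as np

# Build training data from labelled sample images (assumed layout: digits/<label>/*.png)
samples, labels = [], []
for digit in range(10):
    for path in glob.glob(f'digits/{digit}/*.png'):
        img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
        img = cv2.resize(img, (20, 20))                    # fixed feature size
        samples.append(img.flatten().astype(np.float32))   # raw pixels as the feature vector
        labels.append(digit)

knn = cv2.ml.KNearest_create()
knn.train(np.array(samples), cv2.ml.ROW_SAMPLE,
          np.array(labels, dtype=np.float32).reshape(-1, 1))

# Classify a new image the same way it was trained
test = cv2.resize(cv2.imread('1.png', cv2.IMREAD_GRAYSCALE), (20, 20))
test = test.flatten().astype(np.float32).reshape(1, -1)
ret, result, neighbours, dist = knn.findNearest(test, k=3)
print(int(result[0][0]))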
