
I am experimenting with OpenCV via the Python 2.7 interface to implement a machine-learning-based OCR application that parses text out of an image file. I am following this tutorial (I've reposted the code below for convenience). I am completely new to machine learning, and relatively new to OpenCV.

OCR of Hand-written Digits:

import numpy as np
import cv2
from matplotlib import pyplot as plt

img = cv2.imread('digits.png')
gray = cv2.cvtColor(img,cv2.COLOR_BGR2GRAY)

# Now we split the image into 5000 cells, each 20x20 in size
cells = [np.hsplit(row,100) for row in np.vsplit(gray,50)]

# Make it into a Numpy array. Its size will be (50,100,20,20)
x = np.array(cells)

# Now we prepare train_data and test_data.
train = x[:,:50].reshape(-1,400).astype(np.float32) # Size = (2500,400)
test = x[:,50:100].reshape(-1,400).astype(np.float32) # Size = (2500,400)

# Create labels for train and test data
k = np.arange(10)
train_labels = np.repeat(k,250)[:,np.newaxis]
test_labels = train_labels.copy()

# Initiate kNN, train the data, then test it with test data for k=5
knn = cv2.KNearest()
knn.train(train,train_labels)
ret,result,neighbours,dist = knn.find_nearest(test,k=5)

# Now we check the accuracy of classification
# For that, compare the result with test_labels and check which are wrong
matches = result==test_labels
correct = np.count_nonzero(matches)
accuracy = correct*100.0/result.size
print accuracy

# save the data
np.savez('knn_data.npz',train=train, train_labels=train_labels)

# Now load the data
with np.load('knn_data.npz') as data:
    print data.files
    train = data['train']
    train_labels = data['train_labels']
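
For a concrete sense of the "asking" step, here is a minimal sketch (assuming 'sample.png' is a 20x20 grayscale crop of a single handwritten digit, prepared like the cells in digits.png) of how the trained model above could classify one new image:

# Sketch: classify a single 20x20 digit crop with the kNN trained above.
sample = cv2.imread('sample.png', 0)                 # 0 = load as grayscale
sample = sample.reshape(-1, 400).astype(np.float32)  # shape (1, 400)
ret, result, neighbours, dist = knn.find_nearest(sample, k=5)
print int(result[0][0])                              # predicted digit, 0-9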

OCR of English Alphabets:

import cv2
import numpy as np
import matplotlib.pyplot as plt

# Load the data, converters convert the letter to a number
data = np.loadtxt('letter-recognition.data', dtype='float32', delimiter=',',
                  converters={0: lambda ch: ord(ch) - ord('A')})

# split the data in two, 10000 rows each for train and test
train, test = np.vsplit(data,2)

# split trainData and testData into features and responses
responses, trainData = np.hsplit(train,[1])
labels, testData = np.hsplit(test,[1])

# Initiate the kNN, classify, measure accuracy.
knn = cv2.KNearest()
knn.train(trainData, responses)
ret, result, neighbours, dist = knn.find_nearest(testData, k=5)

correct = np.count_nonzero(result == labels)
accuracy = correct*100.0/10000
print accuracy
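
As a minimal sketch of querying this second model, one row of testData (a 16-number feature vector, i.e. one line of the .data file) can be classified and the class index mapped back to a letter:

sample = testData[0].reshape(1, -1)        # shape (1, 16)
ret, result, neighbours, dist = knn.find_nearest(sample, k=5)
print chr(int(result[0][0]) + ord('A'))    # predicted letter, 'A'-'Z'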

The second code snippet (for the English alphabet) takes its input from a .data file in the following format:

T,2,8,3,5,1,8,13,0,6,6,10,8,0,8,0,8
I,5,12,3,7,2,10,5,5,4,13,3,9,2,8,4,10
D,4,11,6,8,6,10,6,2,6,10,3,7,3,7,3,9
N,7,11,6,6,3,5,9,4,6,4,4,10,6,10,2,8
G,2,1,3,1,1,8,6,6,6,6,5,9,1,7,5,10
S,4,11,5,8,3,8,8,6,9,5,6,6,0,8,9,7
B,4,2,5,4,4,8,7,6,6,7,6,6,2,8,7,10

...there are about 20,000 lines of that. The data describes contours of characters.

I have a basic grasp of how this works, but I am confused as to how I can use it to actually perform OCR on an image. How can I use this code to write a function that takes a cv2 image as a parameter and returns a string representing the recognized text?

1 Answer


In general, machine learning works like this: first you train your program to understand the domain of your problem, then you start asking it questions.

So if you are creating an OCR system, the first step is teaching your program what the letter A looks like, then the letter B, and so on.

You use OpenCV to clean the noise out of the image, identify groups of pixels that could be letters, and isolate them.

Then you feed those letters to your OCR program. In training mode, you feed in an image and tell the program which letter it represents. In asking mode, you feed in an image and ask which letter it is. The better the training, the more accurate your answers will be (the program could still get a letter wrong; there is always a chance of that).
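
As a rough sketch of that asking mode, assuming a kNN model trained on 20x20 digit images as in the question's first snippet (real OCR needs more careful segmentation and ordering than this):

import cv2
import numpy as np

# Sketch: isolate candidate characters, then classify each with the kNN.
def recognize(img, knn):
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    # Otsu threshold; characters become white blobs on a black background
    ret, thresh = cv2.threshold(gray, 0, 255,
                                cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    # OpenCV 2.x findContours returns (contours, hierarchy)
    contours, hierarchy = cv2.findContours(thresh.copy(), cv2.RETR_EXTERNAL,
                                           cv2.CHAIN_APPROX_SIMPLE)
    # Read candidate characters left to right
    boxes = sorted(cv2.boundingRect(c) for c in contours)
    text = ''
    for x, y, w, h in boxes:
        roi = cv2.resize(gray[y:y+h, x:x+w], (20, 20))    # match training size
        sample = roi.reshape(1, 400).astype(np.float32)
        ret, result, neighbours, dist = knn.find_nearest(sample, k=5)
        text += str(int(result[0][0]))                    # digit classes: 0-9
    return text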

  • I think you misunderstood my question; I get that that's how machine learning works. I don't quite understand the data structures in the example code from the tutorial. I actually figured it out for the first part (for numerical digits), but for the 2nd part (for the English alphabet), it reads contour data from a .data file, rather than analyzing an image. So, once I train it, how can I use it to parse text from an image? And from which data structure do I extract the characters?
    – Mat Jones
    Commented Nov 12, 2016 at 12:05
  • I've updated my question and included an excerpt of the letter recognition training .data file.
    – Mat Jones
    Commented Nov 12, 2016 at 12:08
  • Sorry for my confusion. This .data file is a collection of letters already processed. You can find out what the numbers mean at this link: archive.ics.uci.edu/ml/datasets/Letter+Recognition . In plain English, they used something like OpenCV to isolate the letters and took measurements of them. You could take a new letter, take the same measurements, and then compare them with this data set to find out which letter it is. The tutorial doesn't cover this :/
    – Rodrigo
    Commented Nov 12, 2016 at 12:11
  • >"The tutorial don't cover this :/" That's not what I want to hear haha any suggestions on how to go about this?
    – Mat Jones
    Commented Nov 12, 2016 at 12:16
  • @mjones.udri I am also facing the same doubt. Any update on how to do this? Have you received an answer to your question? Commented Jul 29, 2017 at 9:26
