I have about 3000 small images of single words that I am trying to convert to text. I installed Tesseract on my Windows 7 machine using the installer and have successfully OCRed images through cmd and PowerShell.

 tesseract.exe imagename.png imagename 

produces a text file with the converted text.

The results were terrible: only about 40% of characters were converted correctly. I would like to improve them.

Does anyone know what optional configurations can be passed to this command? The usage message is:

tesseract imagename outputbase [-l lang] [configfile [[+|-]varfile]...]

Also, could someone describe the training procedure? I am finding the documentation hard to understand. I know that my text is in Times New Roman. Do I need to train Tesseract for Times New Roman, is support for it already built in, or can I download files that allow Tesseract to recognize it?

1 Answer

One way to improve the results is to preprocess the images before running OCR, for example by removing any skew and thresholding them. You can use OpenCV for this. After that, you can look into training Tesseract on your text.
