I have about 3000 small images, each containing a single word, that I am trying to convert to text. I installed Tesseract on my Windows 7 machine using the installer and have successfully managed to OCR images through cmd and PowerShell. For example,
tesseract.exe imagename.png imagename
produces a text file with the converted text.
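Since there are about 3000 images, the same command would have to be repeated for each file. Below is a minimal sketch of how that could be scripted, assuming Python is installed, `tesseract` is on the PATH, and all images are `.png` files in one folder. The names `build_command` and `ocr_folder` are hypothetical helpers written for this illustration and are not part of Tesseract; the only Tesseract option used is `-l`, which selects the language.

```python
import subprocess
from pathlib import Path


def build_command(image_path, lang=None):
    """Return the tesseract argument list for one image.

    The output base is the image path without its extension, so
    imagename.png produces imagename.txt next to it.
    """
    image_path = Path(image_path)
    cmd = ["tesseract", str(image_path), str(image_path.with_suffix(""))]
    if lang:
        # -l selects the language data to use, e.g. "eng".
        cmd += ["-l", lang]
    return cmd


def ocr_folder(folder, lang=None):
    """Run tesseract on every .png in folder and return how many were processed."""
    count = 0
    for image_path in sorted(Path(folder).glob("*.png")):
        # check=True raises an error if tesseract exits with a failure code.
        subprocess.run(build_command(image_path, lang), check=True)
        count += 1
    return count
```

Calling `ocr_folder("images", lang="eng")` (where `images` is a placeholder folder name) would write one `.txt` file beside each image. This only automates the loop; it does not change the recognition quality.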
The results were terrible: only about 40% of the characters were converted correctly. I would like to improve on this.
Does anyone know what optional configurations can be passed to this command? The usage message is:
tesseract imagename outputbase [-l lang] [configfile [[+|-]varfile]...]
Also, could someone describe the training procedure? I am finding the documentation hard to understand. I know that my text is in Times New Roman. Do I need to train Tesseract for Times New Roman, is support for it already built in, or is it possible to download files that allow Tesseract to recognize it?