I am working on a project where I have to read the document from an image. In initial stage I will read the machine printed documents and then eventually move to handwritten document's image. However I am doing this for learning purpose, so I don't intend to use apis like Tesseract etc. I intend to do in steps:
Preprocessing(Blurring, Thresholding, Erosion&Dilation)
Character Segmentation
OCR (or ICR in later stages)
So I am doing the character segmentation right now, I recently did it through the Horizontal and Vertical Histogram. I was not able to get very good results for some of the fonts, like the image as shown I was not able to get good results.
Is there any other method or algorithm to do the same? Any help will be appreciated!
Edit 1:
The result I got after detecting blobs using cv2.SimpleBlobDetector.