Questions tagged [ocr]
OCR (optical character recognition) is the conversion of images of characters into machine-readable encoded text. Use this tag for questions involving this type of conversion or software that performs OCR. When possible, indicate the software you use and the source and target of the conversion.
39 questions
2 votes · 2 answers · 1k views
Create custom wordlist
I want to create a custom list of (scientific) words for purposes like spell checking and OCR, based on my collection of scientific papers in PDF format. Using pdftotext I can easily create a text file ...
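As a rough illustration of the idea in this question, here is a minimal sketch that turns already-extracted text (for example the output of pdftotext) into a word list. The function name `build_wordlist` and the `min_count` and `min_length` thresholds are assumptions made for this example, not something taken from the question or its answers.

```python
import re
from collections import Counter

def build_wordlist(text, min_count=2, min_length=3):
    """Return words seen at least min_count times, sorted alphabetically."""
    # Lowercase the text and keep runs of letters only,
    # so digits and punctuation act as word separators.
    words = re.findall(r"[^\W\d_]+", text.lower())
    # Count only words that are long enough to be worth keeping.
    counts = Counter(w for w in words if len(w) >= min_length)
    return sorted(w for w, n in counts.items() if n >= min_count)

sample = "Enzyme kinetics: the enzyme binds the substrate. Substrate 42 binds."
print(build_wordlist(sample))
# → ['binds', 'enzyme', 'substrate', 'the']
```

Requiring a minimum count is one simple way to keep one-off extraction errors out of the list; common words such as "the" would still need to be filtered against an existing dictionary.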
7 votes · 2 answers · 3k views
How to find all images containing any text?
I have a lot of images, and I need to find which of them contain any text in English (to delete them). Is it possible to do this automatically?
4 votes · 1 answer · 194 views
De-obfuscate a picture with statistical information?
How can I get this kind of information into numbers?
Perhaps related
https://dsp.stackexchange.com/questions/1054/how-do-i-recover-the-signal-from-an-ecg-image
https://dsp.stackexchange.com/...
4 votes · 3 answers · 340 views
sed one-liner to replace word-medial capitals
I used OCR to turn some scans into plaintext, but unfortunately the letters 'fi', which are commonly joined in some fonts, got read in as capital W's. Now I need to replace all the W's with 'fi', and ...
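The question asks for a sed one-liner; the same substitution can be sketched in Python to show the regular expression involved. This is only a sketch under one assumption: "word-medial" is taken to mean a capital W that directly follows a lowercase letter. The function name `fix_medial_w` is made up for this example.

```python
import re

def fix_medial_w(text):
    # Replace a capital W only when a lowercase letter precedes it,
    # so word-initial capitals such as "We" or "Water" are left alone.
    return re.sub(r"(?<=[a-z])W", "fi", text)

print(fix_medial_w("We deWne the conWguration."))
# → We define the configuration.
```

A misread 'fi' at the start of a word (for example "Wnd" for "find") is not caught by this rule and would need a separate check, for instance against a dictionary.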
0 votes · 1 answer · 363 views
Image (having text and numbers) to text file matching [:alnum:] nicely with some Unix tool?
Suppose I have a photograph containing text and numbers. I want to work with it in my editor using tools such as grep, standard text-processing features such as Vim's block highlighting, and also more advanced things ...
3 votes · 1 answer · 1k views
Linux equivalent of GraphClick?
Is there a piece of Linux software that does what GraphClick does in Mac OS X?
That is, is there Linux software that "is a graph digitizer software which allows to automatically retrieve the ...
0 votes · 1 answer · 67 views
Writing to picture which is scanned document
I have a scanned contract and I need to change only a few names and dates in the contract.
It's easy to scan the document but impossible to OCR it and open it in *.doc format.
Is there an ...
49 votes · 6 answers · 35k views
Is there some sort of PDF-to-text converter?
I need PDF files as text so I can search them in bulk from the command line. Is there a converter for Ubuntu, OBSD, or a similar distro?
Perhaps related post: OCR with Ubuntu.
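Once the PDFs have been converted to plain text (for instance with pdftotext, mentioned in another question on this page), the bulk search itself is simple. The sketch below assumes the converted files sit in one directory as `*.txt`; the function name `search_texts` and that directory layout are assumptions for illustration only.

```python
import re
from pathlib import Path

def search_texts(directory, pattern):
    """Yield (file name, line number, line) for each matching line in *.txt files."""
    regex = re.compile(pattern, re.IGNORECASE)
    for path in sorted(Path(directory).glob("*.txt")):
        # errors="replace" keeps the search going if a file has odd bytes.
        with path.open(encoding="utf-8", errors="replace") as handle:
            for number, line in enumerate(handle, start=1):
                if regex.search(line):
                    yield path.name, number, line.rstrip("\n")
```

Calling `list(search_texts("papers_txt", r"tesseract"))` would return every matching line with its file name and line number, where `papers_txt` is a hypothetical directory name. On the command line, grep over the same text files does the equivalent job.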
15 votes · 5 answers · 7k views
OCR on Linux systems [closed]
I have always found OCR technology to be behind on open source systems. I've also watched the Ocropus project since its infancy. I've tried what I've heard is the best OCR engine available for Linux,...