Questions tagged [ocr]

OCR (optical character recognition) is the conversion of an image of characters into machine-readable encoded text. Use this tag for questions involving this type of conversion or software that performs OCR. When possible, indicate the software you use and the source and target of the conversion.

2 votes
2 answers
1k views

Create custom wordlist

I want to create a custom list of (scientific) words for purposes like spell checking and OCR based on my collection of scientific papers in pdf format. Using pdftotext I can easily create a text file ...
highsciguy • 2,574
7 votes
2 answers
3k views

How to find all images containing any text?

I have a lot of images, and I need to find which of them contain any text in English (to delete them). Is it possible to do this automatically?
Andrey Chetverikov
4 votes
1 answer
194 views

De-obfuscate a picture with statistical information?

How can I get this kind of information into numbers? Perhaps related: https://dsp.stackexchange.com/questions/1054/how-do-i-recover-the-signal-from-an-ecg-image https://dsp.stackexchange.com/...
4 votes
3 answers
340 views

sed one-liner to replace word-medial capitals

I used OCR to turn some scans into plaintext, but unfortunately the letters 'fi', which are commonly joined in some fonts, got read in as capital W's. Now I need to replace all the W's with 'fi', and ...
ixtmixilix • 13.3k
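
A minimal sketch of the kind of replacement this question describes, written in Python rather than sed. It assumes that "word-medial" means a capital W immediately preceded by a lowercase letter; that rule, the function name `fix_fi_ligature`, and the sample text are illustrative assumptions, not taken from the question or its answers.

```python
import re

# Assumption: a misread 'fi' ligature shows up as a capital W that directly
# follows a lowercase letter. A W at the start of a word is left alone.
MEDIAL_W = re.compile(r"(?<=[a-z])W")


def fix_fi_ligature(text: str) -> str:
    """Replace word-medial capital W's with 'fi'."""
    return MEDIAL_W.sub("fi", text)


if __name__ == "__main__":
    sample = "The scientiWc deWnition of Water"
    print(fix_fi_ligature(sample))
```

Under this rule a genuine word-initial W is preserved, but a word that begins with a misread ligature is not repaired, so the output would still need a spell-check pass.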
0 votes
1 answer
363 views

Image (having text and numbers) to text file matching [:alnum:] nicely with some Unix tool?

Suppose a photograph with text and numbers. I want to manage it in my editor with tools such as grep, standard text-processing things such as Vim's block-highlighting and also more advanced things ...
3 votes
1 answer
1k views

Linux equivalent of GraphClick?

Is there a piece of Linux software that does what GraphClick does in Mac OS X? That is, is there a Linux software that "is a graph digitizer software which allows to automatically retrieve the ...
hpy • 4,587
0 votes
1 answer
67 views

Writing to picture which is scanned document

I have a scanned contract and I need to change only a few names and dates in the contract. It's easy to scan the document but impossible to OCR it and open it in *.doc format. Is there an ...
xralf • 15.2k
49 votes
6 answers
35k views

Is there some sort of PDF-to-text converter?

I need PDF files as text so I can search over them in bulk from the command line. Is there a converter for Ubuntu, OBSD or a similar distro? Perhaps related: the post on OCR with Ubuntu.
otto • 591
15 votes
5 answers
7k views

OCR on Linux systems [closed]

I have always found OCR technology to be behind on open source systems. I've also watched the Ocropus project since its infancy. I've tried what I've heard is the best OCR engine available for Linux,...
jjclarkson • 2,147
