
Questions tagged [ocr]

OCR (optical character recognition) is the conversion of an image of characters into machine-readable encoded text. Use this tag for questions involving this type of conversion or software that performs OCR. When possible, indicate the software you use and the source and target of the conversion.

0 votes
0 answers
48 views

What happened to Tesseract's "Math / equation detection module"?

I was able to get Tesseract to run via a Python script on my Windows machine to turn non-searchable PDFs into searchable ones. When downloading Tesseract onto Windows, it asked me which languages I ...
Curious Layman
0 votes
0 answers
90 views

Making badly scanned public domain books legible with OCR

I've obtained soft copies of some very old public domain books. The illustrations are clear enough, but the text is somewhat blurry. I've experimented with Tesseract OCR and it can recognize a ...
YQ002lc2
2 votes
0 answers
39 views

OCR high res images & combine OCR data later, after image compression?

I have a large number of .tif files coming out of ScanTailor. Is there a way that I might OCR those .tif files with Tesseract, keeping the OCR data separate from the images; then compress the images, and ...
Diagon
  • 680
1 vote
2 answers
373 views

How to scan with OCR in a bash script

To streamline the scan process, I intend to create a script that scans and applies OCR in one step. However, my bash skills are rather poor, so I would be very thankful for a bit of help. Here is my ...
alex
  • 993
2 votes
0 answers
220 views

macOS-like OCR for Linux?

How can one set up the same ubiquitous OCR capabilities on Linux, similar to how one can copy text from any image in any software on macOS and iOS? I am using EndeavourOS with the GNOME desktop environment.
Pushp Vashisht
0 votes
1 answer
145 views

Make (`ocrmypdf`) command run in terminal AND include input name in that of the output

I have this line inside a Dolphin service-menu file that contains many other commands for PDF processing: Exec=bash -c 'f="%u"; ocrmypdf "$f" "${f%.pdf}_ocr.pdf";' It ...
cipricus
  • 1,629
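
For readers unfamiliar with the `${f%.pdf}_ocr.pdf` expansion quoted in the excerpt above, here is a minimal Python sketch of the same naming rule. The function name `ocr_output_name` is hypothetical and only illustrates how the output name is derived from the input name; it does not run `ocrmypdf` and is not taken from the question.

```python
def ocr_output_name(path: str) -> str:
    """Mimic the shell expansion "${f%.pdf}_ocr.pdf" (hypothetical helper)."""
    # ${f%.pdf} removes a trailing ".pdf" only when it is present;
    # the match is case-sensitive, so "x.PDF" is left unchanged.
    stem = path[: -len(".pdf")] if path.endswith(".pdf") else path
    # The "_ocr.pdf" suffix is then appended to whatever remains.
    return stem + "_ocr.pdf"


if __name__ == "__main__":
    print(ocr_output_name("/home/user/scan.pdf"))
```

Note that an input without a lowercase `.pdf` extension keeps its full name and simply gains the `_ocr.pdf` suffix, which matches how the shell expansion behaves.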
0 votes
1 answer
399 views

Best command-line OCR software for recognizing typed text over colorful background

I need to extract text from images like the one below: As you can see, the text is typed, not handwritten. Moreover, the background is colorful. I've tried Tesseract OCR, and while it works some of ...
0 votes
0 answers
47 views

How do I format texts that were processed by OCR?

Let's say that I want to connect all the paragraphs that are broken by the citations that start with (1), (2), (3), (4), (5). How would I express/automate this in bash? Keep in mind there are at most ...
Jean
  • 1
1 vote
0 answers
34 views

Can I transform the colors of scanned PDF files and reduce the scan resolution to save memory while keeping an existing text layer from OCR?

I have a pile of PDF files which were scanned long ago and which are already searchable (i.e. they went through OCR). However, the light level and contrast settings were not optimal. Is it ...
Adalbert Hanßen
1 vote
0 answers
380 views

Using Tesseract for character recognition, the result is not as expected (much worse). How can I improve it?

I wanted to add the output of a Linux boot to my question and decided to try optical character recognition, thinking that now, in 2022, there should surely be decent open source options (have not tried OCR ...
Martian2020
  • 1,219
0 votes
0 answers
91 views

NormCap OCR via Awesome Window Manager

One of the coolest programs I've come across recently is an Optical Character Recognition (OCR) program called NormCap. I have it tied to a hot key, and anytime I want to copy un-highlightable text ...
Lonnie Best
  • 5,175
2 votes
0 answers
96 views

Is there software to manually OCR / teach OCR for handwritten (non-English) texts?

I have a problem that Tesseract, ABBYY FineReader, etc. can't solve: they can't recognize handwritten Russian, for example. So I am looking for OCR software for such things or a way to manually OCR my PDFs (...
PDD
  • 21
0 votes
0 answers
582 views

How to specify multiple input files for Tesseract when using the output PDF option (only works with 'parallel' on the command line)

I am trying to run Tesseract on all files in a directory to produce a PDF. This command works fine: `ls * | parallel -j 4 tesseract {} {.} pdf` and produces a PDF for each input file. However, I am unable to get it ...
5 votes
1 answer
272 views

Find PDFs that don't have text

I have many folders with lots of PDFs and I want to run OCR on those that do not have a text layer. So first, I want to find them. I thought that maybe a pipe with pdfgrep would do ...
fich
  • 330
1 vote
0 answers
154 views

Where is ocrmypdf executable after Cygwin installation?

I followed this page to install OCRmyPDF on Cygwin. I did so from a non-administrator account, so the process ended up creating ~/.local/ for the required files. The following commands, however, do ...
user36800
  • 111
