Questions tagged [ocr]

OCR (optical character recognition) is the conversion of an image of characters into machine-readable encoded text. Use this tag for questions involving this type of conversion or software that performs OCR. When possible, indicate the software you use and the source and target of the conversion.

17 questions with no upvoted or accepted answers

4 votes

0 answers

191 views

Replace Scanned Text with OCRed Text in PDF

I have a scanned book as a PDF. When viewed in Evince, the book appears as it did when scanned, with old-fashioned fonts that appear as they were scanned. However, Evince recognises the letters as ...

zhanmusi

asked Feb 24, 2019 at 0:39

2 votes

0 answers

41 views

OCR high res images & combine OCR data later, after image compression?

I have a large number of .tif files coming out of ScanTailor. Is there a way that I might OCR those .tif files with tesseract, holding the OCR data separate from the images; then compress the images, and ...

Diagon

asked Jul 7, 2023 at 22:50

2 votes

0 answers

222 views

macOS-like OCR for Linux?

How can one set up the same ubiquitous OCR capabilities on Linux, similar to how one can copy text from any image in any software on macOS and iOS? I am using EndeavourOS with the GNOME desktop environment.

Pushp Vashisht

asked Apr 13, 2023 at 18:04

2 votes

0 answers

96 views

Is there software to manually OCR / teach OCR for handwritten (non-English) texts?

I have a problem that Tesseract, ABBYY FineReader, etc. can't solve: they can't recognize handwritten Russian, for example. So I am looking for OCR software for such things or a way to manually OCR my PDFs (...

PDD

asked Oct 15, 2021 at 4:19

2 votes

0 answers

264 views

Convert scanned pdf to pdf with text and images

Is it possible to convert a scanned PDF to a normal PDF (i.e. the same PDF as if it had been created from a document, with formatted text and images)? I tried many OCR solutions online/offline but they tend ...

Jean Molinier

asked Dec 3, 2019 at 8:31

2 votes

0 answers

718 views

Extract hardcoded subtitles

I wanted to know if there is a way to extract hardcoded subtitles via OCR. Should I do some image processing after extracting the frames in order to use tesseract afterwards? I have tried to extract ...

SkyBeast MC

asked Jul 17, 2018 at 23:38

2 votes

1 answer

699 views

Where can I get Tesseract binaries for Debian 6 64bit?

I used apt-get to install Tesseract but it's not really working. Maybe I could just download binaries somewhere, put them in a directory, and use them that way? What's wrong with my Tesseract now: tesseract --help ...

buikoto

asked Jan 23, 2015 at 22:05

2 votes

0 answers

78 views

OCR that outputs probability data

I would like to convert printed books I own into audio by scanning them with OCR and then running the text through a TTS engine. These titles are not available as ebooks. Since OCR can make small ...

themirror

7,038

asked Sep 27, 2013 at 16:17

1 vote

0 answers

34 views

Can I transform colors of scanned PDF files and reduce the scan resolution to save memory while keeping an existing text layer from OCR?

I have a pile of PDF files which were scanned long ago and which are already searchable (i.e. they went through OCR). However, the light level and contrast settings were not optimal. Is it ...

Adalbert Hanßen

asked Sep 14, 2022 at 19:19

1 vote

0 answers

395 views

Using tesseract for character recognition, result is not as expected (much worse). How to get better?

I wanted to add output of Linux boot to my question and decided to try to use optical character recognition thinking now in 2022 surely there should be decent open source options (have not tried OCR ...

Martian2020

1,219

asked Jan 10, 2022 at 6:35

1 vote

0 answers

155 views

Where is ocrmypdf executable after Cygwin installation?

I followed this page to install OCRmyPDF on Cygwin. I did so from a non-administrator account, so the process ended up creating ~/.local/ for the required files. The following commands, however, do ...

user36800

asked Jan 10, 2021 at 20:01

0 votes

0 answers

52 views

What happened to Tesseract's "Math / equation detection module"?

I was able to get Tesseract to run via a Python script on my Windows machine to turn non-searchable PDFs into searchable ones. When downloading Tesseract onto Windows, it asked me which languages I ...

Curious Layman

asked May 16 at 16:17

0 votes

0 answers

92 views

Making badly scanned public domain books legible with OCR

I've obtained soft copies of some very old public domain books. The illustrations are clear enough, but the text is somewhat blurry. I've experimented with Tesseract OCR and it can recognize a ...

YQ002lc2

asked Jul 24, 2023 at 6:11

0 votes

0 answers

47 views

How do I format texts that were processed by OCR?

Let's say that I want to connect all the paragraphs that are broken by the citations that start with (1), (2), (3), (4), (5). How would I express/automate this in bash? Keep in mind there are at most ...

Jean

asked Oct 1, 2022 at 12:48

0 votes

0 answers

91 views

NormCap OCR via Awesome Window Manager

One of the coolest programs I've come across recently is an Optical Character Recognition (OCR) program called NormCap. I have it tied to a hot key, and anytime I want to copy un-highlightable text ...

Lonnie Best

5,185

asked Dec 25, 2021 at 23:46
