Questions tagged [ocr]

OCR (optical character recognition) is the conversion of an image of characters into machine-readable encoded text. Use this tag for questions involving this type of conversion or software that performs OCR. When possible, indicate the software you use and the source and target formats of the conversion.

39 questions

0 votes

0 answers

52 views

What happened to Tesseract's "Math / equation detection module"?

I was able to get Tesseract to run via a Python script on my Windows machine to turn non-searchable PDFs into searchable ones. When downloading Tesseract onto Windows, it asked me which languages I ...

Archemar

31.7k

modified May 21 at 9:06

49 votes

6 answers

35k views

Is there some sort of PDF-to-text converter?

I need PDF files as text so I can search them in bulk from the command line. Is there a converter for Ubuntu, OBSD, or a similar distro? Possibly related post: OCR with Ubuntu.

dicktyr

answered Mar 10 at 23:23

8 votes

4 answers

5k views

How can I rasterize all of the text in a PDF?

You know when you have a PDF that is a scan of a document, and it's a really huge file because it just stores the picture of the scanned document? And there are OCR tools which can help you to ...

user202729

modified Feb 18 at 13:40

1 vote

2 answers

378 views

How to scan with ocr bash script

To streamline the scan process, I intend to create a script that scans and applies OCR in one step. However, my bash skills are rather poor, so I would be very thankful for a bit of help. Here my ...

Kusalananda♦

339k

modified Feb 13 at 9:30

4 votes

1 answer

2k views

Delete OCR from PDF

I have a PDF file containing corrupted OCR. It is a bunch of handwritten pages with a lot of symbols and abbreviations, and I got this file with an automatically generated OCR. How can I remove the ...

CommunityBot

modified Nov 24, 2023 at 16:26

0 votes

0 answers

92 views

Making badly scanned public domain books legible with OCR

I've obtained soft copies of some very old public domain books. The illustrations are clear enough, but the text is somewhat blurry. I've experimented with Tesseract OCR and it can recognize a ...

YQ002lc2

asked Jul 24, 2023 at 6:11

2 votes

0 answers

41 views

OCR high res images & combine OCR data later, after image compression?

I have a large number of .tif's coming out of ScanTailor. Is there a way that I might OCR those .tif's with tesseract, holding the OCR data separate from the images; then compress the images, and ...

Diagon

asked Jul 7, 2023 at 22:50

2 votes

0 answers

222 views

MacOS-like OCR for Linux?

How can one set up the same ubiquitous OCR capabilities on Linux, similar to how one can copy text from any image in any software on macOS and iOS? I am using EndeavourOS with the GNOME desktop environment.

RonJohn

1,144

modified Apr 14, 2023 at 8:53

0 votes

3 answers

1k views

OCR software for handwritten equations to get LaTeX file

First of all, I apologize if this is not the right place to ask this, but I couldn't think of anywhere else (maybe Stack Overflow?). Anyway, I'm looking for an Optical Character Recognition software (...

Lakshay Rohila

modified Jan 11, 2023 at 7:33

0 votes

1 answer

145 views

Make (`ocrmypdf`) command run in terminal AND include input name in that of the output

I have this line inside a Dolphin service-menu file that contains many other commands for PDF processing: Exec=bash -c 'f="%u"; ocrmypdf "$f" "${f%.pdf}_ocr.pdf";' It ...

cipricus

1,629

answered Dec 1, 2022 at 13:08

0 votes

1 answer

408 views

Best command-line OCR software for recognizing typed text over colorful background

I need to extract text from images like the one below: As you can see, the text is typed, not handwritten. Moreover, the background is colorful. I've tried Tesseract OCR, and while it works some of ...

Marcus Müller

32.7k

answered Nov 15, 2022 at 21:26

0 votes

0 answers

47 views

How do I format texts that were processed by OCR?

Let's say that I want to connect all the paragraphs that are broken by the citations that start with (1), (2), (3), (4), (5). How would I express/automate this in bash? Keep in mind there are at most ...

Jean

modified Oct 1, 2022 at 16:59

1 vote

0 answers

34 views

Can I transform colors of scanned pdf files and reduce the scan resolution to save memory keeping an existing text layer from OCR?

I have a pile of PDF files which were scanned long ago and which are already searchable (i.e. they went through OCR). However, the light level and contrast settings were not optimal. Is it ...

Adalbert Hanßen

asked Sep 14, 2022 at 19:19

94 votes

4 answers

71k views

How to OCR a PDF file and get the text stored within the PDF?

First, apologies if this has been asked before - I searched for a while through the existing posts, but could not find support. I am interested in a solution for Fedora to OCR a multipage non-...

ingli

1,889

modified Jun 18, 2022 at 11:13

0 votes

0 answers

91 views

NormCap OCR via Awesome Window Manager

One of the coolest programs I've come across recently is an Optical Character Recognition (OCR) program called NormCap. I have it tied to a hot key, and anytime I want to copy un-highlightable text ...

Lonnie Best

5,185

modified Jan 25, 2022 at 19:24
