All Questions

2 votes
0 answers
41 views

OCR high res images & combine OCR data later, after image compression?

I have a large number of .tif's coming out of ScanTailor. Is there a way that I might OCR those .tif's with tesseract, holding the OCR data separate from the images; then compress the images, and ...
Diagon
  • 680
1 vote
0 answers
34 views

Can I transform colors of scanned pdf files and reduce the scan resolution to save memory keeping an existing text layer from OCR?

I have a pile of pdf files which have been scanned long ago and which are already searchable (i.e. they went through OCR). However the light level and contrast settings were not optimal. Is it ...
Adalbert Hanßen
2 votes
0 answers
96 views

Is there software to manually OCR / teach OCR for handwritten (non-English) texts?

I have a problem that Tesseract, ABBYY FineReader, etc. can't solve: they can't recognize handwritten Russian, for example. So I am looking for OCR software for such things, or a way to manually OCR my PDFs (...
PDD
  • 21
5 votes
1 answer
274 views

Find PDFs that don't have text

I have many folders with lots of PDFs and I want to run optical character recognition (OCR) on those that do not have a text layer. So first, I want to find them. I thought that maybe a pipe with pdfgrep would do ...
fich
  • 330
0 votes
1 answer
1k views

methods of PDF compression

The Problem: I have a lot of old books that I want to scan and digitize. For this, I use a flatbed scanner, xsane and GImageReader, which works great. Back a few years ago, when I was still using ...
carsten
  • 355
0 votes
1 answer
146 views

How to find a word in picture and put another word in desired position?

I am an IT specialist, but I do a lot of financial clerk work! I have to put cost centers on invoices (of the IT department) by hand! Is there perhaps a technology or solution in Linux to automate ...
Юля
  • 1
2 votes
0 answers
264 views

Convert scanned pdf to pdf with text and images

Is it possible to convert a scanned PDF to a normal PDF (i.e. the same PDF as if it had been created from a document, with formatted text and images)? I tried many OCR solutions online/offline but they tend ...
Jean Molinier
4 votes
0 answers
191 views

Replace Scanned Text with OCRed Text in PDF

I have a scanned book as a PDF. When viewed in Evince, the book appears as it did when scanned, with old-fashioned fonts that appear as they were scanned. However, Evince recognises the letters as ...
zhanmusi
  • 141
4 votes
1 answer
2k views

Delete OCR from PDF

I have a PDF file containing corrupted OCR. It is a bunch of handwritten pages with a lot of symbols and abbreviations, and I got this file with automatically generated OCR. How can I remove the ...
Seninha
  • 1,045
5 votes
1 answer
2k views

tesseract: is it possible to change font output in OCRed pdf?

Following up on "How to OCR a PDF file and get the text stored within the PDF?": I have successfully produced OCRed PDF pages. In Evince, however, the letters are not shown; by this I mean that I cannot see ...
ingli
  • 1,889
94 votes
4 answers
71k views

How to OCR a PDF file and get the text stored within the PDF?

First, apologies if this has been asked before - I searched for a while through the existing posts, but could not find support. I am interested in a solution for Fedora to OCR a multipage non-...
ingli
  • 1,889
8 votes
4 answers
5k views

How can I rasterize all of the text in a PDF?

You know when you have a pdf, which is a scan of a document and it's a really huge file, because it just stores the picture of the scanned document? And there are OCR tools which can help you to ...
Dimitri Schachmann
49 votes
6 answers
35k views

Is there some sort of PDF-to-text converter?

I need PDF files as text so I can search over them in bulk from the command line. Is there some converter for Ubuntu, OBSD or a similar distro? Perhaps related: the post on OCR with Ubuntu, here.
otto
  • 591