Questions tagged [tesseract]

Tesseract is an OCR (optical character recognition) engine

0 votes
0 answers
52 views

What happened to Tesseract's "Math / equation detection module"?

I was able to get Tesseract to run via a Python script on my Windows machine to turn non-searchable PDFs into searchable ones. When downloading Tesseract onto Windows, it asked me which languages I ...
Curious Layman
2 votes
0 answers
41 views

OCR high res images & combine OCR data later, after image compression?

I have a large number of .tif files coming out of ScanTailor. Is there a way that I might OCR those .tif files with tesseract, holding the OCR data separate from the images; then compress the images, and ...
Diagon
  • 680
0 votes
1 answer
408 views

Best command-line OCR software for recognizing typed text over colorful background

I need to extract text from images like the one below: As you can see, the text is typed, not handwritten. Moreover, the background is colorful. I've tried Tesseract OCR, and while it works some of ...
0 votes
1 answer
122 views

Tesseract doesn't accept process substitution

I'm making a quick script that is supposed to use an OCR tool (tesseract) on an image in the clipboard to convert it to text and output it. It looks like this: #!/bin/sh temp="$(mktemp tmpXXX.png)" ...
Fedja
  • 115
0 votes
1 answer
118 views

Scripting tesseract for file manager context menu

File manager context menu scripts sometimes do the job far quicker than using a GUI utility. So I've been using dozens of simple and more complex scripts for a long time in file managers Dolphin, ...
Sadi
  • 505
1 vote
0 answers
395 views

Using tesseract for character recognition, result is not as expected (much worse). How to get better results?

I wanted to add the output of a Linux boot to my question and decided to try optical character recognition, thinking that now, in 2022, there should surely be decent open source options (have not tried OCR ...
Martian2020
  • 1,219
2 votes
0 answers
96 views

Is there software to manually OCR / teach OCR for handwritten (non-English) texts?

I have a problem that Tesseract, ABBYY FineReader, etc. can't solve: they can't recognize handwritten Russian, for example. So I am looking for OCR software for such things, or a way to manually OCR my pdfs (...
PDD
  • 21
0 votes
1 answer
202 views

How do you save the text in the terminal to various text formats?

I'm playing around a bit with OCR software, in particular I'm spending a bit of time with tesseract. I got it to where I can load an image and get tesseract to rip the text from the image, in Linux ...
Neil Meyer
0 votes
1 answer
1k views

Install tesseract offline in RHEL

I have an RHEL based server that does not connect to the internet. I need to install Tesseract >4.0 on this server. Therefore, my option was to download RPM packages from another and move them to ...
Sathindu
  • 101
3 votes
0 answers
317 views

Debian Buster: Tesseract not supporting URL as argument

I'm trying to parse text from a hosted image, but it looks like I've misconfigured Tesseract. I'm using Debian Buster; tesseract-ocr, libtesseract-dev and a Ruby wrapper are installed. # $ ...
Sumak
  • 263
1 vote
0 answers
48 views

script run via keyboard binding does not write to file

The following bash script interprets text in an image file and writes it to a .txt file: #!/usr/bin/env bash LD_LIBRARY_PATH="/usr/local/lib" export LD_LIBRARY_PATH /usr/local/bin/tesseract /home/martin/...
MyrionSC2
  • 111
10 votes
2 answers
13k views

Tesseract: High CPU Usage and slow speed, only when running multiple processes in parallel

Problem: pytesseract.image_to_string() takes too much time when I run the script through supervisord, but executes almost instantaneously when run directly in the shell (on the same server and ...
Ashish
  • 270
0 votes
1 answer
260 views

Leptonica compilation error

Trying to install leptonica v1.78 on Ubuntu 16, but it's not working for some reason. After running ./configure and make, I keep getting this error: make[2]: Entering directory '/home/user/Documents/...
Gyakenji
  • 101
5 votes
1 answer
2k views

tesseract: is it possible to change font output in OCRed pdf?

Following up on "how to OCR a pdf file and get the text stored within pdf?", I have successfully produced OCRed pdf pages. In Evince, however, the letters are not shown; by this I mean that I cannot see ...
ingli
  • 1,889
2 votes
1 answer
699 views

Where can I get Tesseract binaries for Debian 6 64-bit?

I used apt-get to install Tesseract, but it's not really working. Maybe I could just download binaries somewhere, put them in a directory, and use it that way? What's wrong with my Tesseract now: tesseract --help ...
buikoto
  • 21