
Questions tagged [pdftotext]

pdftotext converts Portable Document Format (PDF) files to plain text.
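Many of the questions below start from the same step: running pdftotext and getting its output into a program. Here is a minimal sketch of calling the command-line tool from Python, assuming the Poppler build of pdftotext is installed and on PATH. The helper names build_pdftotext_cmd and pdf_to_text are hypothetical. -layout, -enc, and "-" as the output file (write to stdout) are standard pdftotext options.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List, Union


def build_pdftotext_cmd(pdf_path: Union[str, Path], layout: bool = True,
                        encoding: str = "UTF-8") -> List[str]:
    """Build the argument list for Poppler's pdftotext command-line tool."""
    cmd = ["pdftotext", "-enc", encoding]  # -enc sets the output text encoding
    if layout:
        cmd.append("-layout")  # keep the original physical layout (columns, tables)
    # Input PDF, then "-" as the output file so the text is written to stdout.
    cmd += [str(pdf_path), "-"]
    return cmd


def pdf_to_text(pdf_path: Union[str, Path], layout: bool = True,
                encoding: str = "UTF-8") -> str:
    """Run pdftotext on pdf_path and return the extracted text."""
    if shutil.which("pdftotext") is None:
        raise FileNotFoundError("pdftotext not found on PATH (it ships with Poppler)")
    result = subprocess.run(build_pdftotext_cmd(pdf_path, layout, encoding),
                            capture_output=True, check=True)
    # Decode with the same encoding pdftotext was told to write.
    return result.stdout.decode(encoding)
```

Asking for UTF-8 output and decoding it the same way avoids mismatches between the encoding pdftotext writes and the one your program reads, which matters for non-Latin scripts.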

0 votes, 0 answers, 35 views

How to use a text extraction strategy

I am stuck on a custom iText 7 extraction strategy. My goal is to extract data from a PDF to a text file without losing the table format. My PDF has different table structures; some table columns are horizontal ...
Ibad Ur Rehman
0 votes, 1 answer, 586 views

pdfjs-dist importing module error despite rest of project importing appropriately

I am trying to introduce the pdfjs-dist library into my Node.js server. However, it throws an import error: Error [ERR_REQUIRE_ESM]: require() of ES Module C:\Users\zjric\auto-filing\node_modules\...
Nebulous
0 votes, 0 answers, 20 views

Reading Fillable PDF in Laravel

I am trying to create a fillable PDF form that will contain form data. The PDF file will be uploaded through a form on the website, and its data will be read later. I looked for several ...
Nabill Farhan
0 votes, 1 answer, 267 views

Extract text from a PDF in the correct visual order

When using a Python library to extract text from a PDF, why doesn't the order of the selected text match what you see on the screen? For instance, when I copy some text at the top of a page, then a ...
Phalgun
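PDF content streams are not required to store text in reading order, so one common workaround is to sort words by their position yourself. A minimal sketch, assuming you already have each word as a tuple (x0, top, text), with top measured downward from the top of the page. pdfplumber's page.extract_words() returns dicts with x0, top and text keys that could be mapped into this shape. The function name visual_order and the 3-point line tolerance are assumptions.

```python
from typing import List, Tuple

# (x0, top, text): x0 = left edge, top = distance from the top of the page.
Fragment = Tuple[float, float, str]


def visual_order(fragments: List[Fragment], line_tolerance: float = 3.0) -> str:
    """Rebuild text in reading order: lines top-to-bottom, words left-to-right."""
    lines: List[List[Fragment]] = []
    for frag in sorted(fragments, key=lambda f: f[1]):
        # A fragment whose 'top' is within line_tolerance of the first fragment
        # of the current line is treated as part of that visual line.
        if lines and abs(frag[1] - lines[-1][0][1]) <= line_tolerance:
            lines[-1].append(frag)
        else:
            lines.append([frag])
    # Within each line, order words by their left edge.
    return "\n".join(
        " ".join(text for _, _, text in sorted(line, key=lambda f: f[0]))
        for line in lines
    )
```

This assumes a single-column page; on a multi-column page the words would first need to be split into columns by x position, otherwise lines from both columns get merged.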
0 votes, 0 answers, 232 views

How to handle a table with merged cells using pdfplumber

I am trying to parse a PDF (including tables) and convert it to JSON. So far, I am able to convert a table if it has at least one row and two columns, but I am struggling to parse a table to JSON properly ...
Santhosh
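One way to handle merged cells is to fill the gaps before converting rows to JSON. A minimal sketch, assuming the table arrives as a list of rows of strings (the shape returned by pdfplumber's extract_table()) and that positions covered by a merged cell come back as None; verify that assumption against your own PDFs. The function names fill_merged_cells and table_to_json are hypothetical.

```python
import json
from typing import List, Optional

Row = List[Optional[str]]


def fill_merged_cells(rows: List[Row], direction: str = "down") -> List[Row]:
    """Return a copy of rows with None cells filled from a neighbouring cell.

    "down" copies the value from the row above (vertically merged cells);
    "right" copies the value from the cell to the left (horizontal merges).
    """
    if direction not in ("down", "right"):
        raise ValueError("direction must be 'down' or 'right'")
    filled = [list(r) for r in rows]  # copy so the input is not mutated
    for i, row in enumerate(filled):
        for j, cell in enumerate(row):
            if cell is not None:
                continue
            if direction == "down" and i > 0 and j < len(filled[i - 1]):
                row[j] = filled[i - 1][j]
            elif direction == "right" and j > 0:
                row[j] = row[j - 1]
    return filled


def table_to_json(rows: List[Row]) -> str:
    """Use the first row as the header and emit one JSON object per data row."""
    header, *body = rows
    return json.dumps([dict(zip(header, row)) for row in body])
```

Filling downward suits row labels that span several rows; if the header row itself contains merged cells, it may need a "right" pass first.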
0 votes, 0 answers, 246 views

How to extract text from PDFs with complex layouts using Python?

I am extracting text from PDFs, but it is hard to handle complex layouts such as two-column PDFs and various table scenarios, like tables with or without borders, and combined ...
Phalgun
1 vote, 0 answers, 90 views

Gscript PDF to text by OCR, problem with some characters

I've been using the function attached below for over a year, and it worked perfectly. However, two days ago something changed, and it stopped converting Polish characters in multiple installations. I ...
Krzysztof B
2 votes, 1 answer, 1k views

How can I extract text from a PDF document in a Flutter app?

I am working on a Flutter application and need to extract text from PDF documents. I have attempted to use the pdf package, but I am unable to, because all I can see is PdfDocumentParserBase, which is an ...
Sumanth (reputation 111)
0 votes, 1 answer, 341 views

How to get the coordinates of each content element in a PDF file?

I use Smalot\PdfParser to extract content from PDFs. As a beginner, I tried out basic functions like getText(), getDetails(), getPages(), etc., and then I noticed this return value from $data = dd($...
Keith Lê
0 votes, 0 answers, 28 views

HTTP request to the Convertio API

I am converting a PDF to TXT using the Convertio API. If I send the GET request immediately after the POST, the conversion hasn't finished yet and I get an error. I've worked around it with a delay of 5 seconds ...
Alejandro Patrick Viera McGorr
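Instead of a fixed delay, a client can poll the job status until it reports completion, backing off between checks. A minimal sketch of a generic polling helper; check_done is a hypothetical callable you would write yourself to send the status GET request and return True once the conversion is reported finished. The exact status URL and the response field that signals completion should be taken from the Convertio API documentation, not from this sketch.

```python
import time
from typing import Callable


def wait_until_done(
    check_done: Callable[[], bool],
    timeout: float = 60.0,
    initial_delay: float = 1.0,
    max_delay: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> int:
    """Poll check_done() with exponential backoff until it returns True.

    Returns the number of status checks made; raises TimeoutError if the
    next wait would run past the deadline.
    """
    deadline = clock() + timeout
    delay = initial_delay
    attempts = 0
    while True:
        attempts += 1
        if check_done():
            return attempts
        if clock() + delay > deadline:
            raise TimeoutError(f"still not finished after {attempts} status checks")
        sleep(delay)
        delay = min(delay * 2, max_delay)  # back off, but cap the wait
```

The sleep and clock parameters exist only so the helper can be tested without real waiting; in normal use the defaults apply, e.g. wait_until_done(lambda: my_status_check(job_id)), where my_status_check and job_id are your own.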
0 votes, 0 answers, 33 views

Python: Unable to extract multi-line 'Property Address' from PDF

I need help writing a Python script to extract multi-line text from a PDF file (MultiLineText). Here's the snippet I tried to use: 'Address': r'Property No: (\d+)'. No matter what combination of ...
Kanjeero boocho
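The pattern r'Property No: (\d+)' only captures the digits after "Property No:", so it cannot return an address. A minimal sketch of a pattern for a multi-line value, assuming (hypothetically) that the extracted text puts each field on its own line as "Label: value" and that the address continues until the next such line or the end of the text. The names ADDRESS_RE and extract_address are illustrative.

```python
import re
from typing import Optional

# Capture everything after "Property Address:" lazily, stopping just before
# a newline that starts another "Label:" line, or at the end of the text.
# re.DOTALL lets "." match newlines so the value can span several lines.
ADDRESS_RE = re.compile(
    r"Property Address:\s*(.*?)\s*(?=\n[A-Z][A-Za-z ]*:|\Z)",
    re.DOTALL,
)


def extract_address(text: str) -> Optional[str]:
    match = ADDRESS_RE.search(text)
    if match is None:
        return None
    # Join the captured lines into one line, dropping extra indentation.
    return " ".join(line.strip() for line in match.group(1).splitlines() if line.strip())
```

If the real PDF uses different labels or a continuation line that itself looks like "Word:", the lookahead needs adjusting to match that layout.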
0 votes, 0 answers, 46 views

PDF to text: reading from different files

I have many question and solution files in PDF format; each question file has a corresponding solution file, forming a question-solution pair. I am trying to prepare a dataset for practicing questions and solutions. But to my ...
Granth (reputation 374)
0 votes, 2 answers, 332 views

How to convert two-column PDF text to a single column

I have PDF text data that I read using pdftotext in Python. How can I convert this data into correctly ordered text so that I can extract the text from the string sequentially? I want to convert ...
Granth (reputation 374)
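With layout-preserving output (pdftotext -layout), both columns sit at fixed character positions, so one heuristic is to find the gutter of spaces between them and read the left column before the right. A minimal sketch, assuming the two columns are separated by a run of spaces at the same position on every line of a page; pdftotext separates pages with a form feed ("\f"), so the text should be split on "\f" and processed page by page. The names find_gutter and two_columns_to_one, and the margin of 10 characters, are assumptions.

```python
from typing import List, Optional


def find_gutter(lines: List[str], margin: int = 10) -> Optional[int]:
    """Return a column index that is blank in every non-empty line, nearest the middle."""
    rows = [line for line in lines if line.strip()]
    if not rows:
        return None
    width = max(len(line) for line in rows)
    # Candidate columns away from the page edges where no line has a character.
    blank = [
        c for c in range(margin, width - margin)
        if all(c >= len(line) or line[c] == " " for line in rows)
    ]
    if not blank:
        return None
    return min(blank, key=lambda c: abs(c - width // 2))


def two_columns_to_one(text: str, gutter: Optional[int] = None) -> str:
    """Emit the left column's lines, then the right column's, for one page."""
    lines = text.splitlines()
    if gutter is None:
        gutter = find_gutter(lines)
    if gutter is None:
        return text  # no consistent gutter found; leave the text unchanged
    left = [line[:gutter].rstrip() for line in lines]
    right = [line[gutter:].strip() for line in lines]
    # Blank lines are dropped, so paragraph breaks are not preserved.
    return "\n".join(part for part in left + right if part)
```

Pages with a full-width heading or table above the columns will not have a clean gutter; in that case the function returns the page unchanged, and passing gutter explicitly is an option.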
0 votes, 0 answers, 60 views

Cannot convert Hebrew characters using pdftotext

I have a PDF file that I can view, open, and send to everyone. Now I want to convert it to text. I am using Linux, so I use these three commands: pdftotext -enc ISO-8859-8 -layout barIlan.pdf bar.txt ...
Nadav Oxenberg
0 votes, 0 answers, 143 views

Having trouble installing pdftotext on Windows

I am trying to install pdftotext on Windows via pip install pdftotext. I am getting the following error: pdftotext.cpp(3): fatal error C1083: Cannot open include file: 'poppler/cpp/poppler-document.h'...
AngryHacker (reputation 61.1k)
