Questions tagged [pdftotext]
Pdftotext converts Portable Document Format (PDF) files to plain text.
pdftotext
370
questions
0
votes
0
answers
35
views
How to use text extraction strategy
I am stuck on a custom text extraction strategy in iText 7. My goal is to extract data from a PDF to a text file without losing the table format. My PDF has a different table structure; some table columns are horizontal ...
0
votes
1
answer
586
views
pdfjs-dist importing module error despite rest of project importing appropriately
I am trying to introduce the pdfjs-dist library into my Node.js server. However, it gives an import error:
Error [ERR_REQUIRE_ESM]: require() of ES Module C:\Users\zjric\auto-filing\node_modules\...
0
votes
0
answers
20
views
Reading Fillable PDF in Laravel
I am trying to create a fillable form in the form of a PDF which will contain form data. The PDF file will be uploaded into a form on the website and the data will be read later.
I looked for several ...
0
votes
1
answer
267
views
Extract text from a PDF in correct visual order
When using a Python library to extract text from a PDF, why does the order of the extracted text not match what you see on the screen? For instance, when I copy some text at the top of the page, then a ...
0
votes
0
answers
232
views
How to handle merged-cell tables using pdfplumber
I am trying to parse a PDF (including tables) and convert it to JSON.
So far, I can convert a table if it has at least one row and two columns,
but I am struggling to parse a table to JSON properly ...
0
votes
0
answers
246
views
How to extract text from PDFs with complex layouts using Python?
I am extracting text from PDFs, but it is hard to do for complex layouts such as two-column PDFs and the different ways tables appear in PDFs, like tables with borders or no borders, and combined ...
1
vote
0
answers
90
views
Gscript PDF to text by OCR, problem with some characters
I've been using the function attached below for over a year and it worked perfectly. However, 2 days ago, something changed and it stopped converting Polish characters in multiple installations. I ...
2
votes
1
answer
1k
views
How can I extract text from a PDF document in a Flutter app?
I am working on a Flutter application and need to extract text from PDF documents. I have attempted to use the pdf package, but I'm unable to do so as I can see only PdfDocumentParserBase which is an ...
0
votes
1
answer
341
views
How to get the specific coordinates of each contents in PDF file?
I use Smalot\PdfParser to extract content from PDFs. As a beginner, I tried basic functions like getText(), getDetails(), getPages(), etc., then I noticed this return value from $data = dd($...
0
votes
0
answers
28
views
HTTP request to the Convertio API
I am converting a PDF to txt using the Convertio API. If I send the GET request directly after the POST request, the conversion hasn't finished yet and I get an error.
I've worked around it with a delay of 5 seconds ...
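A fixed delay like the one described above can be replaced by polling until the job reports completion. The following is a minimal sketch of that idea, not Convertio's documented client: `check_status` is a hypothetical callable standing in for the GET request, and the status strings `"finished"` and `"error"` are assumptions to adapt to whatever the API actually returns.

```python
import time
from typing import Callable


def wait_for_conversion(
    check_status: Callable[[], str],
    interval: float = 1.0,
    timeout: float = 60.0,
) -> bool:
    """Poll check_status() until it reports completion or the timeout expires.

    check_status is a hypothetical stand-in for the status request; the
    "finished" and "error" values are assumed, not taken from the API docs.
    Returns True when the job finished, False when the timeout was reached.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = check_status()
        if status == "finished":
            return True
        if status == "error":
            raise RuntimeError("conversion reported an error")
        if time.monotonic() >= deadline:
            return False
        # Wait before asking again instead of guessing one fixed delay.
        time.sleep(interval)
```

The result is only fetched once the function returns True, so a slow conversion no longer fails just because it took longer than a hard-coded 5 seconds.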
0
votes
0
answers
33
views
Python: Unable to extract multi-line 'Property Address' from PDF
I need help writing a Python script to extract multi-line text from a PDF file MultiLineText. Here's the snippet I tried to use: 'Address': r'Property No: (\d+)'
No matter what combination of ...
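The pattern quoted above only captures digits on a single line. One possible approach to a multi-line field is a lazy match with `re.DOTALL` that stops at the first blank line. This is a sketch under assumptions: the sample text, the `Property Address:` label, and the blank-line terminator are all invented for illustration, since the actual PDF text is not shown.

```python
import re
from typing import Optional

# Invented sample standing in for text already extracted from the PDF.
SAMPLE = """Property No: 42
Property Address: 12 Example Street
Flat 3B
Sampletown 12345

Owner: A. Person
"""

# DOTALL lets "." cross line breaks; the lazy ".+?" stops at the first
# blank line or at the end of the text, whichever comes first.
ADDRESS_RE = re.compile(r"Property Address:\s*(.+?)(?:\n\s*\n|\Z)", re.DOTALL)


def extract_address(text: str) -> Optional[str]:
    """Return the address joined onto one line, or None if the label is absent."""
    match = ADDRESS_RE.search(text)
    if match is None:
        return None
    lines = (line.strip() for line in match.group(1).splitlines())
    return " ".join(line for line in lines if line)


print(extract_address(SAMPLE))  # 12 Example Street Flat 3B Sampletown 12345
```

If the field ends at the next label rather than at a blank line, the terminator group would need to be changed accordingly.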
0
votes
0
answers
46
views
PDF to text reading from a different file
I have many question and solution files in PDF format.
The files come in question-solution pairs.
I am trying to prepare a dataset for practicing questions and solutions.
But to my ...
0
votes
2
answers
332
views
How to convert two-column PDF text to a single column
I have text data from a PDF, read using pdftotext in Python.
How can I convert this data into correctly ordered text so that I can process the string sequentially?
I want to convert ...
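One way to picture the reordering asked about above: if the extracted text keeps both columns side by side on each line, every line can be cut at the gutter and the right-hand pieces appended after the left-hand ones. This is only a sketch under assumptions: it presumes layout-preserving output, and `split_at` (the character position of the gutter) is a hypothetical parameter that has to be found by inspecting the text.

```python
from typing import List


def unstack_columns(layout_text: str, split_at: int) -> str:
    """Reorder side-by-side two-column text into one column.

    Assumes each line holds the left column before character position
    split_at and the right column from split_at onward.
    """
    left: List[str] = []
    right: List[str] = []
    for line in layout_text.splitlines():
        left_part = line[:split_at].rstrip()
        right_part = line[split_at:].strip()
        if left_part:
            left.append(left_part)
        if right_part:
            right.append(right_part)
    # All of the left column first, then all of the right column.
    return "\n".join(left + right)


# Invented sample: two lines, gutter at character 15.
sample = "\n".join([
    "alpha one".ljust(15) + "gamma three",
    "beta two".ljust(15) + "delta four",
])
print(unstack_columns(sample, split_at=15))
```

A fixed gutter position breaks on pages whose columns shift, so real documents may need the split position computed per page.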
0
votes
0
answers
60
views
Cannot convert Hebrew characters using pdftotext
I have a PDF file that I can open, view, and send to everyone:
Now I want to convert it to text. I am using Linux, so I use these 3 commands:
pdftotext -enc ISO-8859-8 -layout barIlan.pdf bar.txt
...
0
votes
0
answers
143
views
Having trouble installing pdftotext on Windows
I am trying to install pdftotext on Windows via pip install pdftotext. I am getting the following error:
pdftotext.cpp(3): fatal error C1083: Cannot open include file: 'poppler/cpp/poppler-document.h'...