
I downloaded a PDF file that I can view, print, and modify, but I can't do a simple text copy from it. A brief search didn't turn up how this was achieved. I have tried both the macOS Preview app and Chrome's PDF viewer: in Preview the copied text comes out as ?, and in Chrome it comes out as empty space.

In the image below you can see an 8, but when I try to copy it I get empty space or a ?, which I think is macOS's way of saying it can't read that encoding.

[Image: screenshot of the PDF showing the character 8]

  • There are several similar questions on here already - superuser.com/search?q=pdf+copy+text+garbage
    – Tetsujin
    Commented Feb 11, 2023 at 7:25
  • Yes, @Tetsujin, I found some answers on how to get the text out, but none on how to generate this kind of PDF.
    – enzo
    Commented Feb 11, 2023 at 17:32
  • I just found out about Ghostscript, and this PDF may have been generated with it: bugs.ghostscript.com/show_bug.cgi?id=692450. I haven't read their documentation yet. Once I achieve this effect, I will post an answer explaining how.
    – enzo
    Commented Feb 12, 2023 at 17:37

2 Answers


This is a restriction (a side effect) of making a PDF from a .jpg.

There is no way to fix this except by extracting the text with OCR software. If OCR doesn't work on the PDF directly, make a .jpg print of it and use that.
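A minimal sketch of that OCR route in Python, assuming the third-party packages pdf2image (which needs the Poppler utilities) and pytesseract (which needs the Tesseract OCR engine) are installed. Instead of saving .jpg prints by hand, it renders each page to an in-memory image and OCRs that. The helper names `ocr_pdf`, `join_pages` and `save_text` and the `dpi`/`lang` defaults are illustrative choices, not something from the answer.

```python
from pathlib import Path


def ocr_pdf(pdf_path: str, dpi: int = 300, lang: str = "eng") -> list[str]:
    """Render each page of pdf_path to an image and OCR it; one string per page."""
    # Third-party imports live inside the function so the rest of this
    # module still loads on machines without the OCR stack installed.
    from pdf2image import convert_from_path  # requires Poppler
    import pytesseract  # requires the Tesseract engine

    pages = convert_from_path(pdf_path, dpi=dpi)  # list of PIL images
    return [pytesseract.image_to_string(page, lang=lang) for page in pages]


def join_pages(page_texts: list[str]) -> str:
    """Trim trailing whitespace per page and join pages with a form feed."""
    return "\f".join(text.rstrip() for text in page_texts)


def save_text(pdf_path: str, text: str) -> Path:
    """Write text next to the PDF as a .txt file and return the new path."""
    out = Path(pdf_path).with_suffix(".txt")
    out.write_text(text, encoding="utf-8")
    return out
```

For example, `save_text("scan.pdf", join_pages(ocr_pdf("scan.pdf")))` would write `scan.txt` next to a hypothetical `scan.pdf`. Around 300 dpi is a common starting resolution for OCR; much lower values tend to hurt recognition of small glyphs.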

  • Is there a way to reproduce this "side effect"?
    – enzo
    Commented Feb 9, 2023 at 19:50
  • They say they can modify it, so apparently it's not an image format. Images can be embedded in PDFs but then you'd not be able to modify it as though it were text. I'm honestly confused as to what's going on as well (not sure all information in the question is correct), but some more details on how this is jpg-in-pdf or why you think that's the answer would be helpful.
    – Luc
    Commented Feb 10, 2023 at 22:53
  • If it's easier, I can provide the PDF file.
    – enzo
    Commented Feb 11, 2023 at 16:51
  • Then it might be a combination of text and images, if you can edit some parts but not others. But with dedicated PDF software, even if you place a text document in image format, you can add extra editable fields over the image. In the end, the PDF has no text encoded in it.
    – Unix
    Commented Feb 15, 2023 at 23:09
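To check which situation a particular file is in, here is a rough, stdlib-only heuristic in Python. It is an assumption-laden sketch, not a real PDF parser: it inflates Flate-compressed streams and counts text objects (`BT` operators), image XObjects (`/Subtype /Image`) and `/ToUnicode` character maps. Zero text objects alongside images suggests an image-only page that needs OCR; text objects without any ToUnicode maps suggest the text is there but the viewer may not know which characters the glyphs stand for (fonts using standard encodings can still copy fine without one). Encrypted files or streams using filters other than Flate will give misleading counts. The function names are hypothetical.

```python
import re
import zlib
from collections import Counter

# Crude heuristic: grab everything between "stream" + EOL and "endstream".
# Any trailing EOL captured before "endstream" is ignored by zlib.decompress.
STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)


def pdf_chunks(data: bytes) -> list[bytes]:
    """Return the raw file plus every stream that inflates with zlib."""
    chunks = [data]
    for match in STREAM_RE.finditer(data):
        try:
            chunks.append(zlib.decompress(match.group(1)))
        except zlib.error:
            pass  # not Flate-encoded (e.g. a JPEG image) or damaged
    return chunks


def text_layer_report(data: bytes) -> Counter:
    """Count text objects, image XObjects and ToUnicode maps in PDF bytes."""
    counts = Counter()
    for chunk in pdf_chunks(data):
        counts["text_objects"] += len(re.findall(rb"\bBT\b", chunk))
        counts["images"] += len(re.findall(rb"/Subtype\s*/Image", chunk))
        counts["tounicode_maps"] += chunk.count(b"/ToUnicode")
    return counts
```

Pass it the bytes of the file, e.g. read with `open(path, "rb")`, and compare the three counts.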

You can use this tool to convert PDF to text: pdf2text-ocr

Then you'll be able to copy, search, etc. Files are converted locally in the browser using OCR and are never uploaded to external servers. It's free and open source.

Disclosure: I'm the author of pdf2text-ocr. I created it to help a friend who had the same problem at work.

  • Your answer could be improved with additional supporting information. Please edit to add further details, such as citations or documentation, so that others can confirm that your answer is correct. You can find more information on how to write good answers in the help center.
    – Community Bot
    Commented Jun 10, 2023 at 18:30
