
If I copy some text from a PDF, the text looks correct; however, the text editor treats it as one long sequence.

How the line appears in Notepad:

[Screenshot: the line in Notepad]

The only way I have been able to actually see that something is wrong with the text is by copying it into vi, through Cmder:

[Screenshot: the text in vi, running in Cmder]

The text appears as follows in a hex editor:

[Screenshot: the text in HxD]

I have tried using PureText to strip out the invisible character on paste, but that doesn't work:

[Screenshot: PureText]

Copying the character into an editor's Replace dialog and replacing it with a space finds no matches.

The only way I have found that works is to manually delete each "space" and replace it with an actual space.

What is the recommended way to easily remove these invisible characters on paste, or using search and replace?
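
For the search-and-replace route, one option is to match the character by its code point instead of pasting it into a dialog. The sketch below assumes the invisible character is U+00A0 (NO-BREAK SPACE), as identified in the comments, and also folds in a few other typographic space characters that PDF text can contain; the names `normalize_spaces` and `clean_file` and the exact set of characters are illustrative assumptions, not an established tool.

```python
import re

# Assumed set of "invisible" spaces to normalise:
#   U+00A0 NO-BREAK SPACE (the character identified in the comments),
#   U+2000..U+200A (en/em/thin/hair and similar spaces),
#   U+202F NARROW NO-BREAK SPACE.
ODD_SPACES = re.compile("[\u00a0\u2000-\u200a\u202f]")


def normalize_spaces(text: str) -> str:
    """Replace no-break and typographic spaces with a plain U+0020 space."""
    return ODD_SPACES.sub(" ", text)


def clean_file(src: str, dst: str, encoding: str = "utf-8") -> None:
    """Copy src to dst with the odd spaces replaced.

    Pass encoding="cp1252" for a file that Notepad saved as "ANSI"
    on a Western-European Windows locale.
    """
    with open(src, encoding=encoding) as f:
        text = f.read()
    with open(dst, "w", encoding=encoding) as f:
        f.write(normalize_spaces(text))
```

Matching by code point sidesteps the problem of the Replace dialog silently turning the pasted character back into an ordinary space, or not accepting it at all.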

  • Have you tried copy-pasting it into Excel and using the SUBSTITUTE function? Or maybe use Paste as Plain Text in Chrome? It will also be hard to reproduce and test a possible solution without an example PDF.
    – Vylix
    Commented Nov 7, 2018 at 11:18
  • @Vylix Your comment gave me the idea to use Chrome as the PDF viewer instead of my current viewer, which is SumatraPDF. That worked! Using Chrome as the PDF viewer doesn't introduce the problem characters, so there is no longer a problem to solve.
    – Dev Step
    Commented Nov 7, 2018 at 11:34
  • Glad to be of help. Can you write that as an answer?
    – Vylix
    Commented Nov 7, 2018 at 11:41
  • A0 would be LF. So for whatever reason SumatraPDF copies spaces as line feeds (in this case). Notepad doesn't handle LF correctly, as the expected value for a line break on Windows is CRLF. Though I believe one of the latest Windows 10 builds should have a patch for Notepad that makes it respect Unix-style line breaks as well.
    – Seth
    Commented Nov 7, 2018 at 11:47
  • @Seth CRLF is 0D 0A though, not A0. A0 appears to be a non-breaking space.
    – Dev Step
    Commented Nov 7, 2018 at 12:02
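
The byte values discussed in the comments can be checked directly. This sketch uses only the Python standard library; `hex_bytes` is a hypothetical helper. It shows that LF is `0A` and CRLF is `0D 0A`, while a lone `A0` byte, read as Windows-1252 (which is what Notepad's "ANSI" usually means on a Western-European locale), is U+00A0 NO-BREAK SPACE, which UTF-8 stores as `C2 A0`. Which form HxD shows depends on the encoding the pasted text was saved in.

```python
import unicodedata


def hex_bytes(text: str, encoding: str) -> str:
    """Return the bytes of `text` in `encoding` as space-separated hex."""
    return text.encode(encoding).hex(" ")


# Line breaks: LF is 0A, a Windows CRLF is 0D 0A.
print(hex_bytes("\n", "ascii"))    # 0a
print(hex_bytes("\r\n", "ascii"))  # 0d 0a

# A lone A0 byte, decoded as Windows-1252, is the no-break space.
nbsp = b"\xa0".decode("cp1252")
print(f"U+{ord(nbsp):04X}", unicodedata.name(nbsp))  # U+00A0 NO-BREAK SPACE

# The same character takes two bytes in UTF-8.
print(hex_bytes(nbsp, "utf-8"))  # c2 a0

# It is a different character from an ordinary space (U+0020),
# which is why searching for " " does not find it.
print(nbsp == " ")  # False
```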

1 Answer


Here is the simple solution:

The PDF viewer I use is SumatraPDF. If I use Chrome as a PDF viewer, it doesn't introduce the non-breaking space into the copied text.

The Chrome PDF viewer inserts an ordinary space (U+0020) into the copied text instead.

By changing the PDF viewer used for these particular PDFs, the problem is solved.

I have tested this with various PDFs, and the problem occurs only with these particular PDFs.
