I'm trying to copy text from a PDF file, but I get garbage. I'm using Document Reader on Ubuntu to read the document. The viewer does let me copy, but the copied text looks like this:

RFRPSLOHJFFDUSVQLIIHUFRDUSVQLIIOSFDS    

5XQDVURRW

LQFOXGHSFDSK!
LQFOXGHVWGOLEK!
LQFOXGHVWULQJK!

$53+HDGHUDVVXPLQJ(WKHUQHW,3Y

GH¿QH$53B5(48(67
$535HTXHVW

GH¿QH$53B5(3/<
$535HSO\

W\SHGHIVWUXFWDUSKGU^
XBLQWBWKW\SH
+DUGZDUH7\SH

XBLQWBWSW\SH
3URWRFRO7\SH

XBFKDUKOHQ
+DUGZDUH$GGUHVV/HQJWK

XBFKDUSOHQ
3URWRFRO$GGUHVV/HQJWK

XBLQWBWRSHU
2SHUDWLRQ&RGH

XBFKDUVKD>@
6HQGHUKDUGZDUHDGGUHVV

XBFKDUVSD>@
6HQGHU,3DGGUHVV

XBFKDUWKD>@
7DUJHWKDUGZDUHDGGUHVV

XBFKDUWSD>@
7DUJHW,3DGGUHVV

What can I do to fix this? It's a large amount of data and would take a really long time to retype.

Also, incidentally, the pasted text looked like this on gedit (Ubuntu):

on my system (notice that it looks different when pasted here in this question!)

I suspect it is an encoding problem, but I don't know how to fix it.

  • I think it's on purpose; the person who created the document deliberately made it so you couldn't copy/paste or export. I've had a few PDFs like this too, mostly tables or Excel spreadsheets made into PDFs by Adobe software.
    – skub
    Commented Jan 28, 2012 at 22:23
  • @slhck sure! here it is
    – Chani
    Commented Jan 28, 2012 at 22:26

1 Answer

The underlying text is garbled. I think @skub is right that it may be on purpose. One way to get the text would be to export each page as an image (e.g. .jpg or .png) and then run OCR software on the images. I was able to test this on Windows 7 with Adobe Acrobat X; it worked.
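On Ubuntu, the same export-then-OCR idea could be sketched roughly as below. This is only an illustration, not what I ran: it assumes the `pdftoppm` (poppler-utils) and `tesseract` command-line tools are installed, and the function names, the `pages` working directory and the 300 dpi setting are all my own placeholders.

```python
import subprocess
from pathlib import Path


def render_cmd(pdf, prefix, dpi=300):
    """Command that renders every page of `pdf` to <prefix>-N.png."""
    return ["pdftoppm", "-r", str(dpi), "-png", str(pdf), str(prefix)]


def ocr_cmd(image):
    """Command that OCRs one image; tesseract writes <image stem>.txt."""
    image = Path(image)
    return ["tesseract", str(image), str(image.with_suffix(""))]


def ocr_pdf(pdf, workdir="pages"):
    """Render the PDF to PNG files, OCR each page, return the joined text.

    `workdir` is a hypothetical scratch directory; both external tools
    must be on PATH, otherwise subprocess raises an error.
    """
    out = Path(workdir)
    out.mkdir(exist_ok=True)
    subprocess.run(render_cmd(pdf, out / "page"), check=True)
    texts = []
    for image in sorted(out.glob("page-*.png")):
        subprocess.run(ocr_cmd(image), check=True)
        texts.append(image.with_suffix(".txt").read_text(encoding="utf-8"))
    return "\n".join(texts)
```

Calling `ocr_pdf("document.pdf")` would then return the recognized text. OCR output of source code usually needs proofreading, since characters such as `_`, `[` and `{` are easy to misread.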

Update:

In Acrobat, Copy with Formatting copies the text as expected, so if your document viewer has a similar feature, try that. Digging deeper, I can confirm that the embedded fonts all have a custom encoding.
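Looking at the sample pasted in the question, the custom encoding appears to be a constant offset: each copied character sits 29 code points below the intended one (`L` for `i`, `Q` for `n`, `!` for `>`). That is an observation from this one sample, not something I verified against the font tables, so treat the following as a minimal sketch. The name `unshift`, the `SHIFT` constant and the `¿` to `fi` ligature mapping are my own guesses.

```python
SHIFT = 29  # assumption: offset inferred from the sample in the question

# Assumption: "¿" stands for the "fi" ligature (as in "define").
LIGATURES = {"¿": "fi"}


def unshift(text: str, shift: int = SHIFT) -> str:
    """Undo the apparent constant offset in text copied from the PDF.

    Whitespace is left alone, known ligature stand-ins are expanded,
    and every other character is moved up by `shift` code points.
    """
    out = []
    for ch in text:
        if ch.isspace():
            out.append(ch)
        elif ch in LIGATURES:
            out.append(LIGATURES[ch])
        else:
            out.append(chr(ord(ch) + shift))
    return "".join(out)


if __name__ == "__main__":
    for line in ["5XQDVURRW", "LQFOXGHSFDSK!", "GH¿QH$53B5(48(67"]:
        print(unshift(line))
```

With the pasted sample this recovers letters and a few symbols (for example `5XQDVURRW` becomes `Runasroot`), but spaces, digits and most punctuation are missing from the pasted text itself, so they would still have to be restored by hand or by copying from a viewer that keeps them.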
