2

I have a number of PDFs of scientific papers that I want to be able to read on a Kindle. They look fine on my laptop but on the Kindle they look like low quality scans and are awful to read. As far as I can tell the problem is that the PDF uses a bitmap font: the text is selectable and searchable, and it looks completely normal from a distance, but when I zoom in it's very pixellated (and the same letters have the exact same pixels).

I'm on a Mac and this isn't worth buying any software for. Is there anything I can do to change the font? I can write a bit of code or use the terminal if necessary.

2
  • 1
    Despite your "the same letters have the exact same pixels" I still don't think you're seeing a bitmap font, but are looking at an image. Sounds like a scan of a printed paper. Smart scan software creates an invisible layer of OCR'd text that allows for searching and selecting text, but then you're still seeing the scanned image. But I might be wrong, of course.
    – Arjan
    Commented Jul 26, 2015 at 19:44
  • @Arjan can I make use of this to make a readable paper? If I just copy paste into plain text the spacing and ordering is obviously all messed up, but the PDF knows where each character is so surely there's a way to use that information?
    – alexmojaki
    Commented Jul 27, 2015 at 21:38

2 Answers

0

This is very late, but since I was googling the same thing today I thought it might be useful to add some links. There are indeed programs that try to replace bitmap fonts in PS or PDF files (typically generated with old versions of LaTeX) with outline (vector) fonts.

If I get any of these working I will add what I learned...
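In the meantime, it may help to check whether the PDF really contains bitmap fonts or is a scanned image with a hidden OCR text layer, as suggested in the comments. Old LaTeX toolchains embed bitmap fonts as Type 3 fonts, so here is a rough sketch that counts `/Subtype` entries in the raw file bytes. The function names (`count_subtypes`, `diagnose`, `report`) are my own, and the approach is only a heuristic: it assumes the font and image dictionaries are stored uncompressed, which is not true for every PDF, and a Type 3 font is not guaranteed to be a bitmap font.

```python
import re
import sys

# Font and image dictionaries declare their kind with a /Subtype entry,
# e.g. "/Subtype /Type3" for a Type 3 font or "/Subtype /Image" for a raster image.
SUBTYPE_RE = re.compile(rb"/Subtype\s*/(Type3|Type1|TrueType|Type0|Image)\b")


def count_subtypes(pdf_bytes):
    """Count how often each subtype of interest appears in the raw PDF bytes."""
    counts = {}
    for match in SUBTYPE_RE.finditer(pdf_bytes):
        name = match.group(1).decode("ascii")
        counts[name] = counts.get(name, 0) + 1
    return counts


def diagnose(counts):
    """Very rough guess at why the text looks pixellated (heuristic only)."""
    if counts.get("Type3"):
        return "type3-fonts"  # likely bitmap fonts from an old LaTeX toolchain
    if counts.get("Image"):
        return "images"       # possibly a scan with an invisible OCR layer
    return "unknown"          # nothing found; objects may be compressed


def report(path):
    """Print the subtype counts and the guess for the PDF at the given path."""
    with open(path, "rb") as f:
        counts = count_subtypes(f.read())
    print(counts, diagnose(counts))


if __name__ == "__main__" and len(sys.argv) == 2:
    report(sys.argv[1])
```

Save it under any name and run it with the PDF path as the only argument. If it reports Type 3 fonts, the font-replacement tools are worth trying; if it only finds images, the page is probably a scan and OCR is the more relevant route. If you have Poppler or Xpdf installed, `pdffonts yourfile.pdf` lists the font types more reliably.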

-1

If you're using Adobe Acrobat (version 9 or newer), you can try a different type of OCR called ClearScan. The default OCR in apps like Acrobat is not very good; ClearScan tries to improve on it and also reduces the file size.

Here is a guide for Acrobat 9, quoted from The Acrolaw Blog.

ClearScan OCR is not the default in Acrobat 9, so you’ll need to change a setting to use it. Here’s how.

Choose: Document > OCR Text Recognition > Recognize Text Using OCR... Click the Edit... button in the OCR window.

Change the PDF Output Style to ClearScan.

Click OK twice to OCR the document.

Note: The setting is "sticky" for future sessions.

I didn't have a PDF with OCR text handy, but in the newest version of Acrobat Pro DC the option is called 'Recognize text' (you can search for it in the Tools pane to the right of the document; it's part of the Enhance Scans tool).

If you want to know more about ClearScan, the quoted blog post explains it in detail.
