How would one go about extracting images in their native/maximum resolution from a PDF file?

I have tried several ways of exporting images and taking screenshots, but I end up with huge, bloated files that still show less detail than the PDF file.

1 Answer

You can extract images from a PDF file with pdfimages, a program that is part of Poppler and is available in the poppler-utils package on many Linux distributions. It is also available for Windows. See the pdfimages man page for the available options.

This command lists the images embedded in a PDF file:

pdfimages -list sample.pdf
page   num  type   width height color comp bpc  enc interp  object ID x-ppi y-ppi size ratio
--------------------------------------------------------------------------------------------
   1     0 image    4960  7008  gray    1   1  jbig2  no         6  0   600   600  132K 3.1%
   2     1 image    4960  7008  gray    1   1  jbig2  no        11  0   600   600 40.4K 1.0%
   3     2 image    4960  7008  gray    1   1  jbig2  no        15  0   600   600 26.3K 0.6%
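
For a PDF with many images, the listing can be condensed to the columns that matter for resolution. The following is a minimal sketch, assuming the column layout shown above (other Poppler versions may arrange the table differently); summarize_images is a hypothetical helper name, not part of Poppler.

```shell
# Condense `pdfimages -list` output (read from stdin) to one line per image:
# page number, pixel dimensions, encoding, and horizontal ppi.
# Assumption: fields are in the order shown in the listing above.
summarize_images() {
    # Skip the header line and the dashed separator, then pick the fields.
    awk 'NR > 2 { printf "page %s: %sx%s px, %s, %s ppi\n", $1, $4, $5, $9, $13 }'
}

# Usage (needs pdfimages installed and a real sample.pdf):
#   pdfimages -list sample.pdf | summarize_images
```

Applied to the listing above, the first row would be reported as a 4960x7008 pixel JBIG2 image at 600 ppi, which is the resolution the extracted file should have.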

To extract the images:

pdfimages -j -png sample.pdf sample

This extracts any JPEG images in their original JPEG format and writes all other images as lossless PNG, so you'll end up with files like this:

sample-000.jpg
sample-001.jpg
sample-002.png
sample-003.png
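
Because pdfimages writes one file per image, it can help to keep the output in its own directory. The following is a minimal sketch for a POSIX shell that wraps the extraction command above; the function name extract_images and the default directory name images are hypothetical, and only the flags already shown (-j and -png) are used.

```shell
# Extract the images of one PDF into a separate directory.
# $1: path to the PDF; $2: output directory (hypothetical default: images).
extract_images() {
    pdf=$1
    outdir=${2:-images}
    # Use the PDF's base name as the file prefix, e.g. sample-000.png.
    prefix=$(basename "$pdf" .pdf)
    mkdir -p "$outdir" || return 1
    # -j keeps JPEG images as JPEG; -png writes everything else as PNG.
    pdfimages -j -png "$pdf" "$outdir/$prefix"
}

# Usage (needs pdfimages installed and a real sample.pdf):
#   extract_images sample.pdf extracted
```

With the usage line above, the files would be written as extracted/sample-000.jpg, extracted/sample-001.jpg, and so on, following the naming pattern shown earlier.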

In the rare case that you can't install any software, you might want to try an online service such as pdf extract tool.

  • So Poppler is up to version 23 now, but the Windows version is still at 0.68? Does that sound right?
    – Simon E.
    Commented Mar 12, 2023 at 4:57
  • Is there any way to determine where they appear in the text?
    – KevinHJ
    Commented Jun 3 at 12:26
