20

So there are some threads here on PDF compression saying that there is some, but not a lot of, gain in compressing PDFs as PDFs are already compressed.

My question is: Is this true for all PDFs, including older versions of the format?

Also, I'm sure it's possible for someone (an idiot maybe) to place bitmaps into the PDF rather than JPEG etc. Our company has a lot of PDFs in its DBs (some in older formats, maybe). We are considering using gzip to compress them during transmission but don't know if it's worth the hassle.

2 Answers

19

PDFs in general use internal compression for the objects they contain. But this compression is by no means compulsory according to the file format specifications. All (or some) objects may appear completely uncompressed, and they would still make a valid PDF.
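To get a rough idea of how many streams in a given file are stored without any filter, a byte-level scan can help. This is only a heuristic sketch, not a PDF parser: the function name `count_streams` and the `lookback` parameter are made up for illustration, and binary stream data can contain matching byte sequences by chance, so treat the counts as approximate.

```python
import re

# ">>" closes a stream dictionary; the "stream" keyword and a line break follow it.
STREAM_START = re.compile(rb">>\s*stream\r?\n")


def count_streams(pdf_bytes: bytes, lookback: int = 1024):
    """Return (total_streams, streams_with_filter) from a byte-level heuristic."""
    total = 0
    with_filter = 0
    for match in STREAM_START.finditer(pdf_bytes):
        total += 1
        # Inspect the bytes just before the stream keyword (the dictionary).
        window = pdf_bytes[max(0, match.start() - lookback):match.start()]
        # Trim the window to the current object so a previous object's
        # /Filter entry is not counted by mistake.
        obj_pos = window.rfind(b"obj")
        if obj_pos != -1:
            window = window[obj_pos:]
        if b"/Filter" in window:
            with_filter += 1
    return total, with_filter
```

For a real file you could call it as `count_streams(open("sample.pdf", "rb").read())`, where `sample.pdf` is a placeholder name. If the second number is much smaller than the first, many streams are stored unfiltered and gzip is likely to help.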

There are command-line tools out there which are able to decompress most (if not all) of the internal object streams (even in the most recent PDF versions) -- and the new, uncompressed version of the file will render exactly the same on screen or on paper (if printed).

So to answer your question: No, you cannot assume that a gzip compression is adding only hassle and no benefit. You have to test it with a representative sample set of your files. Just gzip them and take note of the time used and of the space saved.
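A minimal sketch of such a test, using only Python's standard library. The helper names (`gzip_report`, `saved_fraction`, `measure_directory`) are made up for this example, and compression level 6 is just an assumed default; adjust both to your setup.

```python
import gzip
import time
from pathlib import Path


def gzip_report(data: bytes, level: int = 6):
    """Return (original_size, compressed_size, seconds_taken) for one blob."""
    start = time.perf_counter()
    compressed = gzip.compress(data, compresslevel=level)
    elapsed = time.perf_counter() - start
    return len(data), len(compressed), elapsed


def saved_fraction(original_size: int, compressed_size: int) -> float:
    """Fraction of bytes saved; negative if gzip made the data larger."""
    if original_size == 0:
        return 0.0
    return 1.0 - compressed_size / original_size


def measure_directory(directory: str, level: int = 6):
    """Yield (file name, fraction saved, seconds) for every *.pdf in a directory."""
    for path in sorted(Path(directory).glob("*.pdf")):
        original, compressed, elapsed = gzip_report(path.read_bytes(), level)
        yield path.name, saved_fraction(original, compressed), elapsed
```

Run it over a representative folder, for example `for row in measure_directory("samples"): print(row)`, where `samples` is a placeholder directory name. Files whose streams are already compressed should show savings close to zero, while files with unfiltered streams should show large savings.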

It also depends on the type of PDF producing software which was used...

4
  • But is the text content compressed? And what about embedded fonts?
    – Stewart
    Commented Jun 6, 2018 at 10:30
  • @Stewart: Embedded fonts usually are compressed (because font files themselves by default are also compressed). See also answer to "[How can I extract embedded fonts from a PDF as valid font files?](stackoverflow.com/a/3489099/359307)". Text content usually is embedded just as other content and may or may not be compressed, just as the answer describes... Commented Jun 6, 2018 at 17:09
  • 1
    @KurtPfeifle Are you saying that sections of text in a PDF are "objects", just like images and such are? This isn't clear to people unfamiliar with how the format works behind the scenes.
    – Stewart
    Commented Jul 23, 2018 at 12:29
  • @Stewart: Yes. :-) Commented Jul 24, 2018 at 21:21
6

Instead of applying gzip compression, you would get much greater savings by using PDF utilities to apply compression to the contents within the format and to remove things like unneeded embedded fonts. Such utilities can downsample images and apply the proper image compression, which would be far more effective than gzip. JBIG2 can be applied to bilevel images and is remarkably effective, and JPEG can be applied to natural images with the quality level selected to suit your needs. In Acrobat Pro, you can use Advanced -> PDF Optimizer to see where space is used and selectively attack those consumers. There is also a generic Document -> Reduce File Size option that applies these reductions automatically.

Update:

Ika's answer has a link to a PDF optimization utility that can be used from Java. You can look at their sample Java code there. That code lists exactly the things I mentioned:

  • Remove duplicated fonts, images, ICC profiles, and any other data stream.
  • Optionally convert high-quality or print-ready PDF files to small, efficient and web-ready PDF.
  • Optionally down-sample large images to a given resolution.
  • Optionally compress or recompress PDF images using JBIG2 and JPEG2000 compression formats.
  • Compress uncompressed streams and remove unused PDF objects.
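To see why downsampling matters so much, here is a small back-of-the-envelope sketch. The function names are made up for illustration, and the figures are raw (uncompressed) pixel data sizes, before JPEG or JBIG2 is applied on top.

```python
def downsampled_pixels(width_px: int, height_px: int, source_dpi: int, target_dpi: int):
    """Pixel dimensions after downsampling; never upsamples."""
    if target_dpi >= source_dpi:
        return width_px, height_px
    scale = target_dpi / source_dpi
    return max(1, round(width_px * scale)), max(1, round(height_px * scale))


def raw_size_bytes(width_px: int, height_px: int, bits_per_pixel: int) -> int:
    """Size of the uncompressed pixel data, rounded up to whole bytes."""
    return (width_px * height_px * bits_per_pixel + 7) // 8


# Example: an A4 page scanned at 600 dpi in 24-bit color, downsampled to 150 dpi.
before = raw_size_bytes(4960, 7016, 24)
width, height = downsampled_pixels(4960, 7016, 600, 150)
after = raw_size_bytes(width, height, 24)
```

Because the pixel count scales with the square of the resolution, going from 600 dpi to 150 dpi reduces the raw data by a factor of (600/150)^2 = 16, which is a saving that gzip applied from the outside cannot provide.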
3
  • I am unfamiliar with PDF utilities. Is there a Java API for this? Whatever solution we use would need an API so that we can automate the process on our servers. I am aware of Apache PDFBox but not sure how good it is for compressing an already built PDF. Commented May 13, 2012 at 18:19
  • I wanted to understand the issues with using JPEG2000 in PDF. This option is not commonly used. Are there any rendering issues on some devices? Commented Feb 8, 2019 at 9:21
  • PDF 1.5, which included JPEG2000, was introduced April 2003. So long as your reader supports at least PDF 1.5, it will work.
    – Mark Adler
    Commented Feb 8, 2019 at 15:31
