
A while back, I made a video that shows how to white out content in a PDF file using Foxit Reader.

After whiting out the content, I advised that you could prevent a recipient of that PDF from undoing your whiteouts (using some advanced PDF editor) by simply printing the document to a new PDF with the "Microsoft Print to PDF" virtual printer.

With other "Print to PDF" virtual printers I've used in the past, all the text from the source document gets converted to an image, and the only thing the newly generated PDF contains is an embedded image of how the document looks after the whiteout modifications. In that case, it is impossible for the recipient to undo the whiteouts, because the PDF doesn't even contain the content that was underneath them.
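To see why an overlay whiteout can be undone, here is a toy sketch. It assumes the whiteout is painted as a white rectangle directly in the page's content stream (some tools store it as an annotation instead) and that the stream is uncompressed and uses plain literal strings, which real PDFs often don't. The names `CONTENT`, `extract_strings` and `remove_white_fills` are hypothetical. The point is that the rectangle only changes what gets painted; the text operator underneath it is untouched.

```python
import re

# Hypothetical, uncompressed page content stream: the text is drawn first,
# then a white rectangle ("1 1 1 rg ... re f") is painted on top as the whiteout.
CONTENT = b"""
BT /F1 12 Tf 72 700 Td (Account number: 12345678) Tj ET
1 1 1 rg 70 695 220 18 re f
"""

# Literal strings shown with Tj. Simplified: ignores escaped or nested
# parentheses, hex strings and TJ arrays.
TJ_LITERAL = re.compile(rb"\(([^()]*)\)\s*Tj")
# A white fill color followed by a filled rectangle.
WHITE_BOX = re.compile(rb"1 1 1 rg\s+[-\d.\s]+re\s+f")


def extract_strings(stream: bytes) -> list[bytes]:
    """Return every Tj string, no matter what is painted over it afterwards."""
    return TJ_LITERAL.findall(stream)


def remove_white_fills(stream: bytes) -> bytes:
    """Delete the white boxes, which is roughly what 'undoing' the whiteout means."""
    return WHITE_BOX.sub(b"", stream)


print(extract_strings(CONTENT))  # → [b'Account number: 12345678']
```

An image-only PDF has no such text operator left to recover, which is why the old "print to image" approach was safe.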

In the latest Foxit Reader, the steps to white out content have changed, and I'd like to make an updated video. However, while testing, I noticed that the PDF produced by "Microsoft Print to PDF" has text that I can highlight, copy, and paste. Since the PDF is not just an embedded image (without text), I'm no longer sure that the whiteouts cannot be undone by an advanced PDF editor. This may be very important to some people, so I want to make sure this advice is correct.

I don't have an advanced PDF editor to confirm this. So I'm hoping that someone reading this has the knowledge (or resources) to confirm definitively whether this technique is a reliable way to ensure that private edits to a PDF document cannot be revealed by any advanced technique.
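Without an advanced PDF editor, a rough first check is possible with a short script: inflate the file's streams and count the text-showing operators (`Tj`, `TJ`, `'` and `"`). A count of zero suggests image-only pages; anything else means the file still carries text. This is a heuristic sketch, not a forensic tool: the helper names are hypothetical, it assumes Flate-compressed or uncompressed streams, and the phrase search can miss text stored as font glyph codes or split across `TJ` array pieces, so a negative phrase search proves nothing.

```python
import re
import zlib

# Rough heuristic: every "stream ... endstream" body in the raw file.
STREAM_RE = re.compile(rb"stream\r?\n(.*?)\r?\nendstream", re.S)
# A string operand ending in ")", "]" (a TJ array) or ">" (a hex string),
# followed by a text-showing operator: Tj, TJ, ' or ".
TEXT_OP_RE = re.compile(rb"[)\]>]\s*(?:Tj|TJ|'|\")")


def decoded_streams(pdf: bytes):
    """Yield each stream body, inflated when it is Flate (zlib) data."""
    for match in STREAM_RE.finditer(pdf):
        raw = match.group(1)
        try:
            # decompressobj() tolerates a truncated checksum at the end.
            data = zlib.decompressobj().decompress(raw)
        except zlib.error:
            data = raw  # uncompressed, or a filter this sketch doesn't handle
        yield data


def count_text_operators(pdf: bytes) -> int:
    """0 suggests image-only pages; anything else means text is still present."""
    return sum(len(TEXT_OP_RE.findall(s)) for s in decoded_streams(pdf))


def contains_phrase(pdf: bytes, phrase: str) -> bool:
    """Literal search; misses text stored as glyph codes or split across TJ pieces."""
    needle = phrase.encode("latin-1")
    return any(needle in s for s in decoded_streams(pdf))
```

To use it, read the file produced by the virtual printer (say, a hypothetical `printed.pdf`) as bytes, call `count_text_operators` on it, and try `contains_phrase` with a word you whited out.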

  •
    Printing to PDF as an image is a very old approach which completely kills the whole idea of PDF being a cross-platform vector format, capable of being compressed very small without ever losing sharpness. I would have thought even MS would have improved on that by now. This does, of course, undermine your idea that drawing boxes over things will ever truly hide them. The data still exists & can be read/copied.
    – Tetsujin
    Commented Sep 15, 2022 at 16:02
  •
    Compare these 2 quick redaction techniques: the purple square is 'just a box' & you can easily see under it. The black with X's isn't; it is a true & permanent redaction. No data remains, even at a forensic level. Of course, this is only a picture (so I could upload it here), so you can't properly test it: i.sstatic.net/foyB8.png
    – Tetsujin
    Commented Sep 15, 2022 at 16:07
  •
    I'd be tempted to ask on softwarerecs.stackexchange.com [as it's off topic here] whether there is a Windows equivalent of that. This is a native function in macOS; no third-party software required.
    – Tetsujin
    Commented Sep 15, 2022 at 16:09
  •
    If you convert it to an image, sure, but that really ruins the entire idea of having a PDF in the first place. You may as well send out a JPG. Find something that can redact properly [& don't believe the hype; make sure it really does]. There are myriad options on Google if you have a look, but I've no clue which to trust other than the obvious Adobe [or Apple, of course].
    – Tetsujin
    Commented Sep 15, 2022 at 17:53
  •
    …and that's what you need software recs for, unless you want to buy a Mac ;)) Otherwise, just screenshot it. Same end result: a picture of some text with holes in it.
    – Tetsujin
    Commented Sep 15, 2022 at 18:36

3 Answers


If you are using proprietary tools, you have no guarantee of what the tools will do to "optimize" the workflow, or of what metadata gets retained in conversions.

Eg 1: If there are layers of text below the layers of images (to "assist" text-to-speech users), then the whiteout may be undone.

Eg 2: If the tools want to include text to enable "text search" of the images, then the text may be present in some metadata, like comments or annotations.

Eg 3: Certain tools store revision history (to help with "undo" and "audit"), and this may leak the unwanted text.

Eg 4: Some tools generate caches and indexes (to help users with quick output), which may reveal unwanted text.
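Some of the leaks in Eg 2 and Eg 3 can be spotted in the raw file. In particular, an incremental save appends the changes to the end of the file behind a new `%%EOF` marker, so earlier revisions (including text that was later covered or deleted) can remain in the bytes. Below is a sketch of a hypothetical `leak_report` helper that counts `%%EOF` markers and a few dictionary keys worth checking. It only sees uncompressed dictionaries (keys inside compressed object streams are missed), and linearized files normally contain two `%%EOF` markers without any edit history.

```python
import re

# Dictionary keys worth a second look before sharing (illustrative, not exhaustive):
# annotations, XMP metadata, the document Info dictionary, application-private
# data and embedded files.
RISKY_KEYS = (b"/Annots", b"/Metadata", b"/Info", b"/PieceInfo", b"/EmbeddedFiles")


def leak_report(pdf: bytes) -> dict:
    """Rough scan of raw PDF bytes (hypothetical helper, not a forensic tool)."""
    report = {
        # Every incremental save ends with its own %%EOF marker, and the
        # earlier revision stays in the file before it.
        "eof_markers": pdf.count(b"%%EOF"),
    }
    for key in RISKY_KEYS:
        # Require a non-name character after the key so "/Info" != "/InfoX".
        pattern = re.escape(key) + rb"(?![A-Za-z0-9])"
        report[key.decode()] = len(re.findall(pattern, pdf))
    return report
```

An `eof_markers` value above 1 on a non-linearized file, or a non-zero `/Annots` count, is a hint that the file should be flattened rather than shared as-is.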

The best way (with a guarantee that the content is "gone") would be:
(1) Add the whiteout (either by placing a square on the top layer or by blackening the text).
(2) Convert the whited-out pages to images using some tool (paranoid users might want to (2A) check the generated images and (2B) eliminate all unwanted metadata, especially comments and annotations).
(3) Stitch these images into a set of pages to share and view.
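Step (3) takes very little code once each page has been rendered to pixels. This sketch assumes step (2) has already produced raw 8-bit grayscale pixels for each page (rendering is left to whatever tool you use) and builds a minimal PDF in which every page is a single image. It deliberately writes no `/Info` dictionary and no text, so nothing but pixels ends up in the file. `images_to_pdf` is a hypothetical helper, not part of any library.

```python
import zlib


def images_to_pdf(pages, dpi=150):
    """Build an image-only PDF (hypothetical helper).

    pages: list of (width, height, pixels), where pixels holds width * height
    bytes of 8-bit grayscale, one byte per pixel, top row first.
    """
    objects = [b"", b""]  # placeholders: object 1 = Catalog, object 2 = Pages
    kids = []
    for width, height, pixels in pages:
        if len(pixels) != width * height:
            raise ValueError("expected one grayscale byte per pixel")
        data = zlib.compress(pixels)
        objects.append(
            b"<< /Type /XObject /Subtype /Image /Width %d /Height %d "
            b"/ColorSpace /DeviceGray /BitsPerComponent 8 /Filter /FlateDecode "
            b"/Length %d >>\nstream\n" % (width, height, len(data))
            + data + b"\nendstream"
        )
        image_num = len(objects)
        # Page size in points (1/72 inch) so the image prints at `dpi`.
        w_pt, h_pt = width * 72 / dpi, height * 72 / dpi
        # Content stream: scale the unit-square image to the page and paint it.
        content = b"q %.2f 0 0 %.2f 0 0 cm /Im0 Do Q" % (w_pt, h_pt)
        objects.append(b"<< /Length %d >>\nstream\n" % len(content) + content + b"\nendstream")
        content_num = len(objects)
        objects.append(
            b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 %.2f %.2f] "
            b"/Resources << /XObject << /Im0 %d 0 R >> >> /Contents %d 0 R >>"
            % (w_pt, h_pt, image_num, content_num)
        )
        kids.append(len(objects))
    objects[0] = b"<< /Type /Catalog /Pages 2 0 R >>"
    objects[1] = b"<< /Type /Pages /Kids [%s] /Count %d >>" % (
        b" ".join(b"%d 0 R" % k for k in kids), len(kids))

    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for num, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset of "N 0 obj" for the xref table
        out += b"%d 0 obj\n" % num + body + b"\nendobj\n"
    xref_pos = len(out)
    out += b"xref\n0 %d\n0000000000 65535 f \n" % (len(objects) + 1)
    for off in offsets:
        out += b"%010d 00000 n \n" % off  # each entry is exactly 20 bytes
    # No /Info entry in the trailer: the file carries no document metadata.
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\nstartxref\n%d\n%%%%EOF\n" % (
        len(objects) + 1, xref_pos)
    return bytes(out)
```

For example, a US Letter page rendered at 150 dpi would be passed as `(1275, 1650, pixels)`, and the returned bytes written to a new file. Grayscale keeps the sketch short; color pages would need `/DeviceRGB` and three bytes per pixel.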

UPDATE (supporting material):
Somebody thought there was something wrong with this answer, so here is some support for it.

Almost the same process is listed on page 11 of this document:

...
the process of redacting in PDF involves:
• covering [[ Whiteout ]] each item of confidential information with a black rectangle or by using black text highlighting
• converting [[ Image generation ]] the PDF document to multiple TIFF image files
• converting and reassembling [[ Stitching ]] the files into a single PDF document.
...
NOTE: Converting to TIFF and back to PDF has the unfortunate consequence that the file will no longer be searchable, and accessibility is lost because document structure and tags are lost in the process. Using a third-party redaction plug-in would avoid such problems.
...
IMPORTANT: At this point, all you have done is to cover up the confidential information. To remove the information, you need to “flatten” the file by converting to TIFF images
...

I have added some text and highlighting:
The 3 stages match the process I outlined earlier.
The issues mentioned are also what I listed earlier, that is, searchability and accessibility getting lost.
While it says to use third-party tools to avoid such problems, I think using such tools could mean including the sensitive content in metadata, where some user can and will look to extract it. I think these should be avoided, and "images only" will be the safest way, even with the loss of searchability and accessibility.

The pitfalls of "assuming redaction through software" are listed here:

(1) It lists failed redactions involving (1A) Hillary Clinton / Sidney Blumenthal / Libya, (1B) Apple / U.S. District Court, (1C) Citigroup / Social Security numbers and (1D) Paul Manafort / Russia.

(2) It also says these "obvious" methods are not effective or foolproof:

(2A) Changing the text's color to white. This may make it look as though the selected words to be redacted are hidden, but the remaining metadata can reveal the hidden text.
(2B) Blacking out with comment tools: Edits made by such tools can be removed to reveal the underlying text.
(2C) Deleting words or sections: Metadata contains document revision history and can be used to view deleted information.
(2D) Using dark tape or opaque marker: Rather than physically clipping out sensitive information, it is common practice to cover such information with dark tape or a marker and scan it into a PDF format. However, many scanners are sensitive enough to view such covered words even if they do not appear to be visible.
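Method (2A), and the invisible text layers mentioned in Eg 1, can be spotted by following the fill color and the text rendering mode through a page's content stream. This sketch flags strings shown while the fill color is white (`1 g`, `1 1 1 rg` or `0 0 0 0 k`) or while rendering mode 3 (invisible, `3 Tr`) is active. It assumes an already-decompressed content stream and uses a simplified tokenizer (no nested parentheses, inline images or comments); colors set through `cs`/`sc`/`scn` are not analyzed, and white text on a dark background would be a false positive. `hidden_text` is a hypothetical name.

```python
import re

# Simplified content-stream tokenizer.
TOKEN_RE = re.compile(
    rb"\((?:\\.|[^\\()])*\)"        # literal string (no nested parentheses)
    rb"|<[0-9A-Fa-f\s]*>"           # hex string
    rb"|[\[\]]"                     # array brackets used by TJ
    rb"|/[^\s/\[\]()<>]+"           # name, e.g. /F1
    rb"|[-+]?(?:\d+\.?\d*|\.\d+)"   # number
    rb"|[A-Za-z'\"*]+"              # operator, e.g. BT, Tj, rg, T*
)
NUMBER_RE = re.compile(rb"[-+]?(?:\d+\.?\d*|\.\d+)")
TEXT_OPS = {b"Tj", b"TJ", b"'", b'"'}


def hidden_text(stream: bytes) -> list[bytes]:
    """Return string operands shown with a white fill or in invisible mode (3 Tr)."""
    white, invisible = False, False
    saved = []      # q/Q graphics-state stack
    operands = []   # operands collected since the last operator
    hits = []
    for tok in TOKEN_RE.findall(stream):
        if NUMBER_RE.fullmatch(tok) or tok[:1] in b"(<[]/":
            operands.append(tok)
            continue
        nums = [float(t.decode()) for t in operands if NUMBER_RE.fullmatch(t)]
        if tok == b"g":                      # gray fill color
            white = nums[-1:] == [1.0]
        elif tok == b"rg":                   # RGB fill color
            white = nums[-3:] == [1.0, 1.0, 1.0]
        elif tok == b"k":                    # CMYK fill color
            white = nums[-4:] == [0.0, 0.0, 0.0, 0.0]
        elif tok in (b"cs", b"sc", b"scn"):  # other color spaces: not analyzed
            white = False
        elif tok == b"Tr":                   # text rendering mode; 3 = invisible
            invisible = nums[-1:] == [3.0]
        elif tok == b"q":
            saved.append((white, invisible))
        elif tok == b"Q" and saved:
            white, invisible = saved.pop()
        elif tok in TEXT_OPS and (white or invisible):
            hits.extend(t for t in operands if t[:1] in b"(<")
        operands.clear()
    return hits
```

For example, `hidden_text(b"BT 1 1 1 rg (hidden) Tj 0 g (shown) Tj ET")` flags only `(hidden)`.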

Putting all of this together, I think the process listed in this answer is quite effective and foolproof!

  •
    Two anecdotes to share about whiteouts: [[ (1) I once came across a document that had alternate blank pages, but was later told that the blank pages had white text on a white background! I had to "Select All" to see it! ]] [[ (2) I was once sent a corporate document that prohibited printing, copying and forwarding... When I selected a paragraph and clicked "search", the search text box was filled with the "uncopyable" text, and I could copy it from the search text box... ]] What I want to convey with these anecdotes: whiteouts are no guarantee that users cannot see the original text.
    – Prem
    Commented Sep 15, 2022 at 20:21
  •
    This is why the likes of Apple & Adobe have made a specific redaction tool. It ensures all of these 'back doors' are closed. [I think MS Word can also do this now, but I can't test it.]
    – Tetsujin
    Commented Sep 16, 2022 at 11:19
  •
    I have commented on the other answer that these proprietary tools have a disclaimer to "manually check/verify before sharing", because the automatic methods may not work in every case. The onus is on the users of these tools. With the image conversion process, we have a "guarantee" that the redaction (up to our own requirements) is correct. Most automated tools in this scenario have "bugs", "loopholes" and "other corner cases", but I can see no such redaction failure in the image conversion process, except the file size issue, which is mostly offset by the cost savings, @Tetsujin
    – Prem
    Commented Sep 16, 2022 at 14:32

For reliable redaction, I am not aware of any free tool. So I have some doubts (nothing personal) that the method you show is really reliable enough.

Acrobat Pro and, if I remember correctly, Foxit (Pro) have redaction tools, which really remove the content plus any associated structural information. And then there is the long-time industry standard, Redax by Appligent. Use such tools. They may well be worth the cost…

  •
    Casual users need not buy software when the simple process (listed in my answer) can achieve the same result and yet be better than the proprietary process. Even though Acrobat Pro and Appligent are good, there are still failures listed here [[ talkingpdf.org/redacting-with-acrobat-8-professional-vs-redax ]], where it claims that (A) Acrobat may not remove "custom metadata" and (B) Appligent shows "transparent zones", where the onus is on the user to check/verify that the redaction is correct. In the image conversion process, verification is "automatic" and we have a "guarantee" that the redaction is correct!
    – Prem
    Commented Sep 16, 2022 at 10:24
  •
    Looking further into Appligent, redax.appligent.com/redax/troubleshooting-support lists some known cases [[ (A) "Some things are not redacted": Check Inline Character Images (B) "Sensitive information is still displayed": Before releasing a document, be sure to remove all annotations. ]] In short, the onus is on the user to check/verify. With the "image conversion process", we achieve the same without proprietary tools, for free, which suits casual use.
    – Prem
    Commented Sep 16, 2022 at 10:32

I support SumatraPDF, which gets complaints that printing a PDF as an image is too memory-intensive and that the results cannot be searched for metadata!!

Here it is before printing:
[Screenshot: the page before printing]

After saving a "whiteout", the text is usually still selectable, searchable and recoverable; plus, the whiteout itself leaks more details, such as when, possibly where, and likely by whom it was made.

[Screenshot: the page after saving the whiteout]
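On the point that the whiteout itself leaks when and by whom it was made: if a tool stores the whiteout as a markup annotation (for example a white `Square`), the annotation dictionary commonly carries an author label in `/T` and dates in `/M` and `/CreationDate`. This sketch, using a hypothetical `annotation_details` helper, lists those fields plus `/Contents` from uncompressed objects only; annotations inside compressed object streams, and strings in hex form or with nested parentheses, are not handled.

```python
import re

OBJ_RE = re.compile(rb"\d+\s+\d+\s+obj(.*?)endobj", re.S)
SUBTYPE_RE = re.compile(rb"/Subtype\s*/(\w+)")
# Author label, dates and comment text of a markup annotation (literal strings only).
FIELD_RE = re.compile(rb"/(T|M|CreationDate|Contents)\s*\(([^()]*)\)")
# A selection of markup annotation subtypes a whiteout or comment might use.
MARKUP_SUBTYPES = {b"Square", b"Circle", b"Polygon", b"Ink", b"FreeText",
                   b"Text", b"Highlight", b"StrikeOut", b"Redact"}


def annotation_details(pdf: bytes) -> list[dict]:
    """List markup annotations with their /T, /M, /CreationDate and /Contents."""
    found = []
    for body in OBJ_RE.findall(pdf):
        subtype = SUBTYPE_RE.search(body)
        if not subtype or subtype.group(1) not in MARKUP_SUBTYPES:
            continue  # fonts, images, pages and other non-annotation objects
        info = {"Subtype": subtype.group(1).decode()}
        for key, value in FIELD_RE.findall(body):
            info[key.decode()] = value.decode("latin-1")
        found.append(info)
    return found
```

Printing to an image drops these annotation dictionaries entirely, which is part of why that route leaks less.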

Once printed as an image, only the page as a whole is selectable, and almost nothing remains other than when it was reprinted. Of course, you then need to OCR the page for readers with disabilities.

[Screenshot: the page after printing as an image]

There are ways to simply place REDACTION annotations (IMPORTANT: YOU MUST CHECK THEM) and then re-stamp the remaining selectable contents into an empty/blank new container PDF (but that's not a PDF reader function).
