10

I'd like to save an exact replica of a webpage in vector-graphics form, so I cannot use a screenshot technique (since that stores the image in raster-graphics form).

I've tried 'print to pdf' and 'save as pdf' through Safari, Chrome, and Firefox. This works most of the time. However, the pdf saved is not an exact replica for all webpages. For example, try saving this webpage as a pdf, and note how the upvote/downvote icons are not included in the saved pdf.

I've also tried saving as a WebArchive with Safari. Problem here is that I need to crop the resulting file, and I do not know how to crop a WebArchive, since Preview cannot open it, and it simply opens up in Safari (back to square one).

I've also tried web browser plugins that provide a one-click solution to save the webpage as pdf (vector-graphics form). This works better (exact page is saved) and almost solves the problem, except that these programs work by sending the page url to a cloud-based program to query and then save the page. This means that this technique will not work for https sites that need my credentials to login.

So I'm in a corner. I'm trying to save an exact vector-graphics replica of a webpage that needs my login credentials to view. How can I do this?

4
  • 1
    I think you are confused -- PDF is not a vector format. Commented Oct 2, 2013 at 17:59
  • Not confused; just not worrying too much about the detail that a pdf is a container that can store vector-graphics stuff, since I think the main point in the question is being conveyed Commented Oct 2, 2013 at 18:11
  • Your question seems to be "How can I save a web page as a PDF file, exactly as it shows on screen, and works with a page that requires a password to log on?" Commented Oct 2, 2013 at 18:19
  • Doesn't have to be pdf; that's just one route to save a webpage where the text is in vector-graphics form. I don't have to commit to that format. I'll add secure to the title for the second note though, so that this is emphasized better. Commented Oct 2, 2013 at 18:21

2 Answers

6

You are getting different results printing the page to PDF than you see when viewing the page on screen.

This happens because the web page includes a CSS stylesheet which changes the page when it is being printed.

This question will help you avoid that problem: How do I print with the screen stylesheet?

Follow the instructions there to print the page with the on-screen stylesheet.

Then you should be able to print to PDF and get the same result as you see on screen.
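As a rough illustration of that idea, here is a minimal sketch that could be pasted into the browser's JavaScript console before printing. It rewrites each stylesheet's `media` attribute so that screen rules also apply on paper and print-only stylesheets are switched off. `remapMedia` is a hypothetical helper name, and the sketch assumes the page selects its print styles through `media` attributes; `@media print` rules written inside a stylesheet are not touched by it.

```javascript
// Hypothetical helper: decide what a stylesheet's media value should become
// so that on-screen rules are also used when printing.
function remapMedia(media) {
    var m = (media || '').toLowerCase();
    if (/\bscreen\b/.test(m)) {
        // "screen" -> "all", so the rules apply to print as well
        return m.replace(/\bscreen\b/g, 'all');
    }
    if (/\bprint\b/.test(m)) {
        // print-only stylesheet: "not all" matches no medium, which disables it
        return 'not all';
    }
    return media; // anything else is left unchanged
}

// Apply the remapping only when a DOM is available (i.e. in the browser).
if (typeof document !== 'undefined') {
    var nodes = document.querySelectorAll('link[rel~="stylesheet"][media], style[media]');
    for (var i = 0; i < nodes.length; i++) {
        nodes[i].media = remapMedia(nodes[i].media);
    }
}
```

After running it, the print preview should pick up the screen layout, provided the assumption about `media` attributes holds for the page in question.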

1
  • 1
    Just for full documentation, I ended up using the Chrome Web Developer plugin, and editing the css through this plugin. I could not get the print page to update after editing the css using Google Chrome's builtin Developer Tools, but this is most likely because I'm unfamiliar with that tool. Commented Oct 2, 2013 at 19:25
4

If you're not afraid of a little scripting you can try using the phantomjs application for OSX from http://phantomjs.org/

Then you would just run the included binary using the rasterize.js script with a command like:

phantomjs rasterize.js http://www.example.com/sitepage 8.5in*11in outfile.pdf
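To make the `8.5in*11in` argument less mysterious, here is a small sketch of how such a size string can be turned into the object PhantomJS expects in `page.paperSize`. As far as I recall, the bundled rasterize.js does something along these lines, but treat that as an assumption and check your copy of the script. `paperSizeFromArg` is a hypothetical name used only for illustration.

```javascript
// Hypothetical helper: convert a command-line size argument into a
// paperSize-style object (width/height/margin or format/orientation/margin).
function paperSizeFromArg(arg) {
    var parts = arg.split('*');
    if (parts.length === 2) {
        // "width*height" form, e.g. "8.5in*11in"
        return { width: parts[0], height: parts[1], margin: '0px' };
    }
    // Otherwise treat the argument as a named format such as "A4" or "Letter"
    return { format: arg, orientation: 'portrait', margin: '1cm' };
}

// Inside a PhantomJS script the result would be assigned to the page, e.g.:
//   page.paperSize = paperSizeFromArg(system.args[3]);
```

So `8.5in*11in` requests a US Letter sized page with no margin, while a bare name like `A4` would select a standard format instead.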

A couple notes:

  • It's called 'rasterize.js' but the text itself is saved into the PDF as actual text.

  • Authentication to a secure site using Windows authentication can be accomplished by adding a couple of lines to the rasterize.js script, right after the page object is initialized:

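var page = require('webpage').create(),
    system = require('system'),
    address, output, size;

// The two lines below are the only additions to the stock script;
// they supply the credentials used when the site asks for authentication.
page.settings.userName = "serviceUserName";
page.settings.password = "servicePassword";

if (system.args.length < 3 || system.args.length > 5) {
    // ... the rest of rasterize.js continues unchanged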
1
  • phantomjs rocks! Commented Oct 4, 2013 at 1:21
