How to convert a webpage to PDF with preserving its look (exactly as on web browser) and text/links?

Question

I'm looking for a way to convert a webpage to PDF while preserving the webpage's look. The webpage's text must also remain selectable and searchable (generating an image screenshot of the webpage would make the text neither selectable nor searchable).

I'm looking for printing the webpage to PDF as is (as on web browser) without any manipulation on style or alignment, or loss of any webpage's static components.

This would help preserve offline copies of webpages that are easily readable, annotatable and searchable.

You don't need to read anything below to understand my question (the question is just the section above). The following section only lists what I've found through research or others' answers, nested so as to work toward an answer.

Research Outcomes (Suggestions that didn't solve my problem)

Outcomes till now on trying to find a solution (All still not working as a solution for this question)

I've tried these PDF web printing engines, but all of them alter the page's look, some to the point of damaging it and making it hard to read (example page screenshots are listed in square brackets):

Chrome [Original, Print Styles (Disabled | not Disabled)]
Firefox [Original, Print Styles (Disabled p1,p2 | not Disabled p1,p2)]
Readability
- It simplifies the webpage (which is good for focused reading; however, this isn't what I'm looking for). I want to keep all of the webpage's position/style properties as seen in the web browser, in PDF format, without any manipulation.
Foxit Reader
NovaPDF
CutyCapt [Original, Zoom Factor: 0.4: Screenshots, Outputted PDF]
- I'll add links after I solve the program's running issues on Windows.
wkhtmltopdf [Original, Zoom Factor: 0.4: Screenshots, Outputted PDF]
- It doesn't support CSS3.

All webpage screenshot image capturing plugins (e.g. Abduction, Awesome Screenshot, Fireshot, Firefox Screenshot Developer Tool, Full Page Screen Capture, Page2Images, web-capture, ...) don't answer my question, because they don't preserve text and links.

Scrible is great at preserving webpages as is for further annotation and research, but unfortunately it is online-only and does not convert to PDF format.

There are two other questions on the community that are somewhat similar to mine, but they differ in important ways:

How to get WYSIWYP (print what you see) in a web browser?
- This question asks about a way to capture a webpage (as seen on screen) anyway even if it's an image and text won't be preserved. Whereas, I'm looking for capturing text and links also (importantly preserve text and links).

More Similar questions where preserving text and links isn't a requirement (pages are captured as image screenshots mostly):

How to Take Screenshots /Save a web page as PDF

Print From Browser Using Screen CSS?
- It asks about disabling print styles, which, judging from the screenshots above, doesn't seem to help.

Notes

OS: Windows 10

If you want to print from a browser you first have to disable any print stylesheets to maintain the web page's screen appearance. — DavidPostill, Commented Apr 12, 2016 at 15:25
See How to get WYSIWYP (print what you see) in a web browser?. See my answer to that question. — DavidPostill, Commented Apr 12, 2016 at 15:26
@DavidPostill It seems that disabling print styles either doesn't work or doesn't affect how the browser renders the PDF. Example screenshots have been added to the edited version of the question. — Omar, Commented Apr 12, 2016 at 19:11
Why did no one mention Opera? Opera save as pdf function will save it as exactly how it looks in a browser. — Alex, Commented Nov 29, 2020 at 9:24

nmhung1985 · Accepted Answer · 2019-07-27 11:34:15Z

11

Contributing another answer for possible users. In Firefox, there used to be an addon "Print pages to PDF". You can search for its last version 0.1.9.3 (works on pre-Quantum versions only).

Currently there's this addon for both Chrome and Firefox that works quite well: PDFMage

- Saves all images in the page
- Generates text as text, not as an image, so you can search text in the generated PDF
- Preserves hyperlinks
- Has the option to save a long webpage as a one-page PDF (so the images are not split between pages)

answered Jul 27, 2019 at 11:34


Excellent addon. Thank you.
– Dude named Ben
Commented Jun 18, 2021 at 6:18
I feel like this is the answer we're looking for. Gave it a go and it's preserved the look/layout of the site, text and links. Vote this up!
– Daniel
Commented Jun 3, 2022 at 6:56
This is the best answer! I've tested it on some web pages and it does preserve the look.
– czxttkl
Commented Sep 11, 2022 at 16:20


sebisnow · Accepted Answer · 2020-05-12 11:41:14Z

11

We faced the same problem in a University project and were able to solve it using

wkhtmltopdf

We quite enjoyed the capabilities of this tool on the command line. We also called it from Python code to render the current state of webpages. It can deliver the webpage as PDF (usually not a perfect match for the website view because of the page formatting, A4 for example) or as PNG (which preserves the view of the page but not the links).
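As a rough illustration of calling the tool from Python, here is a minimal sketch. It assumes the `wkhtmltopdf` and `wkhtmltoimage` binaries are installed and on the PATH; the function names are hypothetical, and the 0.4 zoom factor in the usage note is only borrowed from the question's test runs.

```python
import subprocess
from typing import List


def wkhtmltopdf_command(url: str, output_pdf: str,
                        zoom: float = 1.0, page_size: str = "A4") -> List[str]:
    """Build a wkhtmltopdf command line (hypothetical helper, not from the answer)."""
    return ["wkhtmltopdf",
            "--page-size", page_size,   # paper format the page is fitted to
            "--zoom", str(zoom),        # scale factor applied to the page
            url, output_pdf]


def wkhtmltoimage_command(url: str, output_png: str) -> List[str]:
    """Build the PNG variant; the image keeps the layout but loses links and text."""
    return ["wkhtmltoimage", "--format", "png", url, output_png]


def render(command: List[str]) -> None:
    """Run the command and raise CalledProcessError if the tool exits with an error."""
    subprocess.run(command, check=True)
```

Calling `render(wkhtmltopdf_command("https://example.com", "page.pdf", zoom=0.4))` would write `page.pdf`; whether the layout survives depends on the page, as the comments below note.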

There is also the readability project (for Python: pypi.python.org/pypi/readability-lxml), which we used and which handles ad removal and content detection quite well (e.g. for newspaper articles and the like). If you just want an addon or extension for your browser, the following readability implementation might satisfy your need:

Offline now: https://www.readability.com/addons/

WaybackMachine Link: https://web.archive.org/web/20160308192045/https://readability.com/addons

edited May 12, 2020 at 11:41

answered May 4, 2016 at 11:31


Unfortunately, wkhtmltopdf didn't preserve page's elements positions. Example Page: Zoom Factor: 0.4: Screenshots, Outputted PDF
– Omar
Commented May 6, 2016 at 18:36
Readability simplifies the page (which is a good thing; however, this isn't what I'm looking for). I need to keep all of the page's position/style properties as seen in the web browser, in PDF format, without any manipulation.
– Omar
Commented May 6, 2016 at 19:08
Did you use the wkhtmltopng option of the tool? As PNG, the positions should be okay (at least much better than in the PDF version, where the page is fitted to A4 format).
– sebisnow
Commented May 9, 2016 at 6:36
@sebisnow Is the readability.com site deprecated? I can't access it at the moment.
– jeppoo1
Commented May 8, 2020 at 19:44
Yes, it seems to have been offline for at least a year already. I will add a Wayback Machine link. web.archive.org/web/20160308192045/https://readability.com/…
– sebisnow
Commented May 12, 2020 at 11:36


AlanObject · Accepted Answer · 2019-05-26 14:04:39Z

I really struggled with this and tried most of the tools that have been mentioned so far. The best results I got were from using Chrome's headless mode. The command on macOS would look like this:

/Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome --headless --print-to-pdf=test.pdf http://127.0.0.1:8080

The best list of command line options I found was here.

However, there were problems with that. Specifically, my pages are very JavaScript-heavy and I couldn't make the print function wait for them to finish executing, so my output didn't have the images in it.

The solution I found was a Node.js package: chrome-headless-render-pdf. Its scant documentation is here. It works and it is easily scriptable.
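For readers who would rather script the plain headless command than install a package, here is a minimal Python sketch. The default path is the macOS one from the command above. The function names are hypothetical, and `--virtual-time-budget` is a Chrome switch that is not mentioned in this answer: it is an assumption that it gives JavaScript-heavy pages enough time to finish before printing, and it may behave differently across Chrome versions.

```python
import subprocess
from typing import List, Optional

# Path taken from the macOS example above; adjust for other systems.
MAC_CHROME = "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome"


def chrome_pdf_command(url: str, output_pdf: str, chrome: str = MAC_CHROME,
                       virtual_time_budget_ms: Optional[int] = None) -> List[str]:
    """Build the headless print-to-PDF command line (hypothetical helper)."""
    command = [chrome, "--headless", f"--print-to-pdf={output_pdf}"]
    if virtual_time_budget_ms is not None:
        # Assumption: lets page scripts run for this much virtual time first.
        command.append(f"--virtual-time-budget={virtual_time_budget_ms}")
    command.append(url)
    return command


def print_to_pdf(url: str, output_pdf: str, **options) -> None:
    """Run Chrome and raise CalledProcessError if it exits with an error."""
    subprocess.run(chrome_pdf_command(url, output_pdf, **options), check=True)
```

A call such as `print_to_pdf("http://127.0.0.1:8080", "test.pdf", virtual_time_budget_ms=10000)` mirrors the command above with a ten-second budget; passing the arguments as a list avoids having to escape the spaces in the Chrome path.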

Headless chrome works but generates horrible output.
– Dude named Ben
Commented Jun 18, 2021 at 6:21

fixer1234 · Accepted Answer · 2019-07-27 19:58:56Z

I had the same problem, and figured it out via Chrome and with a free printer driver called PDF995. This is part of a suite of PDF utilities; the publisher's web site is http://www.pdf995.com/.

However, I think any web browser and any pdf converter will suffice. Anyway, here's what I did:

1. Select all or highlight everything.
2. Right-click the highlighted selection or press Ctrl+P (the two options behave slightly differently, but you end up with the same outcome).
3. If you right-clicked in 2. (the shortcut), click "Print" and only what you selected will appear in the print preview. Make sure you change your printer destination to whatever PDF converter you decide to use (PDF995 or other).
4. Click "Print" and it saves as a PDF document.
5. If you pressed Ctrl+P in 2. (the slightly longer way) instead, click on "More settings" and scroll down to "Options".
6. Check the box that says "Selection only", and everything in the shortcut I described will follow.
7. Don't forget to change your printer destination to whatever PDF converter you choose (PDF995 or other).
8. Click "Print".

Ezequiel Tolnay · Accepted Answer · 2016-04-13 04:42:05Z

2

If you're on Linux, try this small command line tool CutyCapt, which depends only on Qt and QtWebkit, and exports to PDF.

answered Apr 13, 2016 at 4:42



K J · Accepted Answer · 2024-03-14 00:07:16Z

Nobody (apart from one comment) seems to have pointed out that Opera does exactly what is asked for. It saves the page as one selectable PDF page, without cutting adverts or adding page breaks, and at exactly[*] the current view width, narrow or wide.

Here we are viewing Narrow Page 1 at the bottom so the PDF in the center is one long page and zoomed in on the right we can see width has been ALMOST[*] exactly used.

[*] The difference is that the scroll bars are removed on saving, so the width is slightly wider and, as collateral damage, there is a slight shift in the contents.

[*] NOTE There are hide-scrollbar extensions for Chromium-based browsers like Opera, but results can be variable. However, checking the "nominated site" with Hide Scrollbars shows that it can be activated.
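For readers without Opera, a similar single-long-page result can be approximated by driving headless Chromium from a script. The sketch below uses Playwright for Python, which is not mentioned anywhere in this thread and is swapped in purely for illustration; it assumes `playwright` and its Chromium build are installed. The function names, the 1280-pixel width and the 800-pixel initial viewport height are arbitrary choices, and the output may still differ from what the browser shows.

```python
import math
from typing import Dict


def single_page_size(width_px: int, content_height_px: float) -> Dict[str, str]:
    """PDF page dimensions, in CSS pixels, for one page as tall as the content."""
    return {"width": f"{width_px}px",
            "height": f"{math.ceil(content_height_px)}px"}


def save_single_page_pdf(url: str, output_pdf: str, width_px: int = 1280) -> None:
    """Render url at the given view width into a one-page PDF (a sketch)."""
    # Imported lazily so the helper above works without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()  # headless by default; PDF output needs Chromium
        page = browser.new_page(viewport={"width": width_px, "height": 800})
        page.goto(url, wait_until="networkidle")
        page.emulate_media(media="screen")  # apply screen styles, not print stylesheets
        height = page.evaluate("document.documentElement.scrollHeight")
        page.pdf(path=output_pdf, print_background=True,
                 **single_page_size(width_px, height))
        browser.close()
```

Calling `save_single_page_pdf("https://example.com", "page.pdf")` would write one tall page. Text and links should stay selectable because the PDF is produced by the browser's print pipeline rather than from a screenshot.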

Pyheme · Accepted Answer · 2016-04-12 15:31:23Z

0

Although this is not exactly your request, as it is not PDF: if the objective is purely to keep an offline copy of webpages for later review, saving the page as a webpage would do just that.

The big caveat is that it will create a .html file and a folder with all the media content on the page rather than a single document.

In Chrome and Firefox, you can save a page doing a right click on it and choosing Save as... In Internet Explorer, you can save it under File -> Save as (pressing the Alt key for the menus to appear).

answered Apr 12, 2016 at 15:31


Saving the webpage in .html format would make it not-annotateable. So, I need it in PDF format.
– Omar
Commented Apr 12, 2016 at 15:34
That's a good point! Just remembered of an extension that allows you to easily disable print-related stylesheets. A quick google search led me to the discussion when I had first heard of it, on Superuser: How to get WYSIWYP (print what you see) in a web browser?
– Pyheme
Commented Apr 12, 2016 at 15:42
I tried doing "Save As" using Chrome. It creates a .HTML file and a folder. The .HTML file was missing a whole lot of stuff from the page.
– SherlockSpreadsheets
Commented Dec 10, 2018 at 22:33


David Herse · Accepted Answer · 2016-10-16 05:34:28Z

0

Try this service. Creates a PDF from a website as you see it in the browser. https://lomotoh.com/ (I am affiliated with this site)

edited Oct 16, 2016 at 5:34

answered Oct 9, 2016 at 11:59


This preserves links, but not selectable text, which is a requirement in the question.
– fixer1234
Commented Oct 15, 2016 at 23:07
Seems to be selectable for some sites. I think it depends what sort of custom font the site uses.
– David Herse
Commented Oct 16, 2016 at 3:18
The link does not work. You should remove this answer.
– PS Nayak
Commented Jun 6, 2020 at 15:23


Gordon Couger · Accepted Answer · 2017-03-03 23:41:07Z

At least all of the text on some pages is searchable, selectable, and can be cut and pasted. I tried it on a page pasted up robotically by a computer out of text and pictures, and it turned it all into an image.

I have used these things for years. I get the best results in Linux by rebuilding the page in a XX word of your choice and exporting the result as a PDF. I can get what I want at considerable cost. From my limited use archiving, the site David Herse put up, https://lomotoh.com/ (I am NOT affiliated with this site), works as well as any I have ever used. It will be my go-to resource to convert webpages to PDFs until I find something better or it costs too much for me to pay out of my own thin purse.

Stephen Rauch · Accepted Answer · 2020-12-21 01:29:02Z

0

I would suggest trying wkhtmltopdf again as suggested by @sebisnow in their answer, with some pre-processing.

Prior to running the program, open the developer tools (Ctrl+Shift+I) and adjust the elements that aren't sitting correctly. They are likely responsive for phone/desktop/tablet, which means that their positions are relative to other HTML objects. Make them absolute positions instead.

Edit the source of the page, focusing on the margins and padding of the objects in question. Sometimes, simply making the canvas 10-15% larger will give even relative elements enough virtual room so they do not move.

I often use the developer tools to adjust page elements when I'm printing to PDF so I have a reference file for later. Coupled with wkhtmltopdf, you should be able to have the site appear as it does in the browser, with the features that you're looking for, such as image and link preservation as well as text.

edited Dec 21, 2020 at 1:29

Stephen Rauch


answered Dec 21, 2020 at 0:40

jon.bray.eth


wkhtmltopdf does NOT work properly. I am a Browser War Veteran (remember 2006-09?) and that little piece of... tool gives me flashbacks. It will NOT understand page breaks, it will NOT print table gridlines thinner than 1mm (🤮) and it will NOT balance table line heights, keeping a fixed height and then dumping the remaining height on the last line. It is only useful if you go back to 1994 and print from NCSA Mosaic. I'm trying to use Selenium, headless browser and print to PDF. My solution will appear here if I ever make it run. I'm throwing in the kitchen sink and even pandoc.
– Ricardo
Commented Apr 11, 2022 at 12:45
@Ricardo I remember those days well. The last release does much better with escape characters, but as I said in my response you do need some pre-processing as wkhtmltopdf doesn't recognize newer/custom DOM elements. What I've had to do in the headless situation is have a script that modifies the HTML file (removes header and all non-necessary elements, replacing with common ones.) It is a very similar script to what browsers use for 'reading mode', which in my experience prints absolutely fine with wkhtmltopdf. Let me see if I can find the script that's been working for me to add.
– jon.bray.eth
Commented Apr 30, 2022 at 10:56

