I'm looking for either a free or paid-for (about $50/£40) BATCH PDF to HTML converter to convert several PDF files at once.

It needs to handle both vector and bitmap images within the file, outputting both as JPEGs referenced by the HTML pages.

I've tried iorigsoft's paid-for PDF to HTML converter. It seems to hang or just go idle, and the files it does convert have broken links: the wrong name is used for the constituent chapter HTML files.

I also tried the application from intrapdf.com, but it consistently crashes near the beginning of the conversion.


intrapdf works on my Windows XP machine but not on my Windows 7 machine. The only glitch is with the framed index contents HTML: the graphics do not display when the page is shown inside the frame, but if you open the frame on its own in a new tab you can see them. That might be a browser glitch in Chrome only.

This solution is good enough for me, given that I've already spent the money (I had spent it before I asked), but I can't accept my own answer as it does not work on Windows 7.

I looked at open-source tools, but they look equally flakey or only handle old PDF versions.

I need it to run on Windows 7 Home 32-bit.


  • 2
    Just to warn you: "HTML" and "accurate" don't often belong in the same sentence. Commented Mar 8, 2011 at 21:59
  • If none of our solutions worked, you could post the one you used and mark it as an answer :)
    – Sathyajith Bhat
    Commented Mar 9, 2011 at 3:54

8 Answers


PDF is a lousy input format for conversion, so "flakey" is pretty much the rule. Some files can be converted relatively easily but most will have problems. (Very briefly: a PDF file is a compressed list of "move here, output this, move there, ...". If the document contains anything other than simple L-to-R text, such as tables, images, RTL text, or footnotes, the conversion will probably produce some amount of garbage.)
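
To make the "move here, output this" point concrete, here is a minimal sketch of what a converter has to do just to get plain lines of text back. The `(x, y, text)` fragment format, the `rebuild_lines` name, and the tolerance value are all made up for illustration; real PDF content streams are far messier.

```python
def rebuild_lines(fragments, y_tolerance=2.0):
    """Group positioned text fragments into lines, top to bottom, left to right.

    fragments: iterable of (x, y, text), with y measured from the top of the page.
    Fragments whose y differs from the current line by at most y_tolerance
    are treated as belonging to that line.
    """
    lines = []  # each entry is (y_of_line, [(x, text), ...])
    for x, y, text in sorted(fragments, key=lambda f: (f[1], f[0])):
        if lines and abs(lines[-1][0] - y) <= y_tolerance:
            lines[-1][1].append((x, text))
        else:
            lines.append((y, [(x, text)]))
    # Within each line, order the pieces by their x position.
    return [" ".join(text for _, text in sorted(items)) for _, items in lines]


# Fragments arrive in drawing order, which need not be reading order.
fragments = [
    (300, 100, "world"),
    (72, 100.5, "Hello"),
    (72, 120, "Second"),
    (140, 120, "line"),
]
print(rebuild_lines(fragments))
```

This works for a single column of simple text. Feed it a two-column page and it will happily glue the left and right columns together line by line, which is exactly the kind of garbage described above.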

  • 2
    Additionally, PDFs may also contain portions of fonts, and replicating the font size on a web browser that's running on a computer that may not have these fonts installed is not going to result in the same appearance unless they are rendered to graphic images ahead-of-time. Commented Mar 8, 2011 at 2:57
  • +1 @Randolf Richardson for that point. +1 @geekosaur fair enough to know the limitations potentially. Commented Mar 8, 2011 at 20:48

There is an HTML/JavaScript-based PDF renderer called PDF.js that uses the Canvas element. http://mozilla.github.com/pdf.js/web/viewer.html

It's under development but it might do the job for some.


I'd check whether OpenOffice/LibreOffice has command-line flags for conversion.
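
As a starting point, here is a rough sketch of driving LibreOffice in batch from Python. `--headless`, `--convert-to` and `--outdir` are LibreOffice command-line options, but whether its PDF import gives usable HTML for a particular file is something to test, and the executable name `soffice` being on the PATH is an assumption. The function names are made up.

```python
import subprocess
from pathlib import Path


def build_convert_command(pdf_path, out_dir, soffice="soffice"):
    """Return the argument list for one headless LibreOffice conversion."""
    return [
        soffice,
        "--headless",          # run without opening the UI
        "--convert-to", "html",
        "--outdir", str(out_dir),
        str(pdf_path),
    ]


def convert_all(pdf_dir, out_dir, soffice="soffice"):
    """Convert every *.pdf in pdf_dir, one LibreOffice run per file."""
    for pdf in sorted(Path(pdf_dir).glob("*.pdf")):
        subprocess.run(build_convert_command(pdf, out_dir, soffice), check=True)
```

Calling `convert_all("pdfs", "html_out")` would then process a whole folder; on some installs an explicit import filter may also be needed.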

PDFs suck for what you're trying to do. There is a huge document-model mismatch between how PDF sees a page and how HTML sees a page. There will be PDF files that just can't be converted easily to HTML by anything.

  • +1 worth checking, there is a Python-based command-line suite for using OpenOffice to convert between formats. Commented Mar 8, 2011 at 20:49

'Gemini' from Iceni batch converts PDF documents to HTML...


The output isn't 100% perfect but you might find it acceptable. And it's a good base to work from. If you're a perfectionist then some post-production 'search & replace' can usually iron out most issues.

  • +1 looks good though they don't specify which Windows platforms prominently and the screenshots are XP but one can assume it works also for Windows 7. Also, they don't say if they deal with encrypted password protected documents, though I confess I didn't originally ask for this. Commented Mar 8, 2011 at 20:53

You can try Okdo PDF to HTML converter

  • +1 Looks good, it has got good ratings, deals with passwords and runs on Windows 7. Commented Mar 8, 2011 at 20:54

My solution has two parts. The first is to continue using the IntraPDF PDF to JPG program, which I paid for (http://www.intrapdf.com/convert_pdf_to_html.htm), on my XP platform (it doesn't seem to work on Windows 7 Home 32-bit; it hangs).

But I agree with you, @geekosaur, that PDF and HTML have different goals, so the conversion won't be exact (perhaps even with CSS applied to the HTML). The resulting HTML I've seen on some pages does have formatting that differs from the original, but it will do.

So the second part of the solution is to use the free tool IrfanView to convert from PDF to JPG, turning the PDF document into a series of JPG images, one for each page. This is easy to set up: IrfanView packages PDF conversion as part of its plug-in suite, and the prerequisite for PDF is downloading GhostView, which IrfanView provides a link to. This works very well, except that the UI sometimes hangs during the process, although the conversion still proceeds.
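
For anyone who would rather script the page-per-JPG step than click through a UI, here is a minimal sketch that calls the Ghostscript command-line program directly. This is not a description of what IrfanView does internally. The executable name `gswin32c`, the 150 dpi resolution, and the output naming are assumptions, as are the function names.

```python
import subprocess
from pathlib import Path


def build_gs_command(pdf_path, out_dir, dpi=150, gs="gswin32c"):
    """Return a Ghostscript argument list that writes one JPEG per page."""
    pdf = Path(pdf_path)
    # %03d is expanded by Ghostscript to the page number: manual-001.jpg, ...
    pattern = Path(out_dir) / f"{pdf.stem}-%03d.jpg"
    return [
        gs,
        "-dNOPAUSE",
        "-dBATCH",
        "-sDEVICE=jpeg",
        f"-r{dpi}",
        f"-sOutputFile={pattern}",
        str(pdf),
    ]


def pdf_folder_to_jpgs(pdf_dir, out_dir, dpi=150, gs="gswin32c"):
    """Render every *.pdf in pdf_dir to per-page JPEGs in out_dir."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for pdf in sorted(Path(pdf_dir).glob("*.pdf")):
        subprocess.run(build_gs_command(pdf, out_dir, dpi, gs), check=True)
```

Because no UI is involved, a hang shows up as a process that never returns rather than a frozen window, which is easier to spot in a batch run.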


To clarify my goal: I wanted the PDF documents in a non-proprietary format, which would give me more possibilities for viewing the docs in the future. PDF is fairly ubiquitous, but I like my data to be free, as in not tied to a format.

Thanks to other contributors:


There is a free, open-source command-line tool: http://sourceforge.net/projects/pdftohtml/.

After a short evaluation, it appears to be currently suitable mostly for simple documents. Results with complex formatting may vary.

It behaves badly with non-Latin encodings.
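
Since it is a command-line tool, batch use is a matter of looping over files. Here is a rough sketch in Python. It assumes the executable is called `pdftohtml`, is on the PATH, and accepts an input PDF followed by an output base name; check the tool's own help output before relying on that. The function names and the timeout value are made up.

```python
import subprocess
from pathlib import Path


def pdftohtml_command(pdf_path, out_dir, exe="pdftohtml"):
    """Return the argument list for converting one PDF (assumed CLI shape)."""
    pdf = Path(pdf_path)
    return [exe, str(pdf), str(Path(out_dir) / pdf.stem)]


def convert_folder(pdf_dir, out_dir, timeout=120, exe="pdftohtml"):
    """Convert every *.pdf in pdf_dir; return the names of files that failed."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    failed = []
    for pdf in sorted(Path(pdf_dir).glob("*.pdf")):
        try:
            # The timeout stops one hanging file from blocking the whole batch.
            subprocess.run(pdftohtml_command(pdf, out_dir, exe),
                           check=True, timeout=timeout)
        except (subprocess.CalledProcessError,
                subprocess.TimeoutExpired,
                OSError):
            failed.append(pdf.name)
    return failed
```

The returned list makes it easy to see which documents need a second attempt with another tool.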


There is also a free PDF to HTML tool from http://www.freepdfsolutions.com.

It features a simple GUI with batch support. No ads.

It strives to preserve the original formatting with a dead-simple, bulletproof trick: all graphics on each page are rendered into a single large background JPG image, and all text divs in the HTML use absolute positioning. This gives a precise-looking result, but large files and ugly HTML.
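
To show what that trick amounts to, here is a minimal sketch that builds one such page. It illustrates the general technique only and is not the tool's actual output; the `render_page` name, the pixel units, and the simple image file name (no spaces or quotes) are assumptions.

```python
from html import escape


def render_page(image_name, width, height, spans):
    """Build one page: a background JPG with absolutely positioned text divs.

    spans: iterable of (left_px, top_px, text).
    """
    divs = "\n".join(
        f'  <div style="position:absolute;left:{left}px;top:{top}px">'
        f"{escape(text)}</div>"
        for left, top, text in spans
    )
    style = (
        f"position:relative;width:{width}px;height:{height}px;"
        f"background-image:url({image_name});background-repeat:no-repeat"
    )
    return f'<div style="{style}">\n{divs}\n</div>'


print(render_page("page-001.jpg", 800, 1000, [(72, 90, "Chapter 1"), (72, 130, "Tom & Jerry")]))
```

Every piece of text carries its own coordinates, which is why the result looks precise in a browser but is painful to edit or reflow afterwards.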

  • Prefer something standalone on my machine that I have complete control of. With an online service there's a chance they retain the data. Commented Oct 11, 2012 at 16:01
  • It's downloadable, not online.
    – Vadzim
    Commented Oct 11, 2012 at 16:04
  • Why is it free, @Vadzim, do you know? The website has a sterile, corporate identikit feel to it, no names, no addresses, no faces. No reason is given for offering the software for free. The absence of a motive makes me suspicious I'm afraid. Many companies do offer free fully functional versions of their software without a trial period to attract users to upgrading to professional, paid, version that offers advanced features or enhanced convenience. Some of the pages have "lorem ipsum" place holder text. There are no endorsements, e.g. from major trusted download sites such as CNET. Commented Oct 11, 2012 at 22:33
  • I'm not affiliated and don't know their business model. But it seems they aim at paid custom development and use free tools for promotion. BTW, I've first found it on CNET: download.cnet.com/Free-PDF-to-HTML/3000-10743_4-75732610.html
    – Vadzim
    Commented Oct 12, 2012 at 5:56
