
I'm building some documents that need to live online in Google Docs. I start by building static HTML files using Ruby on Rails. These look fine.

Then I convert them to PDF using wkhtmltopdf. These also look fine.

Then I convert them to .doc files using https://pdf2doc.com/. These look mostly fine.

The problem comes when I want to get them into Google Docs in an editable format. If I import either the .pdf or the .doc and choose 'convert into Google Doc', the formatting is completely broken, totally all over the place: images jumbled up next to each other, etc.

I've tried running a Chrome extension called "Save to Google Drive" on the HTML version, but that looks awful as well.

One approach I haven't tried is to output all of the data used to build the docs (there are about 10 docs, all with an identical format but different data) into a Google spreadsheet and do some kind of mail merge to generate the docs. But it feels like there must be a way to make either a Google Doc or a Google Slides presentation out of HTML, PDF or .doc without the formatting getting messed up.

Another thing I thought of was to build the original HTML files using only tables, to see if they survive the conversion better. But again, that's a lot of hassle.

Any advice, anyone?

EDIT: it occurred to me that maybe I need to set a particular doctype, or some meta tags, in my HTML. Is that likely?

  • There exist HTML to DOCX converters; have you tried any?
    – harrymc
    Commented Apr 8, 2023 at 11:31
  • Did you try pandoc for conversion?
    – Destroy666
    Commented Apr 8, 2023 at 14:53
  • @harrymc I googled for this but either didn't find any or couldn't find one that did a good job (I can't remember now, sorry). If you're able to recommend one, that would be very helpful. Commented Apr 8, 2023 at 15:59
  • There are several; search Google for "html to docx".
    – harrymc
    Commented Apr 8, 2023 at 16:52
  • What in the formatting breaks? Why are you going that convoluted route, through all those conversions? Every step of this process seems to beg "why?" I think it may be a good idea to step back and reconsider what each step is accomplishing, and whether there's a better way to do that. If you can step back in your question and explain these Why's, that may also help us help you to a better overall solution. Commented Apr 9, 2023 at 21:33

1 Answer


It's not clear what might be happening. Bear in mind that Google Docs doesn't support everything that can be done with HTML, and that Google Drive has limits. From Files you can store in Google Drive:

File sizes

The following are the maximum file sizes you can store in Google Drive:

Documents

Up to 1.02 million characters.
If you convert a text document to Google Docs format, it can be up to 50 MB.
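To catch those limits before uploading, a small check can run over each generated file. This is a minimal sketch using only the Python standard library; it assumes the character limit applies to the visible text (tags stripped) and treats 50 MB as 50,000,000 bytes, the stricter reading.

```python
from html.parser import HTMLParser

# Limits quoted from Google's "Files you can store in Google Drive" page.
MAX_DOC_CHARACTERS = 1_020_000      # "Up to 1.02 million characters"
MAX_CONVERTED_BYTES = 50_000_000    # "up to 50 MB"; decimal MB assumed (the stricter reading)


class _TextCollector(HTMLParser):
    """Collects the document's text, skipping tags and <script>/<style> contents."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def check_drive_limits(html: str) -> list[str]:
    """Return a list of problems; an empty list means both limits look OK."""
    collector = _TextCollector()
    collector.feed(html)
    collector.close()
    text = "".join(collector.parts)
    problems = []
    if len(text) > MAX_DOC_CHARACTERS:
        problems.append(f"text has {len(text)} characters (limit {MAX_DOC_CHARACTERS})")
    size = len(html.encode("utf-8"))
    if size > MAX_CONVERTED_BYTES:
        problems.append(f"file is {size} bytes (limit {MAX_CONVERTED_BYTES})")
    return problems


if __name__ == "__main__":
    import sys
    # Usage: python check_limits.py doc1.html doc2.html ...
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8") as f:
            for problem in check_drive_limits(f.read()):
                print(f"{path}: {problem}")
```

Google may count characters slightly differently from this tag-stripped count, so treat a result close to the limit as a warning rather than a verdict.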

Try creating a document using only simple HTML tags with no attributes. Omit the div, iframe, video, audio, form, input, select, textarea, img, and span tags; there might be others to omit. Check that all tags are properly closed.
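A quick way to audit the Rails output against this advice is a pass with Python's standard `html.parser`. This is a sketch: the `AVOID` set mirrors the tags listed above, and the strict closing-tag check will also flag elements whose end tags HTML lets you omit (such as `</p>` or `</li>`), which is arguably what you want here.

```python
from html.parser import HTMLParser

# Tags suggested above to leave out until the basic structure converts cleanly.
AVOID = {"div", "iframe", "video", "audio", "form", "input",
         "select", "textarea", "img", "span"}
# Void elements never take a closing tag, so they are not tracked as "open".
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class TagAudit(HTMLParser):
    def __init__(self):
        super().__init__()
        self.stack = []      # (tag, line) for every element still open
        self.problems = []

    def handle_starttag(self, tag, attrs):
        line, _ = self.getpos()
        if tag in AVOID:
            self.problems.append(f"line {line}: avoid <{tag}>")
        if attrs:
            self.problems.append(f"line {line}: <{tag}> has attributes")
        if tag not in VOID:
            self.stack.append((tag, line))

    def handle_endtag(self, tag):
        line, _ = self.getpos()
        if tag in VOID:
            return
        if self.stack and self.stack[-1][0] == tag:
            self.stack.pop()
        else:
            self.problems.append(f"line {line}: unexpected </{tag}>")


def audit(html: str) -> list[str]:
    parser = TagAudit()
    parser.feed(html)
    parser.close()
    # Anything left on the stack was opened but never closed.
    parser.problems += [f"line {line}: <{tag}> is never closed" for tag, line in parser.stack]
    return parser.problems
```

Running `audit()` on each of the ~10 generated files and fixing what it reports gives a clean baseline to import before adding styling back.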

You might use tables (table, thead, tbody, tr, th, and td) to handle the layout, but make sure that no table exceeds the page size you will use in Google Docs. Don't forget that Google Docs has default page settings.
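To check the page-size point, the usable width can be computed from the paper width minus the margins (CSS defines 1 inch as 96 px). This sketch assumes 1-inch margins and either Letter or A4 paper; confirm the actual values under File > Page setup in your own Google Docs. It only inspects plain pixel `width` attributes on `table` elements; percentage widths and CSS widths are not checked.

```python
from html.parser import HTMLParser

# Paper widths in inches. Which one is the default depends on the account's
# locale (assumption: Letter in the US, A4 elsewhere).
PAGE_WIDTH_IN = {"letter": 8.5, "a4": 210 / 25.4}
CSS_PX_PER_IN = 96  # CSS defines 1in = 96px


def usable_width_px(page: str = "letter", margin_in: float = 1.0) -> float:
    """Width available to content between the left and right margins."""
    return (PAGE_WIDTH_IN[page] - 2 * margin_in) * CSS_PX_PER_IN


class TableWidths(HTMLParser):
    """Collects the numeric width="..." attribute of every <table>."""

    def __init__(self):
        super().__init__()
        self.widths = []

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            width = dict(attrs).get("width")
            if width and width.isdigit():  # plain pixel widths only; "100%" is skipped
                self.widths.append(int(width))


def tables_too_wide(html: str, page: str = "letter") -> list[int]:
    """Return the pixel widths of tables wider than the usable page width."""
    parser = TableWidths()
    parser.feed(html)
    parser.close()
    limit = usable_width_px(page)
    return [w for w in parser.widths if w > limit]
```

With these assumptions, Letter leaves 624 px of content width and A4 roughly 602 px.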

Once you confirm that the basic document structure converts correctly, add styling using inline CSS (the style attribute); at that point you might also use span tags and add img tags.
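If the Rails templates currently rely on a stylesheet, one way to get inline CSS is to copy each class's declarations into a `style` attribute. The sketch below does that with Python's `html.parser`; `CLASS_STYLES` is a hypothetical mapping you would fill in from your own stylesheet, and only plain class rules are handled (no descendant selectors or media queries). Existing inline declarations are placed last so they still override the class rules, as they would in a browser.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical mapping from CSS class names to declarations; fill it in from
# the stylesheet your Rails templates currently use.
CLASS_STYLES = {
    "title": "font-size: 20pt; font-weight: bold",
    "muted": "color: #666666",
}


class InlineStyles(HTMLParser):
    """Re-emits HTML with class="..." replaced by an equivalent style="..."."""

    def __init__(self, class_styles):
        super().__init__(convert_charrefs=False)  # pass entities through untouched
        self.class_styles = class_styles
        self.out = []

    def _render(self, tag, attrs, self_closing=False):
        kept, from_classes, inline = [], [], []
        for name, value in attrs:
            if name == "class":
                # Unknown classes are dropped along with the class attribute.
                from_classes += [self.class_styles[c] for c in (value or "").split()
                                 if c in self.class_styles]
            elif name == "style":
                inline.append(value or "")
            else:
                kept.append((name, value))
        styles = from_classes + inline  # inline rules last, so they still win
        if styles:
            kept.append(("style", "; ".join(styles)))
        parts = [tag] + [n if v is None else f'{n}="{escape(v)}"' for n, v in kept]
        return "<" + " ".join(parts) + (" />" if self_closing else ">")

    def handle_starttag(self, tag, attrs):
        self.out.append(self._render(tag, attrs))

    def handle_startendtag(self, tag, attrs):
        self.out.append(self._render(tag, attrs, self_closing=True))

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")


def inline_styles(html: str, class_styles=CLASS_STYLES) -> str:
    parser = InlineStyles(class_styles)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

For example, `<h1 class="title">Report</h1>` becomes `<h1 style="font-size: 20pt; font-weight: bold">Report</h1>`. Adding styles a class at a time and re-importing makes it easier to see which declaration breaks the conversion.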

Be sure that the images are hosted on the web, are publicly available, and are each relatively small (less than 1 MB), and that the combined size of the HTML and images is less than 50 MB. If you need larger images, you could add them afterwards directly in the corresponding document using the Google Docs editor. It might also be possible to do this with Google Apps Script and/or the Google Docs API.
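Image sizes can be checked the same way before importing. This is a sketch: the 1 MB per-image figure is a conservative rule of thumb rather than a documented Google limit, MB is treated as 1,000,000 bytes, and `remote_size` is only a rough public-access check, since it relies on the host answering an unauthenticated HEAD request with a `Content-Length` header, which not every server does.

```python
import os
import urllib.request
from typing import Optional

MAX_IMAGE_BYTES = 1_000_000    # conservative rule of thumb, not a documented Google limit
MAX_TOTAL_BYTES = 50_000_000   # HTML plus images; decimal MB assumed


def check_sizes(html_bytes: int, image_sizes: dict[str, int]) -> list[str]:
    """image_sizes maps an image name or URL to its size in bytes."""
    problems = [f"{name}: {size} bytes, consider adding it later in the Docs editor"
                for name, size in image_sizes.items() if size > MAX_IMAGE_BYTES]
    total = html_bytes + sum(image_sizes.values())
    if total > MAX_TOTAL_BYTES:
        problems.append(f"total {total} bytes exceeds {MAX_TOTAL_BYTES}")
    return problems


def local_sizes(paths: list[str]) -> dict[str, int]:
    """Sizes of images that are still on disk, before you upload them."""
    return {path: os.path.getsize(path) for path in paths}


def remote_size(url: str, timeout: float = 10.0) -> Optional[int]:
    """HEAD-request an image without credentials.

    urlopen raises urllib.error.HTTPError for 4xx/5xx responses, which is what
    you would typically see for an image that is not public. Returns None when
    the server does not send Content-Length.
    """
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        length = response.headers.get("Content-Length")
    return int(length) if length is not None else None
```

Images flagged by `check_sizes` are the candidates to leave out of the HTML and insert afterwards in the Docs editor.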

