When I export a file from Word or TextEdit, I get very bloated HTML, full of crazy style
tags on every paragraph, so I can't even clean it by hand.
The only information I want preserved is:
- <h1>, <h2>, <h3>, <p> tags
- Alignment (center, left, right)
- Links, external and internal (for the table of contents)
- <img> tags
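
To make the requirement concrete, here is a rough sketch of the kind of whitelist filter I have in mind, written in Python with only the standard library (`html.parser`). The names `Cleaner` and `clean_html` are made up for illustration, and the sketch assumes alignment arrives either as an `align` attribute or as `text-align` inside an inline `style`. I have not checked that assumption against every Word or TextEdit export.

```python
import re
from html import escape
from html.parser import HTMLParser

# Hypothetical whitelist, taken from the list of things to preserve.
BLOCK_TAGS = {"h1", "h2", "h3", "p"}
KEEP_TAGS = BLOCK_TAGS | {"a", "img"}
KEEP_ATTRS = {"a": {"href", "name", "id"}, "img": {"src", "alt"}}
SKIP_CONTENT = {"style", "script", "title"}  # drop these tags and their text
ALIGN_RE = re.compile(r"text-align\s*:\s*(left|right|center)", re.I)


class Cleaner(HTMLParser):
    """Re-emit only whitelisted tags and attributes; keep all visible text."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_CONTENT:
            self.skip_depth += 1
            return
        if self.skip_depth or tag not in KEEP_TAGS:
            return  # unknown tag: drop the tag itself, keep its text
        attrs = {k: v or "" for k, v in attrs}
        kept = []
        if tag in BLOCK_TAGS:
            if "id" in attrs:  # may be a target for table-of-contents links
                kept.append(("id", attrs["id"]))
            # Assumption: alignment is in inline CSS or an align attribute.
            m = ALIGN_RE.search(attrs.get("style", ""))
            align = (m.group(1) if m else attrs.get("align", "")).lower()
            if align in ("left", "right", "center"):
                kept.append(("style", f"text-align:{align}"))
        else:
            kept = [(k, v) for k, v in attrs.items() if k in KEEP_ATTRS[tag]]
        attr_text = "".join(f' {k}="{escape(v)}"' for k, v in kept)
        self.out.append(f"<{tag}{attr_text}>")

    def handle_endtag(self, tag):
        if tag in SKIP_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif not self.skip_depth and tag in KEEP_TAGS and tag != "img":
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(escape(data, quote=False))


def clean_html(source):
    parser = Cleaner()
    parser.feed(source)
    parser.close()
    return "".join(parser.out)
```

Calling `clean_html(exported_text)` on the exported markup would return a string containing only the whitelisted tags. Everything else (`<span>`, `<font>`, class names, font sizes) is dropped, while the text inside those tags is kept.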