From Word document to clean html

Recently I tweeted that Word2Cleanhtml was the thing that had been missing from my life. I am not going to go into the reasons why it would be just that important to Me Personally, but I am excited enough to write a blog post about it.

▲ Aside: Don’t forget to follow @NatureProtocols on Twitter!!

♫ Here we go! Are you paying attention?

This blog post was written in Word. The formatting was not added using any of the weird stars and underscores that you normally find yourself using; it was achieved in six easy steps:

  1. Write the post with the desired formatting.
  2. Go to Word2Cleanhtml and paste in the text (check the boxes “Remove empty paragraphs”, “Replace smart quotes with ascii equivalents”, “Indent with tabs, not spaces” and “Replace non-breaking spaces with ordinary spaces”).
  3. Copy the resulting clean html into a new Word document.
  4. Find and replace all paragraph marks with “nothing”.
  5. Find and replace all tabs with “nothing”.
  6. Paste the resulting text into the text field in Movable Type.
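
If the round trip through a second Word document in steps 3 to 5 feels like one detour too many, the same clean-up can probably be done with a few lines of Python. This is only a rough sketch of what those two find-and-replace passes do; the function name `flatten_html` and the sample markup are made up for illustration and are not part of any tool mentioned here.

```python
def flatten_html(html: str) -> str:
    """Strip paragraph marks (line breaks) and tabs, like steps 4 and 5."""
    # Handle Windows (\r\n), Unix (\n) and old Mac (\r) line endings, then tabs.
    for unwanted in ("\r\n", "\n", "\r", "\t"):
        html = html.replace(unwanted, "")
    return html


# Hypothetical sample of tab-indented clean html
sample = "<ul>\n\t<li>Write the post</li>\n\t<li>Paste it in</li>\n</ul>\n"
print(flatten_html(sample))
# prints: <ul><li>Write the post</li><li>Paste it in</li></ul>
```

Like the Word version, this removes every line break without asking questions, so anything that relies on them (text inside a `<pre>` block, for example) would need to be handled separately.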

Can we even do a table??

Step Number | Problem | Solution
4 | You can’t quite work out what to put in the “Find what” field. | Go to “More”, “Special” – the Paragraph Mark should be the top entry, and should be ^p.
6 | When you preview the document, the page layout is completely screwed up. | Every opening tag needs to have a corresponding closing tag. Check that this is indeed the case. A common “thing” is that you will not have copied and pasted the first “<” or the final “>”. If it just looks too complicated, it is worth either repeating the copy-and-paste steps OR just adding another closing tag at the end of the text and hoping for the best.
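
Checking by eye that every opening tag has a corresponding closing tag gets tedious for a long post, so here is a minimal sketch of an automated check using Python's standard `html.parser` module. The names `TagBalanceChecker` and `find_tag_problems` are invented for this example, and the tags in the usage lines are arbitrary illustrations rather than the ones that caused trouble in this post.

```python
from html.parser import HTMLParser

# Elements that never take a closing tag in HTML.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class TagBalanceChecker(HTMLParser):
    """Keeps a stack of open tags and notes closing tags that do not match."""

    def __init__(self):
        super().__init__()
        self.open_tags = []  # tags still waiting for their closing tag
        self.problems = []   # closing tags that did not match the last open tag

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.open_tags and self.open_tags[-1] == tag:
            self.open_tags.pop()
        else:
            self.problems.append(f"unexpected </{tag}>")


def find_tag_problems(html):
    """Return a list of human-readable problems; an empty list means balanced."""
    checker = TagBalanceChecker()
    checker.feed(html)
    checker.close()
    unclosed = [f"<{tag}> is never closed" for tag in checker.open_tags]
    return checker.problems + unclosed


print(find_tag_problems("<p>Hello <b>world</b></p>"))
# prints: []
print(find_tag_problems("<table><tr><td>x</td></tr>"))
# prints: ['<table> is never closed']
```

This check is deliberately strict: browsers tolerate some missing closing tags (on `<p>` or `<li>`, for instance), which this sketch would still report. Treat its output as a hint about where to look, not as a verdict.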

And lastly: You could use this tool when uploading your protocols on the Protocol Exchange and (fingers crossed) your beautiful formatting will be preserved without anyone having to break a sweat.


Word2Cleanhtml was written by Olly Cope.


This post has been edited since it was originally published; the whole thing works even better if you tick the “Indent with tabs, not spaces” box in Word2Cleanhtml.