
I'm trying to understand Markdown's relationship to HTML. If I understand correctly, both are markup languages (an umbrella term describing languages that add formatting elements to plain-text documents). A Markdown processor converts Markdown-formatted plain text to HTML.

My understanding is that Markdown is a superset of HTML:

Markdown is a popular markup language that is a superset of HTML.

I'm assuming that it's a strict or proper superset. Drawing a parallel from What does it mean when one language is a parallel superset of another?, I interpret that to mean that every valid HTML document is also a valid Markdown document (e.g. HTML is understood in a Jupyter Notebook Markdown cell), but that the converse is not true.

What seems conflicting to me is this: if Markdown is a superset of HTML, why can't Markdown do everything HTML can? (I would expect the opposite, since a superset extends a language without removing or changing any of its existing features.) Also, I would expect HTML to be a superset of Markdown, since HTML is more expressive and harder for most humans to read.

Below is a diagram trying to mimic the one in What does “Objective-C is a superset of C more strictly than C++” mean exactly?

[Diagram not available]

1 Answer


That documentation is misleading. Markdown itself is not a superset of HTML. The documentation for the original Markdown project is pretty clear:

Markdown is not a replacement for HTML, or even close to it. Its syntax is very small, corresponding only to a very small subset of HTML tags. The idea is not to create a syntax that makes it easier to insert HTML tags. In my opinion, HTML tags are already easy to insert. The idea for Markdown is to make it easy to read, write, and edit prose. HTML is a publishing format; Markdown is a writing format. Thus, Markdown’s formatting syntax only addresses issues that can be conveyed in plain text.

Today there are several flavours of Markdown, many of which add features that were not present in the original version like tables and syntax-highlighted code blocks. This doesn't change the fundamental fact that Markdown covers a subset of HTML.

(Technically speaking, Markdown isn't a subset of HTML either. *, for example, has no special meaning in HTML. Unconverted Markdown documents might be well-formed HTML but the semantics are very different. But Markdown syntax maps to a subset of HTML tags.)
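To make the point about `*` concrete, here is a small sketch that uses Python's standard-library `html.parser` as a stand-in for "something that reads HTML". The `TagCollector` class and `parse` helper are hypothetical names chosen for illustration. An HTML parser sees unconverted Markdown as plain text with literal asterisks, while the converted output actually carries `<p>` and `<em>` elements.

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Records the start tags and text data that Python's HTML parser sees."""

    def __init__(self):
        super().__init__()
        self.tags = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

    def handle_data(self, data):
        self.text.append(data)


def parse(source):
    """Return (list of start tags, concatenated text) for an HTML string."""
    parser = TagCollector()
    parser.feed(source)
    parser.close()  # flush any buffered text
    return parser.tags, "".join(parser.text)


# Unconverted Markdown: to an HTML parser this is just text, asterisks included.
print(parse("I *really* like coffee"))
# ([], 'I *really* like coffee')

# The HTML a Markdown processor would produce for the same sentence.
print(parse("<p>I <em>really</em> like coffee</p>"))
# (['p', 'em'], 'I really like coffee')
```

The same characters are "well-formed" to the parser in both cases; only the second one carries emphasis semantics.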

However, the very next paragraph in the original documentation says:

For any markup that is not covered by Markdown’s syntax, you simply use HTML itself. There’s no need to preface it or delimit it to indicate that you’re switching from Markdown to HTML; you just use the tags.

Since you can use HTML directly in Markdown, it could be considered a superset of HTML. For example, this is valid Markdown:

# My awesome title

I <em>really</em> like coffee

If you pass an HTML document through a conforming Markdown processor, it should come out the other side untouched. Being able to use HTML directly in Markdown is very similar to how one can directly use C in C++. This may be what the Jupyter documentation means.
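The passthrough behaviour can be illustrated with a deliberately tiny, hypothetical converter in Python (standard library only). It is not a real Markdown implementation: it only handles ATX headings, `*em*`, `**strong**` and paragraphs, and it assumes, as a simplification, that any blank-line-separated block starting with `<` is raw HTML to be emitted unchanged. Real processors restrict that rule to block-level tags and handle far more syntax.

```python
import re


def convert_inline(text: str) -> str:
    """Convert **strong** and *em* spans; everything else is left as-is."""
    # Handle **strong** first so its asterisks are not consumed by *em*.
    text = re.sub(r"\*\*(.+?)\*\*", r"<strong>\1</strong>", text)
    text = re.sub(r"\*(.+?)\*", r"<em>\1</em>", text)
    return text


def to_html(markdown_text: str) -> str:
    """Toy Markdown-to-HTML converter (hypothetical, illustration only)."""
    out = []
    # Blocks are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", markdown_text.strip()):
        block = block.strip()
        if block.startswith("<"):
            # Simplification: any block starting with a tag is treated as raw
            # HTML and passed through unchanged.
            out.append(block)
            continue
        heading = re.fullmatch(r"(#{1,6})\s+(.+)", block)
        if heading:
            level = len(heading.group(1))
            out.append(f"<h{level}>{convert_inline(heading.group(2))}</h{level}>")
        else:
            # Prose, possibly containing inline HTML such as <em>, which is
            # kept verbatim inside the generated <p>.
            out.append(f"<p>{convert_inline(block)}</p>")
    return "\n".join(out)


markdown_doc = "# My awesome title\n\nI <em>really</em> like coffee"
print(to_html(markdown_doc))
# <h1>My awesome title</h1>
# <p>I <em>really</em> like coffee</p>

html_doc = "<table>\n  <tr><td>Foo</td></tr>\n</table>"
print(to_html(html_doc) == html_doc)  # True: the raw HTML block is untouched
```

In this sketch the Markdown heading is converted while the inline `<em>` survives as-is; note that the prose line still gets wrapped in `<p>`, so the "untouched" behaviour shows most cleanly for HTML blocks like the `<table>` above.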

  • Thanks Chris. So, if I understand correctly, HTML is not a strict subset of Markdown, and I can think of Markdown as a superset of a very small subset of HTML tags.
    – ENIAC-6
    Commented Apr 8, 2019 at 1:33
  • @Iracambi, it really depends on whether by "Markdown" you mean its own syntax (# for headings, * and - for unordered lists, 1. for ordered lists, * and _ for emphasis, ** and __ for strong, etc.) or whether you include the "simply use HTML itself" part. In the first case, Markdown maps to a subset of HTML. In the second case, Markdown could syntactically be considered a superset of HTML, but since it doesn't add any semantics it could be considered an equal set. Personally, I'd say "Markdown maps to a subset of HTML". It adds no semantics and can't do everything HTML can.
    – Chris
    Commented Apr 8, 2019 at 1:52
