27

I was writing Markdown inside the XML comments of a config file when the XmlParser reported that two hyphens (--) are not allowed in XML comments.

Checking the XML specification, it appears that an XML comment must not contain two consecutive hyphens, and that the restriction exists for compatibility with SGML parsers.
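For reference, the error can be reproduced with a minimal sketch like the one below. It assumes Python's standard `xml.etree.ElementTree` rather than the parser from the question, and `parses` is a hypothetical helper name used only for illustration.

```python
import xml.etree.ElementTree as ET


def parses(xml_text):
    """Return True if xml_text is well-formed XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# A single hyphen inside a comment is fine.
single_hyphen_ok = parses("<config><!-- a - b --><item/></config>")

# Two consecutive hyphens inside a comment make the document ill-formed.
double_hyphen_ok = parses("<config><!-- a -- b --><item/></config>")

print(single_hyphen_ok, double_hyphen_ok)
```

Replacing `--` with a single hyphen or with `- -` before writing the comment avoids the error, although that substitution is not reversible.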

Why do SGML parsers disallow double hyphens in comments?

3
  • 2
    I don't think you will get a better answer than "because that's what the standard says"
    – jk.
    Commented May 17, 2013 at 12:11
  • Well, if that's the only answer, then I have no choice :-(, though there might be a better answer than that. Commented May 17, 2013 at 13:12
  • check out my answer here stackoverflow.com/a/20885152/582727
    – Daniel
    Commented Feb 1, 2021 at 17:18

2 Answers

42

This page outlines quite a bit of the HTML/SGML history and the rather convoluted rules governing two consecutive hyphens (the double dash).

The relevant part about SGML:

To put it simply, the double dash at the start and end of the comment do not start and end the comment. Double dash indicates a change in what the comment is allowed to contain. The first -- starts the comment, and tells the browser that the comment is allowed to contain > characters without ending the comment. The second -- does not end the comment. It tells the browser that if it encounters a > character, it must then end the comment. If another -- is added, then it goes back to allowing the > characters.
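The toggling behaviour described in the quote can be sketched as a small scanner. This is a simplified model of the rule as quoted, not a conforming SGML parser, and the function name `sgml_declaration_end` is made up for this illustration.

```python
def sgml_declaration_end(text, start=0):
    """Return the index just past the '>' that ends the declaration
    beginning at `start`, or -1 if it never ends.

    Simplified model: every '--' toggles "inside a comment", and a '>'
    ends the declaration only while outside a comment.
    """
    if not text.startswith("<!", start):
        raise ValueError("expected '<!' at the start position")
    i = start + 2
    in_comment = False
    while i < len(text):
        if text.startswith("--", i):
            in_comment = not in_comment  # each double dash flips the state
            i += 2
        elif text[i] == ">" and not in_comment:
            return i + 1
        else:
            i += 1
    return -1


simple = "<!-- a > b --> rest"
tricky = "<!-- a -- b --> tail -->"

simple_end = sgml_declaration_end(simple)
tricky_end = sgml_declaration_end(tricky)

print(repr(simple[:simple_end]))
print(repr(tricky[:tricky_end]))
```

In the second string the extra `--` flips the state once more, so under this model the first `-->` no longer closes anything and the scanner runs on to the last `>`. That kind of surprise is what the advice to avoid `--` inside comments guards against.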

3
  • 7
    The section you're referring to. When I read what the SGML specs intended for -- within the comment, my head spins around on the complexity it will introduce later on. Commented May 17, 2013 at 14:22
  • 2
    The advice to never use -- inside a comment seems good to me. But, is there a standard way of escaping it? Suppose I want to create (and share) an output filter to ensure foo -- bar never causes a problem. Is there an SGML equivalent of foo -\- bar? (I'm sure it's not backslash though!) Or - (see this answer), or something else? If we just replace -- with - or - -, the escaping is not reversible.
    – fazy
    Commented Oct 9, 2014 at 16:59
  • @fazy Sorry, but it is a comment, something that is there just to explain what follows to a human reader, with a high chance that no user of that document ever reads it. I really do not understand the point of insisting on double dashes as part of a comment. Commented Jul 1, 2022 at 15:13
17

Because a double hyphen is the comment delimiter in SGML. The <! starts an SGML instruction, the > ends it, and within the instruction each -- marks the start or the end of a comment. So basically it is for the same reason that a C++ block comment (/* ... */) cannot contain */.
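As a rough illustration of `--` acting as a delimiter inside an instruction, the sketch below removes the comments from a single `<!...>` string. It is only a toy under stated assumptions: it does not handle `--` inside quoted values, and `strip_declaration_comments` is a hypothetical name, not part of any SGML library.

```python
def strip_declaration_comments(declaration):
    """Remove '-- ... --' comments from one '<!...>' instruction.

    Toy model: splitting the body on '--' yields pieces that alternate
    between outside-comment (even index) and inside-comment (odd index).
    """
    if not (declaration.startswith("<!") and declaration.endswith(">")):
        raise ValueError("expected a string of the form <!...>")
    pieces = declaration[2:-1].split("--")
    if len(pieces) % 2 == 0:
        # An odd number of '--' delimiters leaves a comment open at '>'.
        raise ValueError("unterminated comment before '>'")
    kept = " ".join(" ".join(pieces[0::2]).split())  # normalise whitespace
    return "<!" + kept + ">"


stripped = strip_declaration_comments(
    '<!someRelevantSgmlTag -- a comment -- someAttribute="blabla" -- another comment -->'
)
print(stripped)
```

With the same function, `<!-- a -- b -->` raises `ValueError`, because the third `--` opens a comment that is still open when the `>` is reached.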

6
  • 2
    I think --> is the comment delimiter. Commented May 17, 2013 at 13:54
  • 13
    No, it is not. <! starts an SGML instruction, > ends it. Within an SGML instruction -- both starts and ends a comment. Commented May 17, 2013 at 14:44
  • 8
    Ahh add your comment to the answer, this is illuminating because it means you could write <!someRelevantSgmlTag -- a comment -- someAttribute="blabla" -- another comment --> and the semantic meaning would be <!someRelevantSgmlTag someAttribute="blabla"> Commented May 17, 2013 at 15:05
  • 3
    Ah, makes sense. --> is actually two tokens, the -- to delimit comment, and > to end SGML instruction. Now, I have an idea on where <![CDATA[ ... ]]> originated. Commented May 17, 2013 at 15:09
  • I've merged the comment. Commented May 17, 2013 at 15:22
