10

I was entirely amazed by how Coq's parser is implemented. e.g.

https://softwarefoundations.cis.upenn.edu/lf-current/Imp.html#lab347

It's so crazy that the parser seems ok to take any lexeme by giving notation command and subsequent parser is able to parse any expression as it is. So what it means is the grammar must be context sensitive. But this is so flexible that it absolutely goes beyond my comprehension.

Any pointers on how this kind of parser is theoretically feasible? How should it work? Any materials or knowledge would work. I just try to learn about this type of parser in general. Thanks.

Please do not ask me to read Coq's source myself. I want to check the idea in general but not a specific implementation.

0

2 Answers 2

13

Indeed, this notation system is very powerful and it was probably one of the reasons of Coq's success. In practice, this is a source of much complication in the source code. I think that @ejgallego should be able to tell you more about it but here is a quick explanation:

  • At the beginning, Coq's documents were evaluated sentence by sentence (sentences are separated by dots) by coqtop. Some commands can define notations and these modify the parsing rules when they are evaluated. Thus, later sentences are evaluated with a slightly different parser.

  • Since version 8.5, there is also a mechanism (the STM) to evaluate a document fully (many sentences in parallel) but there is some special mechanism for handling these notation commands (basically you have to wait for these to be evaluated before you can continue parsing and evaluating the rest of the document).

Thus, contrary to a normal programming language, where the compiler will take a document, pass it through the lexer, then the parser (parse the full document in one go), and then have an AST to give to the typer or other later stages, in Coq each command is parsed and evaluated separately. Thus, there is no need to resort to complex contextual grammars...

3
  • so basically coq makes each sentence a separate compilation unit, such that it can have a chance to readjust parser between each unit, is that it? can i conclude that if a whatever error is made in the very last line of the code, the compiler will be forced to compile to the last line before panicking?
    – Jason Hu
    Commented Apr 4, 2017 at 2:32
  • Absolutely. And whereas other compilers would e.g. first signal parsing errors, then typing errors, Coq will signal all types of errors in the order of the document (and stop at the first one).
    – Zimm i48
    Commented Apr 4, 2017 at 11:15
  • 3
    Well actually after 8.5 parsing and execution errors are separated and the error recovery strategy in 8.6 is a bit more complex than that, but yeah, that is the basic idea.
    – ejgallego
    Commented Apr 4, 2017 at 13:00
10

I'll drop my two cents to complement @Zimmi48's excellent answer.

Coq indeed features an extensible parser, which TTBOMK is mainly the work of Hugo Herbelin, built on the CAMLP4/CAMLP5 extensible parsing system by Daniel de Rauglaudre. Both are the canonical sources for information about the parser, I'll try to summarize what I know but note indeed that my experience with the system is short.

The CAMLPX system basically supports any LL1 grammar. Coq exposes to the user the whole set of grammar rules, allowing the user to redefine them. This is the base mechanism on which extensible grammars are built. Notations are compiled into parsing rules in the Metasyntax module, and unfolded in a latter post-processing phase. And that really is AFAICT.

The system itself hasn't changed much in the whole 8.x series, @Zimmi48's comments are more related to the internal processing of commands after parsing. I recently learned that Coq v7 had an even more powerful system for modifying the parser.

In words of Hugo Herbelin "the art of extensible parsing is a delicate one" and indeed it is, but Coq's achieved a pretty great implementation of it.

Not the answer you're looking for? Browse other questions tagged or ask your own question.