Is there any module that can parse reStructuredText into a tree model?

Can Docutils or Sphinx do this?

4 Answers

I'd like to expand on Gareth Latty's answer. "What you probably want is the parser at docutils.parsers.rst" is a good starting point, but what comes next? Namely:

How do you parse reStructuredText in Python?

Below is a working example for Python 3.6 and docutils 0.14:

import docutils.nodes
import docutils.parsers.rst
import docutils.utils
import docutils.frontend

def parse_rst(text: str) -> docutils.nodes.document:
    parser = docutils.parsers.rst.Parser()
    components = (docutils.parsers.rst.Parser,)
    settings = docutils.frontend.OptionParser(components=components).get_default_values()
    document = docutils.utils.new_document('<rst-doc>', settings=settings)
    parser.parse(text, document)
    return document

The resulting document can then be processed with, for example, the visitor below, which prints every reference node in the document:

class MyVisitor(docutils.nodes.NodeVisitor):

    def visit_reference(self, node: docutils.nodes.reference) -> None:
        """Called for "reference" nodes."""
        print(node)

    def unknown_visit(self, node: docutils.nodes.Node) -> None:
        """Called for all other node types."""
        pass

Here's how to run it:

doc = parse_rst('spam spam lovely spam')
visitor = MyVisitor(doc)
doc.walk(visitor)
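
Note that the sample text above contains no hyperlinks, so the visitor prints nothing for it. Here is a minimal sketch of the same idea on input that does contain a reference. To stay self-contained it assumes docutils.core.publish_doctree for parsing, and it subclasses docutils.nodes.SparseNodeVisitor so that no unknown_visit override is needed. RefCollector and collect_reference_uris are illustrative names, not part of docutils.

```python
import docutils.nodes
from docutils.core import publish_doctree


class RefCollector(docutils.nodes.SparseNodeVisitor):
    """Collect the target URI of every reference node (hypothetical helper)."""

    def __init__(self, document):
        super().__init__(document)
        self.uris = []

    def visit_reference(self, node):
        # A reference may carry no 'refuri' attribute, so use .get().
        self.uris.append(node.get('refuri'))


def collect_reference_uris(text):
    # Parse the text, walk the resulting tree, and return the URIs found.
    doc = publish_doctree(text)
    visitor = RefCollector(doc)
    doc.walk(visitor)
    return visitor.uris


uris = collect_reference_uris('See `Python <https://www.python.org>`_ for details.')
print(uris)
```

The visitor stores the URIs instead of printing each node, which tends to be easier to reuse and to test.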
  • How do I unparse it? As in, go from a series of nodes back to an .rst file after I've made some modification to the nodes?
    – user3064538
    Commented Feb 23, 2021 at 22:19
  • Thank you for the code samples! It's ironic that docutils has the worst documentation of any library I've tried to use (at least it has documentation though).
    – user3064538
    Commented Feb 23, 2021 at 22:20
  • Unparsing - that's an entirely different question, I encourage you to ask separately to this one, unless it's already been asked of course! I'm not aware of any specific solution at the moment.
    – mbdevpl
    Commented Feb 24, 2021 at 10:50
  • done: How do I unparse restructured text back into an rst file?
    – user3064538
    Commented Feb 24, 2021 at 14:44
  • The recommended way is using the "Publisher Convenience Functions" (docutils.sourceforge.io/docs/api/…) provided by docutils.core.
    – G. Milde
    Commented Nov 26, 2023 at 16:44

Docutils does indeed contain the tools to do this.

What you probably want is the parser at docutils.parsers.rst

See this page for details on what is involved. There are also some examples at docutils/examples.py - particularly check out the internals() function, which is probably of interest.

  • Just to add that Docutils is the reference implementation of reStructuredText and that Sphinx is built on top of Docutils. So yes, Docutils is definitely the correct tool for this.
    – Chris
    Commented Oct 18, 2012 at 11:41

Based on Gareth Latty's and mbdevpl's answers, here is an update for newer versions of docutils.

Starting with docutils 0.18 (2021-10-26), docutils.frontend.OptionParser has been deprecated (git mirror commit, upstream SVN HISTORY.txt), and the following warning will be printed (source):

DeprecationWarning: The frontend.OptionParser class will be replaced by a subclass of argparse.ArgumentParser in Docutils 0.21 or later.

The docutils.frontend.get_default_settings() function can be used instead, but it was only added in docutils 0.18, so to be compatible with all versions without getting warnings, you can use:

import docutils.frontend
import docutils.nodes  # needed for the return type annotation below
import docutils.parsers.rst
import docutils.utils

def parse_rst(text: str) -> docutils.nodes.document:
    parser = docutils.parsers.rst.Parser()
    if hasattr(docutils.frontend, 'get_default_settings'):
        # docutils >= 0.18
        settings = docutils.frontend.get_default_settings(docutils.parsers.rst.Parser)
    else:
        # docutils < 0.18
        settings = docutils.frontend.OptionParser(components=(docutils.parsers.rst.Parser,)).get_default_values()
    document = docutils.utils.new_document('<rst-doc>', settings=settings)
    parser.parse(text, document)
    return document

The rest of the code stays the same and can be found in mbdevpl's answer.

The docutils.core module provides a higher-level interface to Docutils. To parse a string of reStructuredText into a document tree, use publish_doctree, for example:

from docutils.core import publish_doctree
source = 'Hello *world*'
tree = publish_doctree(source)

For details, see https://docutils.sourceforge.io/docs/api/publisher.html
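
As a hedged illustration of what the returned tree looks like, the sketch below dumps it with pformat() and collects the text of every emphasis node. It assumes docutils is installed. The findall/traverse fallback is there because findall only exists in newer releases, and emphasized is just an illustrative variable name.

```python
from docutils import nodes
from docutils.core import publish_doctree

tree = publish_doctree('Hello *world*')

# pformat() renders the tree as indented pseudo-XML, handy for exploring.
print(tree.pformat())

# Newer docutils offers findall(); older releases only have traverse().
find = tree.findall if hasattr(tree, 'findall') else tree.traverse
emphasized = [node.astext() for node in find(nodes.emphasis)]
print(emphasized)
```

Printing the pseudo-XML first is a quick way to discover which node classes to search for before writing a visitor.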
