Living with spec
@sbelak
simon@goopti.com
Parsing with Derivatives
A Functional Pearl
Matthew Might, David Darais
University of Utah
might@cs.utah.edu, david.darais@gmail.com
Daniel Spiewak
University of Wisconsin, Milwaukee
dspiewak@uwm.edu
Abstract
We present a functional approach to parsing unrestricted context-free grammars based on Brzozowski’s derivative of regular expressions. If we consider context-free grammars as recursive regular expressions, Brzozowski’s equational theory extends without modification to context-free grammars (and it generalizes to parser combinators). The supporting actors in this story are three concepts familiar to functional programmers—laziness, memoization and fixed points; these allow Brzozowski’s original equations to be transliterated into purely functional code in about 30 lines spread over three functions.
Yet, this almost impossibly brief implementation has a drawback: its performance is sour—in both theory and practice. The culprit? Each derivative can double the size of a grammar, and with it, the cost of the next derivative.
Fortunately, much of the new structure inflicted by the derivative is either dead on arrival, or it dies after the very next derivative. To eliminate it, we once again exploit laziness and memoization …
The derivative of regular expressions [1], if gently tempered with laziness, memoization and fixed points, acts immediately as a pure, functional technique for generating parse forests from arbitrary context-free grammars. Despite—even because of—its simplicity, the derivative transparently handles ambiguity, left-recursion, right-recursion, ill-founded recursion or any combination thereof.
1.1 Outline
• After a review of formal languages, we introduce Brzozowski’s derivative for regular languages. A brief implementation highlights its rugged elegance.
• As our implementation of the derivative engages context-free languages, non-termination emerges as a problem.
• Three small, surgical modifications to the implementation (but not the theory)—laziness, memoization and fixed points—guarantee termination. Termination means the derivative can …
blog.klipse.tech/clojure/2016/10/02/parsing-with-derivatives-regular.html
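A minimal Clojure sketch of Brzozowski’s derivative for regular expressions only. It assumes a home-grown data encoding (`:empty`, `:eps`, single characters, `[:alt r s]`, `[:cat r s]`, `[:star r]`), and `nullable?`, `derive-re` and `matches?` are illustrative names, not the paper’s code or the klipse post’s. Matching a string means deriving by each character in turn and then checking whether the result accepts the empty string. Extending this to context-free grammars needs the laziness, memoization and fixed points described above, which the sketch leaves out; it also performs no simplification, so terms keep growing with each derivative, which is the cost the abstract warns about.

```clojure
;; Regular expressions as plain data (an assumption of this sketch):
;;   :empty      the empty language (matches nothing)
;;   :eps        the language containing only the empty string
;;   \a          a single character
;;   [:alt r s]  union, [:cat r s] concatenation, [:star r] Kleene star

(defn nullable?
  "True if r accepts the empty string."
  [r]
  (cond
    (= r :eps)   true
    (= r :empty) false
    (char? r)    false
    :else (let [[op a b] r]
            (case op
              :alt  (or (nullable? a) (nullable? b))
              :cat  (and (nullable? a) (nullable? b))
              :star true))))

(defn derive-re
  "Brzozowski derivative of r with respect to the character c."
  [r c]
  (cond
    (#{:eps :empty} r) :empty
    (char? r) (if (= r c) :eps :empty)
    :else (let [[op a b] r]
            (case op
              :alt  [:alt (derive-re a c) (derive-re b c)]
              ;; If a can be empty, c may also be consumed by b.
              :cat  (if (nullable? a)
                      [:alt [:cat (derive-re a c) b] (derive-re b c)]
                      [:cat (derive-re a c) b])
              :star [:cat (derive-re a c) r]))))

(defn matches?
  "Derive r by each character of s in turn, then test for nullability."
  [r s]
  (nullable? (reduce derive-re r s)))
```

For example, `(matches? [:cat \a [:star \b]] "abbb")` returns true, while the same pattern rejects `"ba"` and the empty string.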
Writing a spec should enable automatic:
• Validation
• Error reporting
• Destructuring
• Instrumentation
• Test-data generation
• Generative test generation
*http://clojure.org/about/spec
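To make the instrumentation item concrete, a small sketch assuming Clojure 1.9+ with `clojure.spec.alpha` and `clojure.spec.test.alpha` available; `fare`, `::km` and `::rate` are hypothetical names used only for illustration. A single `:args` spec built with `s/alt` covers both arities, and `stest/instrument` checks every call against it.

```clojure
(require '[clojure.spec.alpha :as s]
         '[clojure.spec.test.alpha :as stest])

;; Hypothetical pricing function with two arities.
(defn fare
  ([km] (fare km 1.0))
  ([km rate] (* km rate)))

(s/def ::km (s/and number? (complement neg?)))
(s/def ::rate (s/and number? pos?))

;; One :args spec describes both arities via s/alt.
(s/fdef fare
  :args (s/alt :unary  (s/cat :km ::km)
               :binary (s/cat :km ::km :rate ::rate))
  :ret number?)

;; Wrap the var so each call is checked against the :args spec.
(stest/instrument `fare)
```

After instrumenting, `(fare 10 2)` still returns 20, while `(fare -1 2)` throws an `ExceptionInfo` whose data carries the spec problems.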
“Composition is about
decomposing.”
— E. Normand
The code base
• ~15k loc
• ETL
• Risk-hedging
• Demand-responsive pricing
• Packing & routing
• Internal BI tools
• …
Validation
• API boundaries
• Structured errors
• Fail fast = more context
• nil punning
• Multiple arities
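One possible way to put validation at an API boundary, sketched with hypothetical specs: `validate!` (an illustrative helper, not part of spec) fails fast and attaches `s/explain-data`, so the caller receives structured errors instead of a string, and recording the spec name alongside keeps the context. `s/nilable` states explicitly where nil is an acceptable value.

```clojure
(require '[clojure.spec.alpha :as s])

;; Hypothetical request specs, for illustration only.
(s/def ::user-id pos-int?)
(s/def ::promo-code (s/nilable string?))   ; nil is an explicit, valid value
(s/def ::request (s/keys :req-un [::user-id] :opt-un [::promo-code]))

(defn validate!
  "Return x unchanged if it satisfies spec; otherwise throw right away,
  carrying the spec name and the structured explain-data."
  [spec x]
  (if (s/valid? spec x)
    x
    (throw (ex-info "Invalid input"
                    {:spec    spec
                     :explain (s/explain-data spec x)}))))
```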
Friendlier error messages
• Pluggable explanations via s/*explain-out*
• Capture common mistakes
• Provide hints
• Limitations
• One explainer function for all
• Don’t know which spec (only what went wrong)
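A sketch of a pluggable explainer: `s/*explain-out*` is a dynamic var holding the function that `s/explain` uses to print explain-data, so it can be rebound. `friendly-printer` and its nil hint are illustrative only. The limitations on the slide show up here as well: this printer reads only the `::s/problems` entries (`:in`, `:val`, `:pred`), and one function serves every spec.

```clojure
(require '[clojure.spec.alpha :as s])

(defn friendly-printer
  "Hypothetical explainer: one line per problem, plus a hint for nil values."
  [explain-data]
  (if (nil? explain-data)
    (println "Success!")                      ; explain-data is nil when valid
    (doseq [{:keys [in val pred]} (::s/problems explain-data)]
      (println (str "At " (if (seq in) (pr-str in) "top level")
                    ": got " (pr-str val)
                    ", expected " (pr-str pred)
                    (when (nil? val) " (hint: is a value missing?)"))))))

(s/def ::seats pos-int?)
```

Wrapping a call in `(binding [s/*explain-out* friendly-printer] (s/explain ::seats 0))` prints a single line naming the offending value 0 and the failing predicate.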
Destructuring
• Pull apart and name
• Patterns
• case (dispatch on tag)
• core.match
• Separate data description and transformation
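A sketch of destructuring with `s/conform`, using a hypothetical `::shape` spec: `s/or` tags each alternative and `s/cat` names the parts, so the code that follows only has to dispatch on the tag with `case`. The data description lives in the spec, and the transformation (here an illustrative `area` function) stays separate.

```clojure
(require '[clojure.spec.alpha :as s])

;; Hypothetical shapes encoded as tagged vectors.
(s/def ::shape
  (s/or :circle (s/cat :tag #{:circle} :r number?)
        :rect   (s/cat :tag #{:rect} :w number? :h number?)))

(defn area
  "Conform once to pull the data apart and name it, then dispatch on the tag."
  [shape]
  (let [conformed (s/conform ::shape shape)]
    (if (s/invalid? conformed)
      (throw (ex-info "Not a shape" (s/explain-data ::shape shape)))
      ;; s/or conforms to a [tag value] pair; s/cat to a map of named parts.
      (let [[tag {:keys [r w h]}] conformed]
        (case tag
          :circle (* Math/PI r r)
          :rect   (* w h))))))
```

For instance, `(area [:rect 2 3])` returns 6.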
Two schools of thinking
System paradigm
Language paradigm
infoq.com/presentations/Mixin-based-Inheritance
realworldclojure.com/the-system-paradigm
The system paradigm
1. Nibble at the problem from different directions
2. Compose partial solutions into the final solution
Data macros
• Recursive transformations into canonical form
• s/conformer
• Do more without code macros
* juxt.pro/blog/posts/data-macros.html
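A small data macro in the spirit of the slide, assuming a hypothetical `::duration` that accepts minutes, an `[hours minutes]` pair or a numeric string. `s/conformer` rewrites each accepted shape into canonical minutes during `s/conform`, and because `s/coll-of` conforms every element, the same rewrite applies inside `::schedule` without any code macro.

```clojure
(require '[clojure.spec.alpha :as s])

;; Hypothetical duration spec: three input shapes, one canonical form.
(s/def ::duration
  (s/and (s/or :minutes nat-int?
               :h+m     (s/tuple nat-int? nat-int?)
               :text    (s/and string? #(re-matches #"\d+" %)))
         ;; s/and hands the s/or result, a [tag value] pair, to the conformer.
         (s/conformer (fn [[tag v]]
                        (case tag
                          :minutes v
                          :h+m     (let [[h m] v] (+ (* 60 h) m))
                          :text    (Long/parseLong v))))))

;; Conforming a collection canonicalises every element.
(s/def ::schedule (s/coll-of ::duration :kind vector?))
```

Here `(s/conform ::schedule [45 [1 30] "15"])` returns `[45 90 15]`, and a value such as `"1h"` conforms to `::s/invalid`.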
Generative testing
• Limitations
• sequences with internal structure (time series etc.)
• generic higher-order functions (e.g. map)
• Uncovers numerical instabilities
• s/exercise for mocking
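Using `s/exercise` for mocking might look like the following. It assumes `org.clojure/test.check` is on the classpath (spec loads it lazily for generation and fails without it), and `::booking` and `mock-bookings` are hypothetical names. Each call returns fresh values that satisfy the spec.

```clojure
(require '[clojure.spec.alpha :as s])

;; Assumes org.clojure/test.check is available for generation.
(s/def ::booking-id pos-int?)
(s/def ::price (s/double-in :min 0.0 :max 500.0 :NaN? false :infinite? false))
(s/def ::booking (s/keys :req-un [::booking-id ::price]))

(defn mock-bookings
  "n generated bookings, e.g. to stand in for a hypothetical upstream service."
  [n]
  ;; s/exercise returns [value conformed-value] pairs; keep the raw values.
  (map first (s/exercise ::booking n)))
```

`s/double-in` allows NaN and infinities by default; leaving them enabled in test specs is one way generative tests can surface numerical instabilities, while the mock above turns them off to keep values plausible.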
Queryable data descriptions
• s/registry, s/form
• Build a graph
• No inline docs :(
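Since specs are data, the registry can be walked to build a dependency graph. A sketch, assuming every spec of interest is registered under a namespaced keyword: `spec-deps` scans `s/form` for other registered keywords, and `spec-graph` collects the edges for one namespace. Both are illustrative helpers, and the `::city`, `::address` and `::customer` specs exist only for the example.

```clojure
(require '[clojure.spec.alpha :as s])

;; Example specs, for illustration only.
(s/def ::city string?)
(s/def ::address (s/keys :req-un [::city]))
(s/def ::customer (s/keys :req-un [::address]))

(defn spec-deps
  "Registered spec keywords mentioned anywhere in the form of spec k."
  [k]
  (let [registry (s/registry)]
    (->> (s/form k)
         (tree-seq coll? seq)
         (filter #(and (keyword? %) (contains? registry %)))
         set)))

(defn spec-graph
  "Adjacency map {spec #{dependency ...}} for the keyword specs in ns-name."
  [ns-name]
  (into {}
        (for [k (keys (s/registry))
              :when (and (keyword? k) (= ns-name (namespace k)))]
          [k (spec-deps k)])))
```

With these definitions, `(spec-graph (namespace ::customer))` maps `::customer` to `#{::address}`, `::address` to `#{::city}` and `::city` to the empty set.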
Case study: autogenerating materialised views
[Diagram: Events, External data, Kafka, Materialised views]
Automatic view generation
• Event & attribute ontology
• Manual (via spec)
• Inferred
• Statistical analysis (seasonality detection, outlier removal, …)
[Diagram: Onyx]
Simple first, then Easy
Going forward
Questions
@sbelak
simon@goopti.com
My favourites
• Data macros
• Destructuring
• Queryable data descriptions
• Structured errors
• clojure.org/about/spec
• matt.might.net/papers/might2011derivatives.pdf
• infoq.com/presentations/Mixin-based-Inheritance
• realworldclojure.com/the-system-paradigm
• juxt.pro/blog/posts/data-macros.html
• blog.klipse.tech/clojure/2016/10/02/parsing-with-derivatives-regular.html
• indiegogo.com/projects/typed-clojure-clojure-spec-auto-annotations#/
• github.com/arohner/spectrum
