124

Can someone explain to me what a context free grammar is? After looking at the Wikipedia entry and then the Wikipedia entry on formal grammar, I am left utterly and totally befuddled. Would someone be so kind as to explain what these things are?

I am wondering this because I wish to investigate parsing, and also on the side, the limitation of a regex engine.

I'm not sure if these terms are directly programming related, or if they are related more to linguistics in general. If that is the case, I apologize, perhaps this could be moved if so?

3
  • 3
    It's more related to Automata Theorem
    – Rahul
    Commented Jul 15, 2011 at 21:24
  • 4
    If you are interested in formal languages and automata theory for parsing, I suggest a book like Sudkamp's Languages and Machines or Aho, Sethi & Ullman's Compilers. Each book provides a formal description of a context-free grammar, which is a type of formal grammar, then states and proves basic theorems about context-free grammars required to understand them (such as the pumping lemma for context-free languages and conversion and normal form theorems). There is no mathematical prerequisite for learning formal language theory beyond a cursory understanding of set theory.
    – emi
    Commented Jul 16, 2011 at 6:24
  • 1
    Shouldn't such questions be migrated to Theoretical Computer Science?
    – Ravindra S
    Commented Jul 16, 2012 at 5:48

2 Answers 2

139

A context free grammar is a grammar which satisfies certain properties. In computer science, grammars describe languages; specifically, they describe formal languages.

A formal language is just a set (mathematical term for a collection of objects) of strings (sequences of symbols... very similar to the programming usage of the word "string"). A simple example of a formal language is the set of all binary strings of length three, {000, 001, 010, 011, 100, 101, 110, 111}.

Grammars work by defining transformations you can make to construct a string in the language described by a grammar. Grammars will say how to transform a start symbol (usually S) into some string of symbols. A grammar for the language given before is:

S -> BBB
B -> 0
B -> 1

The way to interpret this is to say that S can be replaced by BBB, and B can be replaced by 0, and B can be replaced by 1. So to construct the string 010 we can do S -> BBB -> 0BB -> 01B -> 010.

A context-free grammar is simply a grammar where the thing that you're replacing (left of the arrow) is a single "non-terminal" symbol. A non-terminal symbol is any symbol you use in the grammar that can't appear in your final strings. In the grammar above, "S" and "B" are non-terminal symbols, and "0" and "1" are "terminal" symbols. Grammars like

S -> AB
AB -> 1
A -> AA
B -> 0

Are not context free since they contain rules like AB -> 1 that have more than one non-terminal symbol on the left.

4
  • 15
    By 'not regular', do you mean 'not context-free'? (because the language representable by CFGs is a super-set of those representable by regular expressions)
    – Anti Earth
    Commented Nov 3, 2013 at 1:33
  • 3
    Should "S can be replaced by B" read "S can be replaced by BBB"? Commented Mar 28, 2014 at 4:41
  • 7
    Good lord, this is one of the best explained answers I've seen on SO. Commented Oct 9, 2017 at 17:33
  • 2
    @AntiEarth the second example is not a regular grammar because it has rules which generate two non-terminal symbols from a single nonterminal symbol, which is not allowed in regular grammars (also, as OP pointed out, it has rules with multiple nonterminal symbols on the left). en.wikipedia.org/wiki/Regular_grammar
    – awwsmm
    Commented Mar 26, 2018 at 10:53
26

Language Theory is related to Theory of Computation. Which is the more philosophical side of Computer Science, about deciding which programs are possible, or which will ever be possible to write, and what type of problems is it impossible to write an algorithm to solve.

A regular expression is a way of describing a regular language. A regular language is a language which can be decided by a deterministic finite automaton.

You should read the article on Finite State Machines: http://en.wikipedia.org/wiki/Finite_state_machine

And Regular languages: http://en.wikipedia.org/wiki/Regular_language

All Regular Languages are Context Free Languages, but there are Context Free Languages that are not regular. A Context Free Language is the set of all strings accept by a Context Free Grammer or a Pushdown Automata which is a Finite State Machine with a single stack: http://en.wikipedia.org/wiki/Pushdown_automaton#PDA_and_Context-free_Languages

There are more complicated languages that require a Turing Machine (Any possible program you can write on your computer) to decide if a string is in the language or not.

Language theory is also very related to the P vs. NP problem, and some other interesting stuff.

My Introduction to Computer Science third year text book was pretty good at explaining this stuff: Introduction to the Theory of Computation. By Michael Sipser. But, it cost me like $160 to buy it new and it's not very big. Maybe you can find a used copy or find a copy at a library or something it might help you.

EDIT:

The limitations of Regular Expressions and higher language classes have been researched a ton over the past 50 years or so. You might be interested in the pumping lemma for regular languages. It is a means of proving that a certain language is not regular:

http://en.wikipedia.org/wiki/Pumping_lemma_for_regular_languages

If a language isn't regular it may be Context Free, which means it could be described by a Context Free Grammer, or it may be even in a higher language class, you could prove it's not Context Free by the pumping lemma for Context Free languages which is similar to the one for regular expressions.

A language can even be undecidable, which means even a Turing machine (may program your computer can run) can't be programmed to decide if a string should be accepted as in the language or rejected.

I think the part you're most interested in is the Finite State Machines (Both Deterministic and Deterministic) to see what languages a Regular Expression can decide, and the pumping lemma to prove which languages are not regular.

Basically a language isn't regular if it needs some sort of memory or ability to count. The language of matching parenthesis is not regular for example because the machine needs to remember if it has opened a parenthesis to know if it has to close one.

The language of all strings using the letters a and b that contain at least three b's is a regular language: abababa

The language of all strings using the letters a and b that contain more b's than a's is not regular.

Also you should not that all finite language are regular, for example:

The language of all strings less than 50 characters long using the letters a and b that contain more b's than a's is regular, since it is finite we know it could be described as (b|abb|bab|bba|aabbb|ababb|...) ect until all the possible combinations are listed.

4
  • 1
    Regular expressions aren't decision programs that match strings against patterns. They are expressions that denote regular sets, for which the membership problem is decidable.
    – emi
    Commented Jul 16, 2011 at 6:33
  • 1
    If a set is regular it's obviously decidable. I'm not sure how else to word it. They are effectively decision programs which do not have memory.
    – Paul
    Commented Jul 16, 2011 at 6:40
  • You are describing deterministic finite automata, which provide a decision procedure for regular languages ("decision programs which do not have memory"). Regular expressions are terms that denote regular languages, not programs are procedures. This was my sole complaint.
    – emi
    Commented Jul 16, 2011 at 6:56
  • 1
    I changed it to "A regular expression is a way of describing a regular language. A regular language is a language which can be decided by a deterministic finite automaton." Does that sound better?
    – Paul
    Commented Jul 16, 2011 at 7:05

Not the answer you're looking for? Browse other questions tagged or ask your own question.