15
$\begingroup$

The Myhill-Nerode Theorem gives an exact characterization of the regular languages. Given any language, one can check whether it meets the criteria of the Myhill-Nerode theorem to decide whether or not it is regular. Note that this is stronger than the pumping lemma for regular languages, which gives a necessary (but not sufficient) condition for a language to be regular.

Is there an analogous result for context-free languages? That is, is there a theorem that gives a necessary and sufficient condition for a language to be context-free?

Thanks!

$\endgroup$
2
  • 1
    $\begingroup$ It is not clear to me that the Myhill-Nerode theorem buys you anything if the language is given to you in some strange way. In fact if the languages are given to you by specifying Turing machines recognizing them, then it's undecidable whether or not a language is regular (and also whether or not it is context-free) by Rice's theorem. So I think you should be more precise about how you are given these languages. $\endgroup$ Commented Mar 9, 2012 at 21:07
  • $\begingroup$ @QiaochuYuan- That's true, but by the same token the pumping lemma isn't all that useful either, since if the language is described as a TM you can't use the pumping lemma on it. I was figuring that the languages were presented in some form of set-builder notation, if that's reasonable. $\endgroup$ Commented Mar 9, 2012 at 21:22

4 Answers 4

6
$\begingroup$

This is probably not really what you are looking for, but it's the best I know. It's a strengthening of the fairly well-known Parikh's Theorem.

There is a characterisation of the bounded context-free languages. A language $L$ is bounded if $L\subseteq w_1^* w_2^* \ldots w_n^*$ for some fixed words $w_1,\ldots,w_n$, in which case we can define a corresponding subset of $\mathbb{N}_0^n$:

$\Phi(L) = \{(m_1,m_2,\ldots,m_n) \mid w_1^{m_1} w_2^{m_2} \ldots w_n^{m_n}\in L\}.$

By a theorem of Ginsburg and Spanier (Thm 5.4.2 in Ginsburg's book 'The Mathematical Theory of Context-free Languages'), a bounded language $L$ is context-free if and only if $\Phi(L)$ can be expressed as a finite union of linear sets, each with a stratified set of periods. For the definitions of the terms in the last sentence, see my MO question.

This characterisation can be very useful for proving (not necessarily bounded) languages not to be context-free. If we can find words $w_1,\ldots,w_n$ such that $L\cap w_1^*\ldots w_n^*$ is not context-free, then $L$ is not context-free either, since $w_1^*\ldots w_n^*$ is a regular language, and the intersection of a context-free language with a regular language is context-free.

$\endgroup$
5
$\begingroup$

Myhill-Nerode theorem is very closely related to the following theorem: language $L$ is regular if and only if its syntactic monoid is finite. This also gives you precise characterization of regular languages. Moreover, majority of definitions regarding regular languages can be nicely translated to group theory.

To answer you question, the mentioned theorem can be generalized a bit, i.e. a group is context-free if and only if it contains a finitely generated free subgroup of finite index (compare with this article). Unfortunately this doesn't make anything easier, still it is a characterization of context-free languages that are word problem for some group (to be clear those are not the all context-free languages).

Hope this helps ;-)

Edit: fixed broken link.

Edit 2: made clear that those are not the only context-free languages. Loony me, it must have been a bad digestion (and the phase of the moon too, naturally), that made me write the previous version, sorry for that!

$\endgroup$
8
  • 2
    $\begingroup$ Interesting, but a context-free group is not the same as a context-free language. In fact, there's quite a bit of group-theoretic mathematics needed to link the two concepts, as explained in the paper you referenced. Furthermore, the syntactic monoid per se is not suited to the study of context-free languages, nor to many if any infinite-state automata. For example, a language and its complement have the same syntactic monoid, but the complement of a CFL is not necessarily a CFL. In fact, you can get any RE language from CFLs if you allow complement. $\endgroup$ Commented Mar 10, 2012 at 20:02
  • $\begingroup$ @DavidLewis "context-free group is not the same as a context-free language" For that precise reason I linked the article. "Furthermore, the syntactic monoid per se is not suited to the study of context-free languages" I didn't say that plain syntactic monoids are suited to study CFLs, I just pointed out some similarities. "nor to many if any infinite-state automata" that is not quite true, compare article I recently saw. $\endgroup$
    – dtldarek
    Commented Mar 10, 2012 at 22:38
  • $\begingroup$ Interesting paper. I just read the first few pages so far, but it looks like it's about languages over infinite alphabets rather than those defined by infinite-state automata models (like CFLs/PDAs, as the original question asked). And the crucial reduction by orbits and renaming renders the resulting infinite monoids finitary to get an analog of the finite-state theory. To deal with true infinite-state languages and automata, we'll need algebraic correlates that respect rather than finesse or collapse the non-finiteness. Some have tried topological methods, for example. $\endgroup$ Commented Mar 11, 2012 at 2:42
  • $\begingroup$ @DavidLewis I have no idea what "true infinite-state" means, but I agree :-P $\endgroup$
    – dtldarek
    Commented Mar 12, 2012 at 19:38
  • 2
    $\begingroup$ @dtldarek: I think you should really change the bit of your answer where you say "still it is a precise characterization of context-free languages". It is only a characterization of CFLs that are word problems of groups, which are very far from being all CFLs. $\endgroup$
    – Tara B
    Commented Mar 13, 2012 at 12:21
3
$\begingroup$

If you interpret the Myhill-Nerode theorem as being about the algebraic structure of the corresponding family of automata and the monoid associated with that family, then there is an interesting necessary and sufficient generalization to context-free languages. One version of this is here -- http://arxiv.org/abs/math.GR/0601061. It is, in effect, that a language $L$ is context free iff it is accepted by a polycyclic monoid automaton iff it is accepted by a free group automaton.

For both these kinds of automata, the monoid or group acts as secondary storage attached to a non-deterministic finite-state control mechanism. Transitions in secondary storage are accomplished by right multiplication in the monoid or group, and the automaton starts with the identity 1 in the store and accepts only if it ends in an accepting state with 1 again in the store. The finite-state control cannot directly "read" the monoid/group store, but it essentially blocks based on what happens in the store, and that, combined with the non-determinism in the finite-state control, confers a lot of computational power on the combination.

The complete state set for the resulting automaton is thus $Q \times M$ where $Q$ is the state set of the finite state control and $M$ is the monoid (or group). For regular languages, the state set is just $Q$, which is finite, or $Q \times M$ where $M$ is just $\{1\}$ or finite. For CFLs $M$ has the structure stated above. So the generalization is about the relationship between the family of languages and the structure of $M$.

To finish the story, a polycyclic monoid is strings over a finite set of symbols $X^' = \{L_x, R_x: x \in X \}$ for some finite set of symbols $X$. The size of $X$ is the rank of the polycyclic monoid. Multiplication in this monoid is given by: $L_x R_x = 1$, $x \neq y \Rightarrow L_x R_y = 0 $, anything else just freely concatenates the right-hand symbol. Think of $L_x$ a pushing x onto a stack (right hand end of a string), $R_x$ as popping x. Trying to pop a symbol that isn't on top blocks the computation. So a polycyclic monoid is mimicking a pushdown store with k symbols, and it's not hard to see that automata using it as a store define the context-free languages.

That makes it seem that we've just trivially algebraicized the intuitive notion of a pushdown store, and indeed we have. But a polycyclic monoid is definable in a number of other ways, for example as the free group on k symbols, with $R_x$ acting as $L_x^{-1}$, and then things get more interesting -- Context-free-ness and group-ness are deeply related, as dtldarek points out, and the finiteness of the whole automaton for regular languages finds an analogy in the finite index of the free group.

$\endgroup$
3
  • $\begingroup$ @DavidLevis This article from Mark Kambites is nice, I haven't seen it before. Still, I find this result less intriguing than the previous one--as you have written, it is an algebraization of "pushdown store". $\endgroup$
    – dtldarek
    Commented Mar 10, 2012 at 23:03
  • $\begingroup$ @dtldarek -- They have different "flavors" -- group-theoretic vs automata/language, and despite the simplicity of this particular construction, I think there is deep stuff lurking. The issue of the syntactic monoid not working well beyond finite state, however, is a real one that lots of folks have been struggling with for years. Too bad Eilenberg never got his ideas out there. Did you see my query on that (bit.ly/yCbwEO})? $\endgroup$ Commented Mar 11, 2012 at 2:14
  • $\begingroup$ @DavidLevis Yeah, I have checked it out. I happily upvoted it, but as for now that's the only thing I can do, besides, that's probably a lost case (not the reason why he didn't do that though). $\endgroup$
    – dtldarek
    Commented Mar 12, 2012 at 19:44
3
$\begingroup$

I posted this as an answer to this question on CS, but I think it is also relevant for this one.

There is a very nice characterization of context-free languages (credited to Wechler) in the paper by Berstel and Boasson, Towards an algebraic theory of context-free languages. Let me introduce a few definitions in order to state this result (Theorem 3.1 in the paper).

The polynomial closure $\operatorname{Pol}(\mathcal{L})$ of a class $\mathcal{L}$ of languages of $A^*$ is the set of all languages that are finite unions of products of the form $L_0a_1L_1 \dotsm a_nL_n$, where $L_0, ..., L_n \in \mathcal{L}$ and $a_1, ..., a_n \in A$.

An algebra is a class $\mathcal{A}$ of languages containing the language $\{1\}$ and such that $\mathcal{A} = \operatorname{Pol}(\mathcal{A})$. It is finitely generated if $\mathcal{A} = \operatorname{Pol}(\mathcal{F})$ for some finite set $\mathcal{F}$ of languages. It is stable if, for each $L \in \mathcal{A}$ and $u \in A^*$, the language $u^{-1}L = \{v \mid uv \in L\}$ also belongs to $\mathcal{A}$. Note that it suffices to have $a^{-1}L \in \mathcal{A}$ for all $a \in A$.

Theorem. A language of $A^*$ is context-free if and only if it belongs to a finitely generated stable algebra.

See the article for illuminating examples and many nice consequences.

$\endgroup$

You must log in to answer this question.

Not the answer you're looking for? Browse other questions tagged .