Gold's theorem provides pretty convincing mathematical evidence supporting the universal grammar hypothesis in linguistics. This hypothesis is two-fold: (1) children are not presented logically with enough information to actually learn their native language; (2) hence there exists a universal grammar which is encoded somehow in the human brain and which facilitates the logical gap between the positive data given to the child and the data necessary to determine the language's grammar. While the universal grammar hypothesis isn't universally accepted, it has been one of the most important ideas in linguistics so far.
Gold's theorem shows that certain classes of languages are logically not learnable. Of course, it operates in a purely formal setting. I'll provide up this setting now following the definitions and notations of Gabriel Carroll, pg. 41.
Start with a finite alphabet $\Sigma$ and let $\Sigma^*$ designate the set of finite sequences of elements of $\Sigma$. A language $L$ is a subset of $\Sigma^*$. A text of $L$ is an infinite string $w_1, w_2, \dots$ of elements of $L$ such that every element of $L$ occurs at least once. A learner for a class $\mathcal{L}$ of languages is a function $\Lambda : (\Sigma^*)^* \rightarrow \mathcal{L}$ that intuitively takes a sequence of strings of $\Sigma$ and guesses the language in $\mathcal{L}$ in which all these strings are grammatically correct. The learner $\Lambda$ learns the language $L \in \mathcal{L}$ if for every text $w_1,w_2,\dots$ of $L$ there exists a natural number $N$ such that $\Lambda(w_1,w_2,\dots,w_n) = L$ for $n \geq N$. The learner $\Lambda$ learns the class $\mathcal{L}$ if it learns each language in $\mathcal{L}$, and the class $\mathcal{L}$ is learnable if there exists a learner which learns it.
This is Gold's theorem, first proved by Gold in his seminal paper (but my wording is taken from Carroll):
- If the class $\mathcal{L}$ contains all finite languages and at least one infinite language, then $\mathcal{L}$ is not learnable.
In particular, any finite language is regular. Hence the class of regular languages is unlearnable, and it follows at once that every class of the Chomsky Hierarchy is unlearnable.
The proof of Gold's theorem is, as Carroll shows, not very hard, although certainly not intuitive, and it can be reduced to a corollary of the following characterization of learnable classes of languages (Carroll, Lemma 9):
- A countable class $\mathcal{L}$ of nonempty languages is learnable if and only if, for each $L \in \mathcal{L}$, there is a finite ''telltale'' subset $T \subseteq L$ such that $L$ is minimal in $\{L' \in \mathcal{L} : T \subseteq L'\}$.