
In many languages, the authors are hesitant to add more keywords, because doing so would break any existing code that happens to use those keywords as identifiers. However, they could do what C does for bool, true, and false: build them in without reserving them as keywords. That way, if someone has an existing variable with one of those names, the code will still compile.

Are there any disadvantages to this practice that avoids breaking legacy codebases?

  • Strictly speaking they don't have to. I've seen a code snippet something like if if = then then then = else else else = if floating around in the comments here. I'll let someone more familiar with those languages write a full answer.
    – Bbrk24
    Commented Jul 7, 2023 at 0:21
  • @Bbrk24 that would be the Fortran classic: langdev.stackexchange.com/questions/1978/…
    – detly
    Commented Jul 8, 2023 at 13:12
  • Actually, C only defines those identifiers in <stdbool.h>. The actual keywords were created using names previously reserved for future use (beginning with _ followed by an uppercase letter, as in _Bool).
    Commented Jul 8, 2023 at 15:22
  • @TobySpeight ...until C23. Now bool, true, and false are keywords.
    – CPlus
    Commented Jul 8, 2023 at 17:22
  • langdev.stackexchange.com/a/2262/1356
    – coredump
    Commented Jul 11, 2023 at 11:22

7 Answers


Why do keywords have to be reserved words?

They don't. The designers of a language must decide, very early in the process, whether keywords will also be reserved words.

Fortran, for instance, has gotten along quite well for two-thirds of a century without requiring that its keywords also be reserved words:

"Keywords are specifically NOT reserved words as per the Fortran specifications (F03 section 2.52, F90 section 2.52, and F77 section 2.2)."
(from "Keywords in Fortran")

It's only when a language's designers decide to reserve specific keywords (or don't even think about it) that potential problems arise.

(One way of getting the best of both worlds is a simple rule that defines all reserved words, past, present, and future: e.g., all identifiers that begin with "_" are reserved, and no other identifiers will ever be reserved.)
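A minimal sketch of such a rule, written in Python for a hypothetical language (the keyword set and the function names classify and check_user_identifier are made up for illustration, not taken from any real compiler):

```python
# Hypothetical language: every identifier that begins with "_" belongs to the
# language, so keywords added later can never collide with user-chosen names.
LANGUAGE_KEYWORDS = {"_async", "_await", "_match"}  # illustrative only


def classify(word: str) -> str:
    """Return how a lexer for this language would treat `word`."""
    if word.startswith("_"):
        # Either a keyword already, or reserved for some future version.
        return "KEYWORD" if word in LANGUAGE_KEYWORDS else "RESERVED"
    return "IDENT"  # never reserved, now or later


def check_user_identifier(word: str) -> None:
    """Reject declarations that intrude on the reserved "_" namespace."""
    if classify(word) != "IDENT":
        raise SyntaxError(f"{word!r} is reserved for the language")


for w in ["total", "match", "_match", "_later"]:
    print(w, classify(w))
```

Under this assumption, adding _match as a keyword in a later version cannot break a program that already uses match as a variable, because no program was ever allowed to declare _match in the first place.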

  • About your final note: APLs tend to do this. APL uses prefix and : for all reserved words, J uses suffix . and BQN uses prefix .
    – Adám
    Commented Jul 7, 2023 at 8:17
  • SQL also has a mix of reserved and non-reserved keywords.
    – Barmar
    Commented Jul 7, 2023 at 14:02
  • It's also possible for keywords to be marked with special symbols. This is commonly done with things like assembler directives or C preprocessor directives, but could also be done with things like operators. If a C-like language had a rule that the structure-access operator must not be preceded by whitespace, and operators like .ediv. or .rdiv. [Euclidean or rounding integer division] had to be preceded and followed by whitespace, the language could add new operator keywords without creating ambiguity.
    – supercat
    Commented Jul 9, 2023 at 16:23
  • @supercat "In the Revised Report, Algol 68 is written in lower-case letters, with keywords emphasized in bold. Text files [...] don't provide a mechanism for the representation of bold text, so keywords are typically written in upper-case [...]. The original Algol 68-R compiler [...] had a basic character set of only 64 characters. Lower-case characters were available only through a complicated system of shift prefixes, and weren't used in programming. In Algol 68-R, keywords were distinguished from identifiers by enclosing them in single quotes, which was called 'quote stropping'."
    Commented Jul 10, 2023 at 8:13
  • @supercat From accu.org/journals/overload/26/148/james_2586
    Commented Jul 10, 2023 at 8:14

Better compiler error handling

When producing parsing errors, a compiler (or interpreter, if you're into that) that never allows keywords to be used in other contexts can much more easily isolate the issue and continue looking for other errors to report. For example, this C/Rust/JS-hybrid pseudocode would make it very difficult for the compiler to be sure how to parse everything after the first line:

type function == Callable; // Oops! Supposed to be one '='

function function() { // function keyword is 'function'
    function function = function() || function() { function }; // recurse, or default to an anonymous function, and assign with C-style syntax to a local variable shadowing the top-level 'function' function

    function // implicit return
}

While something as ridiculous as this would never occur in practical code, it'd be much easier to recover from that small parsing issue at the start if, e.g., function was always a keyword. It's also harder to provide useful error messages when there are many different things that could have been meant by one string (e.g., both a keyword and a type, or both a keyword and an identifier, or all three as in the code above).
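To make the recovery argument concrete, here is a rough Python sketch of panic-mode error recovery for a hypothetical toy language with a single reserved keyword, let. The grammar, token kinds and helper functions are assumptions for illustration only. After a malformed statement, the parser skips tokens until it reaches the next let, which it can trust as a statement boundary precisely because let is never an identifier:

```python
import re

# Hypothetical toy language: every statement has the form `let NAME = NAME;`,
# and 'let' is reserved, so it can never be an ordinary identifier.
RESERVED = {"let"}
TOKEN_RE = re.compile(r"\s*(?:(\w+)|(==|[=;]))")


def tokenize(src):
    """Split source text into (kind, text) pairs: KEYWORD, IDENT or PUNCT."""
    tokens, pos, src = [], 0, src.rstrip()
    while pos < len(src):
        m = TOKEN_RE.match(src, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        word, punct = m.groups()
        if word is not None:
            tokens.append(("KEYWORD" if word in RESERVED else "IDENT", word))
        else:
            tokens.append(("PUNCT", punct))
        pos = m.end()
    return tokens


def is_let(window):
    """True if the five tokens form `let NAME = NAME ;`."""
    if len(window) != 5:
        return False
    (k0, t0), (k1, _), (_, t2), (k3, _), (_, t4) = window
    return (k0, t0) == ("KEYWORD", "let") and k1 == k3 == "IDENT" and t2 == "=" and t4 == ";"


def parse(tokens):
    """Parse statements, recovering from errors in panic mode."""
    bindings, errors, i = [], [], 0
    while i < len(tokens):
        window = tokens[i:i + 5]
        if is_let(window):
            bindings.append((window[1][1], window[3][1]))
            i += 5
            continue
        errors.append(f"malformed statement starting at token {i}")
        i += 1
        # Skip to the next reserved keyword. This is a reliable resync point
        # only because 'let' cannot also be a variable name.
        while i < len(tokens) and tokens[i][0] != "KEYWORD":
            i += 1
    return bindings, errors


bindings, errors = parse(tokenize("let a = b; let c == d; let e = f;"))
print(bindings)  # [('a', 'b'), ('e', 'f')]
print(errors)    # ['malformed statement starting at token 5']
```

If let were also allowed as an identifier (let let = let;), the skip loop could no longer tell a statement boundary from a variable reference, which is exactly the difficulty described above.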

  • A related issue: keywords that aren't always used as keywords can make syntax highlighting more difficult (not impossible, but more complicated to implement).
    – DLosc
    Commented Jul 7, 2023 at 17:44

Ambiguity

Especially in beginner languages, ambiguity can be problematic. Statements like:

false = true
if false == false

can be confusing for new users of a language, as it can be hard to tell whether the false on the left is a boolean or a variable name. Misinterpreting the statement can lead to bugs: if the false on the left of == is read as the variable (which now holds true) and the one on the right as the boolean, the if condition is false, but if both are read as the boolean, it is true.
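Python 3 is one concrete data point here: True and False became reserved keywords (in Python 2 they were ordinary built-in names), so the equivalent of the first statement is rejected outright, while a lowercase false is just a variable and reproduces the ambiguity. A small sketch using the built-in compile() and exec() (the helper name compiles is made up):

```python
def compiles(src: str) -> bool:
    """Return True if `src` is syntactically valid Python."""
    try:
        compile(src, "<example>", "exec")
        return True
    except SyntaxError:
        return False


print(compiles("False = True"))  # False: False is a reserved keyword in Python 3
print(compiles("false = True"))  # True: lowercase false is an ordinary name

# With an ordinary name, the ambiguity described above is real:
ns = {}
exec("false = True\nresult = false == False", ns)
print(ns["result"])  # False, because the variable false holds True
```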

Another case of ambiguity:

import pandas as pd
pd = pd.DataFrame(stuff)        # stuff: placeholder for some data
x = pd.read_csv(more_stuff)     # more_stuff: placeholder for a file path

In this case, it may be hard for a beginner to tell whether read_csv() is a method of the DataFrame now bound to pd or a function of the pandas module that pd used to name. (I tested this, by the way, and it crashed the program.)
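For what it's worth, the interpreter itself is never confused: after the rebinding, the name simply refers to the new object. This sketch reproduces the pattern with the standard math module standing in for pandas (the substitution only keeps the example dependency-free):

```python
import math

# The name 'math' now refers to the float 4.0; the module is no longer
# reachable under that name.
math = math.sqrt(16.0)

try:
    value = math.pi  # attribute lookup happens on the float, not the module
    failed = False
except AttributeError as err:
    failed = True
    print("AttributeError:", err)  # 'float' object has no attribute 'pi'
```

With pandas, the analogous failure is an AttributeError raised on the DataFrame; the confusion is on the reader's side, not the language's.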

  • False = True used to be possible before Python 3
    Commented Jul 7, 2023 at 10:28
  • While I see how the stuff with Pandas is related, I don't see how it's the same thing. And what do you mean it "crashed" the program? Did it hang the kernel? Did it raise an error? I tried it (replacing "more stuff" with "more_stuff"), and it said that DataFrames don't have read_csv, which is what I expected. There's no ambiguity: once you bind pd.DataFrame(stuff) to pd, pd is a DataFrame. It's no different in principle from a = [1,2]; a = a[1]; print(a[1]). You can bind a variable name to something contained in what it was previously bound to. There's no ambiguity as to what that does.
    Commented Jul 8, 2023 at 5:29

Ambiguity is the primary reason, as others have explained. I would also like to add that in a language without reserved keywords, it's very easy to shadow basic constructs such as if. This code works in Bash (but not in POSIX mode, where reserved words are not subject to alias expansion):

$ alias if=echo
$ if true
true
$ if true; then echo yes; fi
bash: syntax error near unexpected token `then'
$ unalias if
$ if true; then echo yes; fi
yes
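Python shows the same contrast between a reserved word and a merely predefined name: if is a keyword and can never be rebound, while print (a statement keyword in Python 2, but only a built-in function in Python 3) can be shadowed much like the alias above. A short sketch using the standard keyword and builtins modules:

```python
import builtins
import keyword

print(keyword.iskeyword("if"))     # True: reserved, cannot be rebound
print(keyword.iskeyword("print"))  # False: print is only a built-in name
print(hasattr(builtins, "print"))  # True

# Shadowing print inside a separate namespace, much like `alias if=echo`:
ns = {}
exec("print = len\nresult = print('abc')", ns)
print(ns["result"])  # 3: in that namespace, 'print' now refers to len
```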

Instead of fully reserved keywords, one way to introduce new keywords without breaking old code is contextual keywords.

Basically, you introduce keywords that are recognized as keywords only in places where a normal identifier could not appear, and remain usable as identifiers everywhere else. For example, C# did that with its newer keywords, and so did Kotlin. JavaScript did the opposite and allowed all keywords in certain places, such as dot notation and object literals (e.g., both x = { if: 1 } and x.if became legal).
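A minimal sketch of the idea, in Python, for a hypothetical language that added async as a contextual keyword: the word counts as a keyword only directly before def, a position where a plain identifier could not appear anyway, and stays an ordinary identifier everywhere else. The word lists and the rule are illustrative assumptions, not any real language's grammar:

```python
# Hypothetical toy language. 'def', 'if' and 'return' are fully reserved;
# 'async' was added later as a contextual keyword.
RESERVED = {"def", "if", "return"}
CONTEXTUAL = {"async": "def"}  # contextual keyword -> word that must follow it


def classify(words):
    """Label each word as KEYWORD or IDENT."""
    labels = []
    for i, word in enumerate(words):
        following = words[i + 1] if i + 1 < len(words) else None
        if word in RESERVED:
            labels.append("KEYWORD")
        elif word in CONTEXTUAL and CONTEXTUAL[word] == following:
            labels.append("KEYWORD")  # e.g. 'async' in 'async def ...'
        else:
            labels.append("IDENT")    # old code using 'async' as a name still works
    return labels


print(classify("async def fetch".split()))  # ['KEYWORD', 'KEYWORD', 'IDENT']
print(classify("return async".split()))     # ['KEYWORD', 'IDENT']
```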

You can also provide a "quoting mechanism" to allow reserved words (or, depending on your use case, even illegal characters, including punctuation, spaces, and even newlines) to be used as identifiers. C#, VB.NET, F#, Rust, and Kotlin all do this, SQL has "delimited identifiers", and there are others. This general trick is called stropping.
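As an illustration of the quoting idea, here is a Python sketch of a lexer for a hypothetical language that borrows Kotlin-style backticks: a bare if is a keyword, while a backtick-quoted if is an ordinary identifier. The keyword set and regular expression are assumptions for illustration:

```python
import re

KEYWORDS = {"if", "else", "while"}
# Either a backtick-quoted name (spaces allowed inside) or a plain word.
TOKEN_RE = re.compile(r"`([^`\n]+)`|(\w+)")


def lex(src):
    """Return (kind, text) pairs; other characters are silently skipped."""
    tokens = []
    for m in TOKEN_RE.finditer(src):
        quoted, word = m.groups()
        if quoted is not None:
            tokens.append(("IDENT", quoted))  # stropped: keyword meaning suppressed
        elif word in KEYWORDS:
            tokens.append(("KEYWORD", word))
        else:
            tokens.append(("IDENT", word))
    return tokens


print(lex("if `if` else x"))
# [('KEYWORD', 'if'), ('IDENT', 'if'), ('KEYWORD', 'else'), ('IDENT', 'x')]
```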

  • Kotlin has quoting too
    – Seggan
    Commented Jul 7, 2023 at 18:53
  • @Seggan, thank you. I've just added Kotlin. I also found that it supports "soft keywords" (what C# calls contextual keywords).
    – Noureddin
    Commented Jul 8, 2023 at 16:00

The existing answers are informative, but I feel that they might lack an overview.

Why do keywords have to be reserved words?

The right answer is that "it depends".

Summarizing the other answers, it might be because:

  1. of the way the language was designed...

    a. ...making keywords non-reserved a big syntactic problem

    // New syntax in PHP to create a closure
    $x = fn() => 'hi';
    
    // Hypothetically, declare a function called 'fn' (real PHP 7.4+ rejects this, since fn is reserved)..
    function fn(){
        return 'x';
    }
    
    // ..which causes a syntax ambiguity:
    $y = [ fn() => 'bye' ];
    // Will $y be an array with one value (a closure like $x),
    // or will it be ['x' => 'bye'] ?
    

    b. ...or not a problem at all (as per the Fortran example mentioned earlier)

  2. it might also be an intentional design decision, to avoid ambiguities:

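    // Hypothetical: real JavaScript rejects `let true` with a SyntaxError.
    let definitelyTrue = true;
    let true = false;
    
    if (definitelyTrue === true) {
       // would never run: definitelyTrue holds the literal true, but `true` now means false
    }
    
    // imagine the havoc of doing the same with a global variable
    // (in real JavaScript this only creates a property named "true"; the literal is unaffected):
    window.true = false;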
    
  3. it may also drastically simplify and speed up parsing or execution. Consider the various function-like constructs in PHP: if PHP allowed them to be redeclared as functions, it would need to look them up in the function table on each use. Instead, it knows no such function can exist, so dispatching them is extremely fast.

    exit(1);              // reserved language construct; very fast
    array_merge($a, $b);  // standard PHP function; quite slower (even if built-in)
    

    (this is also why it's recommended to use operators like $x[] = $y instead of functions like array_push())

  4. in some cases, it could just be a decision of the language vendor, possibly without any specific reasons or even just to follow the crowd.

  5. it could also relate to standards compliance, e.g. if the language wants to comply with some specific language standard or dialect.
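Point 3 can be illustrated with a toy compiler, written in Python rather than PHP and not modelled on PHP's actual engine: reserved constructs become dedicated instructions at compile time, while ordinary calls stay generic and consult the function table every time they run. All names here (RESERVED_CONSTRUCTS, compile_call, define_function, run) are hypothetical:

```python
RESERVED_CONSTRUCTS = {"echo", "exit"}  # hypothetical, like PHP's language constructs


def compile_call(name, args):
    """Turn a call into an instruction tuple."""
    if name in RESERVED_CONSTRUCTS:
        # The meaning is fixed, so it can be decided once, at compile time.
        return ("OP_" + name.upper(), args)
    # Any other name could be (re)defined by the program: defer to run time.
    return ("CALL", name, args)


def define_function(table, name, fn):
    """Register a user function; reserved constructs cannot be redeclared."""
    if name in RESERVED_CONSTRUCTS:
        raise SyntaxError(f"cannot redeclare {name}()")
    table[name] = fn


def run(instruction, table, output):
    """Execute one instruction; returns an exit status for OP_EXIT, else None."""
    op = instruction[0]
    if op == "OP_ECHO":
        output.extend(instruction[1])
    elif op == "OP_EXIT":
        return instruction[1][0]
    elif op == "CALL":
        _, name, args = instruction
        output.append(table[name](*args))  # symbol-table lookup on every call
    return None


functions = {}
define_function(functions, "array_merge", lambda a, b: a + b)
out = []
run(compile_call("echo", ["hi"]), functions, out)
run(compile_call("array_merge", [[1], [2]]), functions, out)
print(out)  # ['hi', [1, 2]]
```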


There's also the option, of course, of treating newly added keywords as non-reserved words while any pre-existing keywords remain reserved, as Python did with async and await in versions 3.5 and 3.6. Although they became fully reserved in Python 3.7 after a deprecation period, they could just as easily have stayed non-reserved forever.

  • It's still the case with match in Python, and AFAIK they don't plan on reserving it ever, since match is also used as a name for functions/methods/variables in a lot of libraries/code
    Commented Jul 11, 2023 at 15:51

When writing a compiler front-end, reserving some words as keywords simplifies writing the BNF (or whatever notation you use) for the language.

The point of parsing is to turn a stream of tokens into an abstract syntax tree that actually represents what the coder intended the program to do. That's simpler if you know in advance which tokens are variables and which are language keywords.

Then again, you can do a first pass over the source code to figure out which identifiers are which, then do a second pass to build the actual AST.

This sounds like a costly operation, but some languages don't allow declarations outside of classes. Each class can then be buffered and both passes run on it rather than on the entire codebase, so doing two passes isn't really so awful.

So it's a tradeoff. You put less effort into your front-end by reserving keywords, but you can get away with not doing so with a little additional work.

If you take the additional effort, you can add keywords without breaking older code. Whether that's worth the trouble is a judgment call.
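A rough sketch of the two-pass approach, in Python, assuming a hypothetical line-based language where the only reserved word is var, declarations look like var NAME = ..., and if and print are non-reserved keywords. Under the rule chosen here (an assumption, not the only possible one), a name declared anywhere in the file is treated as a variable everywhere, so old code that declared a variable called if keeps working:

```python
RESERVED = {"var"}               # the one word the toy language does reserve
SOFT_KEYWORDS = {"if", "print"}  # keywords only when not declared as variables


def collect_declarations(lines):
    """Pass 1: find every name declared as `var NAME = ...`."""
    declared = set()
    for line in lines:
        words = line.split()
        if len(words) >= 2 and words[0] == "var":
            declared.add(words[1])
    return declared


def classify(lines, declared):
    """Pass 2: label each word using what pass 1 found."""
    result = []
    for line in lines:
        labels = []
        for word in line.split():
            if word in RESERVED:
                labels.append("KEYWORD")
            elif word in declared:
                labels.append("IDENT")  # a user variable shadows the keyword
            elif word in SOFT_KEYWORDS:
                labels.append("KEYWORD")
            else:
                labels.append("OTHER")
        result.append(labels)
    return result


program = ["var if = 1", "print if"]
print(classify(program, collect_declarations(program)))
# [['KEYWORD', 'IDENT', 'OTHER', 'OTHER'], ['KEYWORD', 'IDENT']]
```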

  • Note that only certain keywords would need to be treated as reserved to achieve this goal. Many languages include keywords that could be treated grammatically as ordinary identifiers. In Pascal, if one were to write Nil := myPointer;, such an assignment could be forbidden at the parsing level (since Nil isn't an identifier), but treating Nil as an identifier up until the point where one actually tries to interpret it as a variable reference would also work just fine.
    – supercat
    Commented Jul 9, 2023 at 16:18
