What is the rationale behind $array not expanding the whole array in ksh and bash?

Question

bash$ a=(1 2 3)
bash$ echo $a
1

but

zsh% a=(1 2 3)
zsh% echo $a
1 2 3
zsh% printf '%s\n' $a
1
2
3

(the last part demonstrates that the array expands into separate arguments, equivalent to "${a[@]}" and not "${a[*]}")

The bash behavior (which matches ksh) is extremely counter-intuitive. How is "first element only" a reasonable reaction to "expand this array variable"?

In other areas where zsh is divergent, it's because ksh and bash are sticking closer to the original Bourne shell. But Bourne had no user-defined array variables.

Why did bash make this strange decision? If it was copying ksh, why did ksh make this strange decision?

Continuing after a long string of comments:

This should not be a question of criticizing or praising zsh. zsh is nothing but an easily accessible example of how things might have been done differently.

One of the possibilities to explain a design decision is backward compatibility. And backward compatibility isn't an opinion. It's an objective fact.

If you can show a script (a full script, not an out-of-context excerpt) that behaves in a known way in Bourne shell (i.e. does not just bomb with a syntax error), and behaves differently in the hypothetical "Korn shell with full $array expansion", then you win! It's a backward compatibility issue.

No such script has been given. This isn't one:

a=(1 2 3)
printf '%s\n' $a

because it's a syntax error in Bourne shell. Giving new meaning to something that used to be a syntax error is a way to create new features while keeping backward compatibility.

As far as I can tell, the fact that a=(...) was originally a syntax error creates a clean separation between scripts that (attempt to) use arrays and those that don't. In the first category, backward compatibility can't be invoked as a reason for anything, because those scripts wouldn't run in the old shell anyway. In the second category, backward compatibility holds regardless of your array variable expansion rules, because there are no arrays to expand!

This is not a proof since I'm relying partially on intuition to decide that there's no way to smuggle an array into a script without =( and therefore no script exists that would exhibit incompatible behavior. The nice thing about a claim of nonexistence is that you only have to show one counterexample to end it.

The a=$@ thing that was brought up in the comments does look like it could contribute to an explanation. If it creates an array variable a then this script:

a=$@
printf '%s\n' $a

should show the difference. In my tests, though, that doesn't happen. All shells (heirloom sh, modern ksh, bash, and zsh) seem to handle the first line the same way. a is not an array, it's just a string with spaces in it. (zsh diverges on the second line because it doesn't do word-splitting on the value of $a, but that has nothing to do with array variables)
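
A minimal sketch of that observation, assuming bash and the default $IFS (the variable names joined and copied are illustrative only):

```bash
# Assumes bash and the default IFS (space, tab, newline).
set -- "x y" z                 # two positional parameters

joined=$@                      # scalar assignment: parameters flattened into one string
copied=("$@")                  # array assignment: the left parenthesis makes the difference

joined_decl=$(declare -p joined)   # how bash itself describes the variable
copied_count=${#copied[@]}         # number of elements kept by the array copy

printf '%s\n' "$joined_decl" "$copied_count"
```

Here declare -p reports joined as a plain (non-array) variable, while copied keeps both original parameters as separate elements.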

Bash copied ksh. (Ksh88 already had arrays, bash only got them with bash 2.0 in 1996.) — Gilles 'SO- stop being evil', Commented Jul 30, 2017 at 23:19
If a variable a could contain an array in zsh, why in a=$@, $a is one string? — user232326, Commented Jul 31, 2017 at 4:19
That seems like an unrelated question... the answer is that whether an assignment creates an array depends on whether there is a left parenthesis immediately after the = sign. a=($@) copies the array as an array. — user41515, Commented Jul 31, 2017 at 4:27
So, an array: a=(1 2 3); printf '%s\n' $a could not be copied with 'b=$a', isn't that odd? In yash, the array gets copied with b=$a. — user232326, Commented Jul 31, 2017 at 5:21
It's not "first element only", it's ${array[0]} which may not exist, especially with associative arrays. — xhienne, Commented Jul 31, 2017 at 17:29

Stéphane Chazelas · Accepted Answer · 2021-08-04 07:11:07Z

I can't give a definitive answer, but I can suggest some possible explanations.

It's true that except for ksh and its clones (pdksh and further derivatives and bash), all other shells with arrays (csh, tcsh, rc, es, akanga, fish, zsh, yash) have $array expand to all the members of the array.

But in both yash and zsh (when in sh emulation), the two Bourne-like shells in that list, that expansion is still subject to split+glob (and still empty removal even in zsh even when not in sh emulation), so you still need to use the awkward "${array[@]}" syntax (or "${(@)array}" or "$array[@]" in zsh which are hardly easier to type¹) to preserve the list (csh and tcsh have similar issues). That split+glob and empty removal is the Bourne heritage (itself to some extent caused by the Thompson shell heritage where $1 was more like macro expansion).
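
As a rough illustration of why the quoted form is needed, here is a sketch in bash (not yash or zsh), assuming the default $IFS; count_args is a hypothetical helper:

```bash
# Assumes bash with the default IFS; count_args is a hypothetical helper.
count_args() { echo "$#"; }

a=("one two three" "" "x")            # three elements, one of them empty

unquoted=$(count_args ${a[@]})        # split on whitespace, empty element removed
quoted_at=$(count_args "${a[@]}")     # one argument per element, empties kept
quoted_star=$(count_args "${a[*]}")   # all elements joined into a single argument

echo "$unquoted $quoted_at $quoted_star"   # prints: 4 3 1
```

Only the "${a[@]}" form hands the command the same three elements that were stored.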

rc and fish are two examples of later shells that don't have the Bourne baggage and take a cleaner approach. They acknowledge that, a shell being a command line interpreter, the primary thing it deals with is lists (the list of arguments to commands), so list/array is the primary data type (there's only one type and it's lists in rc), and they got rid of the split+glob-upon-expansion bug/misfeature of the Bourne shell (which is no longer needed now that the primary type is array).

Still, that doesn't explain why David Korn chose not to have $array expand to all elements but to the element of index 0.

Now, apart from csh/tcsh, all those shells are much newer than ksh, developed in the early 80s only a few years after the Bourne shell and Unix V7 were released. Unix V7 was the one that also introduced the environ. That was the fancy new thing at the time. The environment is neat and useful but environment variables can't contain arrays unless you use some form of encoding.

That's only conjecture but I suspect one reason for David Korn to choose that approach was so that the interface with the environment was not modified.

In ksh88, like rc, all variables were arrays (sparse though; a bit like associative arrays with keys limited to positive integers which is another oddity compared to other shells or programming languages, and you could tell it hadn't been completely thought through as it was impossible for instance to retrieve the list of keys). In that new design, var=value became short for var[0]=value. You could still export all your variables, but export var exports the element of index 0 of the array to the environment.
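
The "var=value is short for var[0]=value" rule can be sketched as follows. This runs under bash, which is assumed here to follow ksh for these two cases; it does not demonstrate the export behaviour, and the variable names are illustrative:

```bash
# Assumes bash; the export behaviour described above is not shown here.
arr=(x y z)
arr=new                      # no subscript: same as arr[0]=new
after_assign="${arr[*]}"     # new y z

sparse=(x y z)
unset 'sparse[0]'            # element 0 no longer exists
plain="[$sparse]"            # $sparse means ${sparse[0]}, which is now unset
all="[${sparse[*]}]"         # the remaining elements are still there

printf '%s\n' "$after_assign" "$plain" "$all"
```

The second half shows that $sparse is really "element 0" rather than "first element": once index 0 is unset, the plain expansion is empty even though the array still holds two values.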

rc does put all its variables in the environment, fish supports exporting arrays, but to do that for arrays with more than one element, (at least for the port to Unix for rc which comes from plan9), they have to resort to some form of encoding which is only understood by them.

csh, tcsh, zsh don't support exporting arrays (though nowadays that may not sound like a big limitation). You can export arrays in yash, but they're exported as an environment variable that's the array elements joined with : (so (a "" "" b) and (a : b) are exported to the same value) and there's no converting back to array on importing.
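
The ambiguity of a colon-joined encoding can be reproduced with a small bash sketch. It only mimics the joining described above and does not run yash; join_colon is a hypothetical helper:

```bash
# join_colon is a hypothetical helper that mimics joining elements with ":".
join_colon() {
  local IFS=:
  printf '%s\n' "$*"         # "$*" joins on the first character of IFS
}

first=$(join_colon a "" "" b)    # four elements, two of them empty
second=$(join_colon a : b)       # three elements, one of them a colon

printf '%s\n' "$first" "$second"   # both lines read a:::b
```

Since two different lists produce the same string, the original array cannot be recovered from the environment value alone.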

Another possible justification might have been the consistency with Bourne's $@/$* (but then why have array indices start at 0 instead of 1 (another oddity compared to other shells/languages of the time)?). ksh was not free software; it was a commercial enterprise, and one of the requirements was Bourne compatibility. ksh did remove the field splitting done on every non-quoted word in list context (as that was clearly not useful in the Bourne shell) but had to keep it for expansions (as scripts did use things like var="file1 file2"; cmd $var as the Bourne shell had no array but "$@"). Keeping that in a shell that otherwise has arrays makes little sense, but Korn had little other option if ksh was to still be able to interpret scripts of the consumer base. If $scalar was subject to split+glob, $array would have to be for consistency, and so "${array[@]}" as a generalisation of "$@" made some sense. zsh had no similar constraint so was free to remove the split+glob upon expansions at the same time as adding arrays (but paid a price for breaking Bourne backward compatibility).

Another explanation as offered by @Arrow might have been that he didn't want to overload the existing operators to make them behave differently for different types of variables (for instance ${#var} vs ${#array} though the Bourne shell didn't have that one or ${var-value}, ${var#pattern}) which can cause confusion for users (in zsh it's not always obvious how some operators work with array vs scalar).
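
To make the "operators keep their scalar meaning" point concrete, here is a sketch of how bash (assumed here; the variable names are illustrative) treats these operators on an array variable:

```bash
# Assumes bash.
arr=(hello world foo)

len_plain=${#arr}            # length of ${arr[0]}, as if arr were a scalar
len_array=${#arr[@]}         # number of elements needs the explicit [@]
strip_plain=${arr#h}         # pattern removal on element 0 only
strip_each=("${arr[@]#?}")   # with [@], the operator is applied to every element

printf '%s\n' "$len_plain" "$len_array" "$strip_plain" "${strip_each[*]}"
```

With the plain name, every operator behaves exactly as it would on a scalar holding "hello"; the array-wide meaning is only reached through the new [@] syntax.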

Some related reading:

https://www.usenix.org/legacy/publications/library/proceedings/vhll/full_papers/korn.ksh.a where David Korn explains the development of ksh (arrays first added as patches over the Bourne shell for a form entry system).
https://news.slashdot.org/story/01/02/06/2030205/david-korn-tells-all

As to the a=$@ case in your edit, that's actually one case where ksh broke compatibility with the Bourne shell.

In the Bourne shell, $@ and $* contained the concatenation of the positional parameters with space characters. Only $@ when quoted was special as it expanded to the same as "$*" but with the inserted spaces not quoted (with special cases for the empty list in the newer versions where it has been addressed like on Solaris). You'll notice that if you remove space from $IFS, "$@" expands to just one argument in list contexts (0 for an empty list in the fixed versions mentioned above). When not quoted, $* and $@ behave like any other variable (split upon characters of $IFS, not necessarily on the original positional parameters). For instance, in the Bourne shell:

'set' 'a:b'   'c'
IFS=:
printf '<%s>\n' $@
printf '[%s]\n' "$@"

Would output:

<a>
<b c>
[a:b c]

Ksh88 changed that so that $@ and $* were joined with the first character of $IFS. "$@" in list context separates the positional parameters except when $IFS is empty.

When $IFS is empty, $@ and $* are joined on space, except for "$*", which is joined with no separator.

Examples:

$ set a b
$ IFS=:
$ a=$@ b=$* c="$@" d="$*"
$ printf '<%s>\n' "$a" "$b" "$c" "$d" $@ $* "$@" "$*"
<a:b>
<a:b>
<a:b>
<a:b>
<a>
<b>
<a>
<b>
<a>
<b>
<a:b>
$ IFS=
$ a=$@ b=$* c="$@" d="$*"
$ printf '<%s>\n' "$a" "$b" "$c" "$d" $@ $* "$@" "$*"
<a b>
<a b>
<a b>
<ab>
<a b>
<a b>
<a b>
<ab>

You'll see a lot of variations in the different Bourne/Korn-like shells including ksh93 vs ksh88. There are also some variations in cases like:

set --
cmd ''"$@"
cmd $empty"$@"

Or when $IFS contains multi-byte characters, or bytes not forming valid characters.
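
For the quoted forms, the behaviour is easier to pin down. The following sketch assumes bash (which, unlike the ksh88 behaviour described above, keeps "$@" separate even when $IFS is empty); show and demo are hypothetical helpers:

```bash
# Assumes bash; show is a hypothetical helper that brackets each argument.
show() { printf '<%s>' "$@"; echo; }

demo() {
  local IFS=:
  set -- a b
  star_colon=$(show "$*")    # joined on the first character of IFS
  at_colon=$(show "$@")      # one argument per positional parameter
  IFS=
  star_empty=$(show "$*")    # empty IFS: joined with no separator
  at_empty=$(show "$@")      # still one argument per positional parameter in bash
}
demo

printf '%s\n' "$star_colon" "$at_colon" "$star_empty" "$at_empty"
```

The unquoted $@ and $* cases, and assignments such as a=$@, are deliberately left out because those are where the shells disagree.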

¹ In yash however, "$array" behaves like "${array[@]}" while zsh's "$array" behaves like "${array[*]}". Less ugly / awkward though maybe more surprising.

Note that ksh, pdksh and further derivatives and bash mean more than 90% of scripts with arrays. — user232326, Commented Aug 1, 2017 at 5:30
@Arrow, and until the mid 2000s, 99% of software was written for Microsoft Windows, arguably the OS with the worst design ever written. That shouldn't stop us from looking for better designs. — Stéphane Chazelas, Commented Aug 1, 2017 at 7:20
@Arrow, yes backward compatibility is indeed the top thing holding back technical progress in IT more than anywhere else. The day Windows stops being backward compatible, people will take the opportunity to try something else. — Stéphane Chazelas, Commented Aug 1, 2017 at 7:24
Whether you like Windows or not is absolutely irrelevant here. Up to the mid 2000s (as now) 99% of the software for the fastest computers was written for Unix (well, read Unix and mostly Linux now). Microsoft simply was irrelevant. — user232326, Commented Aug 1, 2017 at 7:35

score 1 · Accepted Answer · 2017-07-31 05:29:11Z

From your comments to this answer, you are expecting that an answer would say and accept that the paradigm of zsh "is better" and therefore is the way in which all shells should be working. "Better" is just an opinion. An opinion requires no discussion.

Maybe what you expect is the detail of why $a could also be used for an array.

That is what zsh is trying to do, and even if it has done a lot of work in that direction, it still fails in some instances. It does not copy arrays (yet).
```
$ a=(1 2 3)
$ b=$a
$ printf '<%s>' $a ' ' $b; echo
<1><2><3>< ><1 2 3>
```
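
For comparison, a sketch of the same experiment in bash (an assumption; the transcript above is from zsh), where a plain assignment copies only element 0 and an array copy has to be spelled out:

```bash
# Assumes bash; variable names are illustrative.
src=(1 2 3)

first_only=$src              # scalar assignment: only ${src[0]} is copied
copy=("${src[@]}")           # explicit array copy, one element per original element

copy_count=${#copy[@]}

printf '%s\n' "$first_only" "$copy_count"   # prints 1, then 3
```

Note that this form of copy renumbers the elements from 0, so the indices of a sparse array are not preserved.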

But this question is really very far from that. This is a problem of language.

In our language we use a name for each idea. Each new idea that is radically different than previous ideas must have its own name. It is natural in our language that we accept new words for new ideas. Think about the name "Internet", a new name for a new idea. Using old names for new concepts always leads to confusion and misunderstandings. That is how we humans are built, and sounds reasonable when we come to think about it.

In shell (and any programming language) we use some specific syntax for each specific idea.

Since the start, a variable in a shell used a name (assume a) and its value was the result of the expansion (a shell procedure) of the symbols $a. A variable was able to contain a string or a number (automatically converted).

The introduction of a new content for a variable (an array of values) must use a new syntax.

That is exactly what Perl did by using @a to denote a list, while keeping $a for a scalar.
In Perl, these two are called SCALAR and LIST context.

That is why we have a=(1 2) to assign an array (there are also other ways but this is the most common). It is a syntax that is invalid in previous shells.

In sh (dash in this case):

$ a=(12)
sh: 2: Syntax error: "(" unexpected

And the expansion of the variable was extended from $a or the equivalent ${a} to the new syntax (invalid in sh) ${a[@]}:

$ a=(aa bb cc)
$ printf '%s\n' "${a[@]}"
aa
bb
cc

In simpler shells (ash in this case):

$ a=Strin-Of-Text
$ printf '%s\n' "${a[@]}"
ash: syntax error: bad substitution

It is unfortunate that such a convoluted way of writing an array was chosen.

If I were to propose a new syntax, I would probably follow Perl's lead and use something similar to "@a" for a list of values (or maybe #a or %a). That should be a matter of some discussion to reach some consensus.

But that is not what was done (sadly), what was chosen was: ${a[@]}.

In short: for backward compatibility, the expansion of a simple variable $a is expected to result in only one value, not a list of values, and certainly not several separate values.

As there are no arrays defined in POSIX, it may be ineffective to quote it, but the POSIX definition of Parameter Expansion says:

The value, if any, of parameter shall be substituted.

And, well, actually, most shells print only a scalar with $a:

bash(4.4)       : <1><====><1><2><3>
lksh            : <1><====><1><2><3>
mksh            : <1><====><1><2><3>
ksh93           : <1><====><1><2><3>
astsh           : <1><====><1><2><3>
zsh/ksh         : <1><====><1><2><3>
zsh             : <1><2><3><====><1><2><3>

So, for writing shell scripts, zsh^[a] is the odd one.

Tested with this script:

a=(1 2 3)
printf '<%s>' $a '====';
printf '<%s>' "${a[@]}" ;
echo

^[a] Also yash. csh compatible shells not tested. Scripts with csh are a known problem.

There are many design flaws in Bourne shell that are fixed in zsh (in its default mode) but maintained for backward compatibility in other shells. This is not an explanation for array expansion since arrays didn't exist in Bourne shell. As soon as you have an array assignment, you're outside the realm of Bourne compatibility. I don't find this a believable motivation. — user41515, Commented Jul 31, 2017 at 1:01
I did say: "in-effective quote". There are other elements in the answer. @WumpusQ.Wumbley — user232326, Commented Jul 31, 2017 at 1:04
A good answer here doesn't require any comparison to zsh. It requires a demonstration of a problem that would have occurred in compatibility between Bourne and ksh, if ksh had adopted the "expand whole array" approach from the beginning. The inclusion of zsh in my question only serves to demonstrate that the approach is feasible (and that at least one person other than me thought it was a good enough idea to implement!) — user41515, Commented Jul 31, 2017 at 1:12
I don't understand the backward compatibility claim. At the time arrays were created, any use of them at all would be incompatible with all previous shells. How then would any backward compatibility come into consideration? — user41515, Commented Jul 31, 2017 at 1:21
But only if you used an array in your script, which makes it incompatible with shells predating the invention of array variables anyway! — user41515, Commented Jul 31, 2017 at 1:22
