Word splitting only applies to unquoted expansions (parameter expansion, arithmetic expansion and command substitution) in modern Bourne-like shells (in `zsh`, only command substitution unless you use an emulation mode).
When you do:
args a :b
Word splitting is not involved at all.
It's the shell parsing that tokenises those, finds that the first one is not one of its keywords, and so it's a simple command with 3 arguments: `args`, `a` and `:b`. The amount of space won't make any difference there. Note that it's not only spaces but also tabs, and in some shells (like `yash` or `bash`) any character considered as blank in your locale (though in the case of `bash`, not the multibyte ones)¹.
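A quick runnable check of that point (the `count` helper is made up for illustration):

```shell
# The amount of whitespace between tokens doesn't change the
# argument count the parser produces.
count() { echo "$#"; }
count a   :b    # three spaces between the tokens, still 2 arguments
```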
Even in the Bourne shell, where word splitting also applied to unquoted arguments of commands regardless of whether they were the result of expansions or not, that would be done on top of (long after) the tokenising and syntax parsing.
In the Bourne shell, in:
IFS=i
while bid=did edit foo
That would not parse that as:
"wh" "le b" "b=d" "d ed" "t foo"
But first as a `while` with a simple command, and the `edit` word (as it's an argument, but not the `bid=did` word, which is an assignment) of that simple command would be further split into `ed` and `t`, so that the `ed` command with the 3 arguments `ed`, `t` and `foo` would be run as the condition of that `while` loop.
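Modern POSIX shells no longer split literal words like that, but they still split the result of expansions the same way; a small runnable sketch:

```shell
# With IFS set to 'i', the literal word 'printf' is not split, but
# the expansion of $var is: 'edit foo' splits on 'i' into the fields
# 'ed' and 't foo' (space is not in IFS here, so 't foo' stays whole).
IFS=i
var='edit foo'
printf '<%s>\n' $var
# <ed>
# <t foo>
```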
Word splitting is not part of the syntax parsing. It's like an operator that is applied implicitly to arguments (also to `for` loop words, arrays, and in some shells the targets of redirections and a few other contexts) for the parts of them that are not quoted. What's confusing is that it's done implicitly. You don't write `cmd split($x)`, you write `cmd $x` and the `split()` (actually `glob(split())`) is implied. In `zsh`, you have to request it explicitly for parameter expansions (`split($x)` is `$=x` there, `$=` looking like a pair of scissors).
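That `glob(split())` order can be observed in any POSIX shell; a hedged sketch (the file names are made up for the demo, and it assumes default globbing behaviour, i.e. no `nullglob`):

```shell
# The unquoted $v is first split on whitespace into the fields 'a*'
# and 'b*'; then each field is globbed: 'a*' matches a1 and a2,
# while 'b*' matches nothing and is left as-is.
tmp=$(mktemp -d)
cd "$tmp" || exit 1
touch a1 a2
v='a* b*'
printf '<%s>\n' $v
# <a1>
# <a2>
# <b*>
```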
So, now, for your examples:
args(){ echo ["$*"];}
args a :b # three spaces
# [a::b]
`a` and `:b` arguments of `args` joined with the first character of `$IFS`, which gives `a::b` (note that it's a bad idea to use `[...]` here as it's a globbing operator).
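To make the joining character visible, a minimal sketch (the explicit `IFS=:` and the `join_args` name are illustrative, not from the question):

```shell
# With IFS starting with ':', "$*" joins the positional parameters
# with ':' (always the *first* character of $IFS).
join_args() { IFS=:; printf '%s\n' "$*"; }
join_args a b c   # prints a:b:c
```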
args(){ echo [$*];}
args a :b # three spaces
# [a b] # two spaces
`$*` (which contains `a::b`) is split into `a`, the empty string and `b`. So it's:
echo '[a' '' 'b]'
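The empty field is easier to see with `printf`; a small sketch assuming `$IFS` is set to `:` as in the examples above:

```shell
# Unquoted $* expands to a::b, which is then split on ':' into the
# three fields a, '' (empty) and b; printf prints one line per field.
args() { IFS=:; printf '<%s>\n' $*; }
args a :b
# <a>
# <>
# <b>
```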
args(){ echo ["$1"]["$2"]; }
args a :b # three spaces
# [a][:b]
No surprise, as there is no word splitting.
args(){ echo [$1][$2]; }
args a :b # three spaces
# [a][ b]
That's like:
echo '[a][' 'b]'
as `$2` (`:b`) would be split into the empty string and `b`.
One case where you will see variations between implementations is when `$IFS` is empty.
In:
set a b
IFS=
printf '<%s>\n' $*
In some shells (most nowadays), you see
<a>
<b>
And not `<ab>`, even though `"$*"` would expand to `ab`. Those shells still separate those `a` and `b` positional parameters, and that has now been made a POSIX requirement in the latest version of the standard.
If you did:
set a b
IFS=
var="$*" # note that the behaviour for var=$* is unspecified
printf '<%s>\n' $var
you'd see `<ab>`, as the information that `a` and `b` were 2 separate arguments was lost when assigned to `$var`.
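By contrast, quoted `"$@"` keeps the parameters separate regardless of `$IFS`; a minimal sketch:

```shell
# "$@" expands to one field per positional parameter, so the
# separation survives even with an empty IFS.
set -- a b
IFS=
printf '<%s>\n' "$@"
# <a>
# <b>
```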
¹ Of course, it's not only blanks that delimit words. Special tokens in the shell syntax do as well, the list of which depends on the context. In most contexts, `|`, `||`, `&`, `;`, newline, `<`, `>`, `>>`... delimit words. In `ksh93` for instance, you can write a blank-less command like:
while({([[(:)]])})&&((1||1))do(:);uname<&2|tee>(rev)file;done