I just came across a bash script. What does [[:space:]] mean in a bash script? Why the double colon?

It is, indeed, in the bash manual, but it helps to know what you're looking for, which isn't helpful if you don't know what you're looking at. If you searched for [[ you'd get distracted by the [[ expression ]] conditional expression section. Additionally, searching for :space: lands you in two examples under the same section. You might follow the breadcrumb in that example:

For example, the following will match a line (stored in the shell variable line) if there is a sequence of characters in the value consisting of any number, including zero, of space characters, zero or one instances of ‘a’, then a ‘b’:

[[ $line =~ [[:space:]]*(a)?b ]]

... from which you could piece together that the [[:space:]] portion corresponded to "space characters", but you could be forgiven for thinking that it was only a literal space character and not a whole class of characters, which is what it represents.
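You can confirm that `[[:space:]]` stands for a whole class and not only a literal space by giving Bash's `=~` operator strings that contain a space, a tab and a newline. This is a minimal sketch. `has_ws` is a hypothetical helper name, and the script assumes Bash for `[[ ... =~ ... ]]` and `$'...'` quoting.

```shell
#!/usr/bin/env bash
# has_ws (hypothetical helper): print "yes" if $1 contains at least one
# character from the [:space:] class, "no" otherwise.
has_ws() {
    if [[ $1 =~ [[:space:]] ]]; then
        echo yes
    else
        echo no
    fi
}

has_ws 'a b'     # literal space
has_ws $'a\tb'   # tab
has_ws $'a\nb'   # newline
has_ws 'ab'      # no whitespace at all
```

The first three calls print `yes` and the last one prints `no`, so tab and newline are matched as well as the space.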

If you (happen to?) search for the string " space" (that is, a space followed by the word "space") in the online bash manual, there are "only" about 32 matches to go through. About the tenth one will be here:

Within ‘[’ and ‘]’, character classes can be specified using the syntax [:class:], where class is one of the following classes defined in the POSIX standard:

alnum   alpha   ascii   blank   cntrl   digit   graph   lower
print   punct   space   upper   word    xdigit

A character class matches any character belonging to that class.

Which would then take you to the POSIX standard where you might search for the term "character class" and find

wctype, wctype_l - define character class, which gets you as far as:

The wctype() and wctype_l() functions shall determine values of wctype_t according to the rules of the coded character set defined by character type information in the current locale or in the locale represented by locale, respectively (category LC_CTYPE).

If you then followed the setlocale link, you'd finally get to your real answer, in the Locale section:

space

Define characters to be classified as white-space characters. In the POSIX locale, exactly <space>, <form-feed>, <newline>, <carriage-return>, <tab>, and <vertical-tab> shall be included.

In a locale definition file, no character specified for the keywords upper, lower, alpha, digit, graph, or xdigit shall be specified. The <space>, <form-feed>, <newline>, <carriage-return>, <tab>, and <vertical-tab> of the portable character set, and any characters included in the class blank are automatically included in this class.
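As a quick check of that definition, the sketch below forces the POSIX ("C") locale and counts how many of the six listed characters fall in `[:space:]`. `count_space` and `posix_ws` are illustrative names, not anything from the standard. The script assumes Bash for `$'...'` quoting and `${s:i:1}` substrings.

```shell
#!/usr/bin/env bash
# Use the POSIX ("C") locale, the one the quoted definition talks about.
export LC_ALL=C

# The six characters POSIX requires in [:space:] for the POSIX locale:
# <space>, <form-feed>, <newline>, <carriage-return>, <tab>, <vertical-tab>.
posix_ws=$' \f\n\r\t\v'

# count_space (hypothetical helper): count the characters of $1 in [:space:].
count_space() {
    local s=$1 i n=0
    for (( i = 0; i < ${#s}; i++ )); do
        case ${s:i:1} in
            [[:space:]]) n=$((n + 1)) ;;
        esac
    done
    echo "$n"
}

count_space "$posix_ws"   # prints 6
count_space 'a b_c'       # prints 1: only the space counts
```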

  • Easier to find the manual match with LESS=+'/Within \[ and \],' man bash instead of 32 next commands :-).
    – user232326
    Commented Jul 16, 2019 at 21:30
  • @Isaac I think the point is to teach the man how to fish. That said, I did not know about less +"$cmd", so thanks for that.
    – JoL
    Commented Jul 17, 2019 at 1:09
  • Indeed, I answered given the OP's perspective; they could be forgiven for not picking up that the outer [] are independent of the inner []. I tried (!) to find a way from the question to the answer without knowing too much about what the answer was, although it took some lucky guessing :)
    – Jeff Schaller
    Commented Jul 17, 2019 at 1:56

It is not only for Bash; it is part of POSIX notation.

What is POSIX?

POSIX or "Portable Operating System Interface for uniX" is a collection of standards that define some of the functionality that a (UNIX) operating system should support. One of these standards defines two flavors of regular expressions: basic (BRE) and extended (ERE).
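`[[:space:]]` works the same way in both flavors. Only the surrounding syntax differs. The sketch below counts lines that contain a run of two or more whitespace characters, once with a basic regex (grep's default) and once with an extended regex (`grep -E`). The sample text and variable names are made up for illustration, and a grep with POSIX interval support (GNU or BSD) is assumed.

```shell
#!/usr/bin/env bash
# Three sample lines: one space, two spaces, two tabs.
sample=$'one two\nthree  four\nfive\t\tsix'

# BRE: the interval is written \{2,\}
bre_count=$(printf '%s\n' "$sample" | grep -c '[[:space:]]\{2,\}')
# ERE: the same interval is written {2,}
ere_count=$(printf '%s\n' "$sample" | grep -cE '[[:space:]]{2,}')

echo "BRE: $bre_count, ERE: $ere_count"   # BRE: 2, ERE: 2
```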

POSIX Bracket Expressions

POSIX bracket expressions are a special kind of character class. Like regular character classes, they match one character out of a set of characters.
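In a bracket expression the outer `[` and `]` form the set, and each `[:class:]` sits inside it. That is why a lone class is written `[[:space:]]`. Because the outer brackets are a set, several classes and ordinary characters can share one pair. The following small sketch uses `case` patterns, which also support bracket expressions. `char_kind` is a hypothetical helper, and branch order matters because `_` also belongs to `[:punct:]`.

```shell
#!/usr/bin/env bash
# char_kind (hypothetical helper): classify one character with shell patterns.
# Each outer [...] is a single bracket expression; each [:class:] inside it
# names a character class, and literal characters such as _ can be mixed in.
char_kind() {
    case $1 in
        [[:space:]])          echo whitespace ;;
        [[:alpha:]_])         echo 'letter or underscore' ;;  # before [:punct:], which also contains _
        [[:digit:][:punct:]]) echo 'digit or punctuation' ;;
        *)                    echo other ;;  # anything else, including multi-character strings
    esac
}

char_kind $'\t'   # whitespace
char_kind _       # letter or underscore
char_kind 7       # digit or punctuation
char_kind ab      # other: a pattern must match the whole string
```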

Standard POSIX

[[:alnum:]]   Alphanumeric characters
[[:alpha:]]   Alphabetic characters
[[:blank:]]   Space and tab
[[:cntrl:]]   Control characters
[[:digit:]]   Digits
[[:graph:]]   Visible characters (anything except spaces and control characters)
[[:lower:]]   Lowercase letters
[[:print:]]   Visible characters and spaces (anything except control characters)
[[:punct:]]   Punctuation (and symbols)
[[:space:]]   All whitespace characters, including line breaks
[[:upper:]]   Uppercase letters
[[:xdigit:]]  Hexadecimal digits
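To see how these classes overlap, the sketch below tests a single character against every class in the table above. `classes_of` is a hypothetical helper. The script assumes Bash and the C locale, where the results are well defined. In other locales, such as UTF-8 ones, non-ASCII characters may also belong to some classes.

```shell
#!/usr/bin/env bash
export LC_ALL=C

# classes_of (hypothetical helper): print the standard POSIX classes
# that the single character $1 belongs to.
classes_of() {
    local cls out=()
    for cls in alnum alpha blank cntrl digit graph lower print punct space upper xdigit; do
        # [[:$cls:]] expands to e.g. [[:digit:]] and is then used as a pattern.
        case $1 in
            [[:$cls:]]) out+=("$cls") ;;
        esac
    done
    echo "${out[*]}"
}

classes_of a       # alnum alpha graph lower print xdigit
classes_of ' '     # blank print space
classes_of $'\t'   # blank cntrl space
classes_of 7       # alnum digit graph print xdigit
```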

Non-standard

[[:ascii:]]   ASCII characters
[[:word:]]    Word characters (letters, numbers and underscores)

Legacy syntax (can someone find a reference for these?)

[[:<:]]       Start of Word 
[[:>:]]       End of Word

You can find more info here: wiki

In regular expressions and filename globs/shell patterns, the [...] construct matches any one of the characters listed within the brackets. Within those brackets, a number of named standard character classes can be used. One of those is [:space:], which matches whitespace characters (like \s in Perl regexes). See e.g. Pattern Matching in Bash's manual.

So [[:space:]] is part of a regular expression or pattern match, one that matches a single whitespace character.

E.g. a pattern match (standard shell, not Bash-specific):

case $var in 
    *[[:space:]]*) echo "'$var' contains whitespace";;
esac

or a regex (Bash):

if [[ $var =~ [[:space:]] ]]; then
    echo "'$var' contains whitespace"
fi

Note that even though bracket expressions [...] work the same in regular expressions and shell patterns, regexes and patterns as a whole are generally very much not the same. (case and [[ string == pattern ]] use pattern matching; [[ string =~ regex ]] uses regexes.)
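The difference shows up even with a single bracket expression. A pattern must match the whole string, but a regex only has to match somewhere inside it. The sketch below runs the same class through `==` and `=~` in Bash. `match_kinds` is a hypothetical helper, and the labels it prints are arbitrary.

```shell
#!/usr/bin/env bash
# match_kinds (hypothetical helper): report which of three tests match $1.
match_kinds() {
    local out=()
    # Pattern: the whole string must be exactly one whitespace character.
    [[ $1 == [[:space:]] ]]   && out+=(pattern-bare)
    # Pattern: * means "any string", so this finds whitespace anywhere.
    [[ $1 == *[[:space:]]* ]] && out+=(pattern-star)
    # Regex: unanchored, so one [[:space:]] anywhere is enough.
    [[ $1 =~ [[:space:]] ]]   && out+=(regex-bare)
    echo "${out[*]}"
}

match_kinds 'hello world'   # pattern-star regex-bare
match_kinds ' '             # pattern-bare pattern-star regex-bare
```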

Regular expressions also aren't shell-specific: they're used in e.g. awk and sed too, and are described in e.g. the Linux man page regex(7).
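The same class works unchanged in other POSIX tools. Below is a minimal sketch with `sed`, using a basic regex with the POSIX `\{1,\}` interval, and with `grep`. `squeeze_ws` and `count_ws_lines` are hypothetical names. `squeeze_ws` assumes single-line input, because `sed` processes its input line by line.

```shell
#!/usr/bin/env bash
# squeeze_ws (hypothetical helper): collapse each run of whitespace to one
# space, then trim a leading and a trailing space.
squeeze_ws() {
    printf '%s\n' "$1" |
        sed -e 's/[[:space:]]\{1,\}/ /g' -e 's/^ //' -e 's/ $//'
}

# count_ws_lines (hypothetical helper): count arguments (one per line)
# that contain at least one whitespace character.
count_ws_lines() {
    printf '%s\n' "$@" | grep -c '[[:space:]]'
}

squeeze_ws $'  a \t b\tc  '          # prints "a b c"
count_ws_lines 'a b' 'ab' $'c\td'    # prints 2
```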
