A coworker claimed recently in a code review that the [[ ]]
construct is to be preferred over [ ]
...
He couldn't provide a rationale. Is there one?
Yes, speed and efficiency.
I don't see the word "speed" mentioned anywhere here, but since [[ ]]
is a Bash built-in syntax, it doesn't requires spawning a new process. [
on the other hand, is the test
command, and running it spawns a new process. So, [[ ]]
should be faster than [ ]
syntax, because it avoids spawning a new test
process every time a [
is encountered.
What makes me think [
spawns a new process and is not a Bash built-in?
Well, when I run which [
it tells me it is located at /usr/bin/[
. Note that ]
is simply the last argument to the [
command.
Also, a second reason to prefer [[ ]]
is that it is more feature-rich I think. It supports a more "C-style" and modern syntax.
I have traditionally favored [ ]
over [[ ]]
because it's more portable, but recently I think I may switch to [[ ]]
over [ ]
because it's faster, and I write a lot of Bash.
Speed test results (lower is better)
Here are my results over 2 million iterations: [[ ]]
is 1.42x faster than [ ]
:
![enter image description here](https://cdn.statically.io/img/i.sstatic.net/venqL.png)
The code being tested was super simple:
if [ "$word1" = "$word2" ]; then
echo "true"
fi
vs.
if [[ "$word1" == "$word2" ]]; then
echo "true"
fi
where word1
and word2
were just the following constants, so echo "true"
never ran:
word1="true"
word2="false"
If you want to run the test yourself, the code is below. I've embedded a rather sophisticated Python program as a heredoc string into the Bash script to do the plotting.
If you change the comparison string constants to these:
word1="truetruetruetruetruetruetruetruetruetruetruetruetruetruetruetruetru"
word2="truetruetruetruetruetruetruetruetruetruetruetruetruetruetruetruetrufalse"
...then you get these results, where the Bash built-in ([[ ]]
) is only 1.22x faster than the test
command ([ ]
):
![enter image description here](https://cdn.statically.io/img/i.sstatic.net/15cO3.png)
In the first case, the strings differ after only 1 char, so the comparison can end immediately. In the latter case, the strings differ after 67 chars, so the comparison takes much longer to identify that the strings differ. I suspect that the longer the strings are, the less the speed differences will be since the majority of the time difference is initially the time it takes to spawn the new [
process, but as the strings match for longer, the process spawn time matters less. That's my suspicion anyway.
speed_tests__comparison_with_test_cmd_vs_double_square_bracket_bash_builtin.sh
from my eRCaGuy_hello_world repo:
#!/usr/bin/env bash
# This file is part of eRCaGuy_hello_world: https://github.com/ElectricRCAircraftGuy/eRCaGuy_hello_world
# ==============================================================================
# Python plotting program
# - is a Bash heredoc
# References:
# 1. My `plot_data()` function here:
# https://github.com/ElectricRCAircraftGuy/eRCaGuy_hello_world/blob/master/python/pandas_dataframe_iteration_vs_vectorization_vs_list_comprehension_speed_tests.py
# 1. See my answer here: https://stackoverflow.com/a/77270285/4561887
# ==============================================================================
python_plotting_program=$(cat <<'PROGRAM_END'
# 3rd-party imports
import matplotlib.pyplot as plt
import pandas as pd
# standard imports
import os
import sys
assert sys.argv[0] == "-c"
# print(f"sys.argv = {sys.argv}") # debugging
# Get the command-line arguments
FULL_PATH_TO_SCRIPT = sys.argv[1]
NUM_ITERATIONS = int(sys.argv[2])
single_bracket_sec = float(sys.argv[3])
double_bracket_sec = float(sys.argv[4])
# Obtain paths to help save the plot later.
# See my answer: https://stackoverflow.com/a/74800814/4561887
SCRIPT_DIRECTORY = str(os.path.dirname(FULL_PATH_TO_SCRIPT))
FILENAME = str(os.path.basename(FULL_PATH_TO_SCRIPT))
FILENAME_NO_EXTENSION = os.path.splitext(FILENAME)[0]
# place into lists
labels = ['`[ ]` `test` func', '`[[ ]]` Bash built-in']
data = [single_bracket_sec, double_bracket_sec]
# place into a Pandas dataframe for easy manipulation and plotting
df = pd.DataFrame({'test_type': labels, 'time_sec': data})
df = df.sort_values(by="time_sec", axis='rows', ascending=False)
df = df.reset_index(drop=True)
# plot the data
fig = plt.figure()
plt.bar(labels, data)
plt.title(f"Speed Test: `[ ]` vs `[[ ]]` over {NUM_ITERATIONS:,} iterations")
plt.xlabel('Test Type', labelpad=8) # use `labelpad` to lower the label
plt.ylabel('Time (sec)')
# Prepare to add text labels to each bar
df["text_x"] = df.index # use the indices as the x-positions
df["text_y"] = df["time_sec"] + 0.06*df["time_sec"].max()
df["time_multiplier"] = df["time_sec"] / df["time_sec"].min()
df["text_label"] = (df["time_sec"].map("{:.4f} sec\n".format) +
df["time_multiplier"].map("{:.2f}x".format))
# Use a list comprehension to actually call `plt.text()` to **automatically add
# a plot label** for each row in the dataframe
[
plt.text(
text_x,
text_y,
text_label,
horizontalalignment='center',
verticalalignment='center'
) for text_x, text_y, text_label
in zip(
df["text_x"],
df["text_y"],
df["text_label"]
)
]
# add 10% to the top of the y-axis to leave space for labels
ymin, ymax = plt.ylim()
plt.ylim(ymin, ymax*1.1)
plt.savefig(f"{SCRIPT_DIRECTORY}/{FILENAME_NO_EXTENSION}.svg")
plt.savefig(f"{SCRIPT_DIRECTORY}/{FILENAME_NO_EXTENSION}.png")
plt.show()
PROGRAM_END
)
# ==============================================================================
# Bash speed test program
# ==============================================================================
# See my answer: https://stackoverflow.com/a/60157372/4561887
FULL_PATH_TO_SCRIPT="$(realpath "${BASH_SOURCE[-1]}")"
NUM_ITERATIONS="2000000" # 2 million
# NUM_ITERATIONS="1000000" # 1 million
# NUM_ITERATIONS="10000" # 10k
word1="true"
word2="false"
# Get an absolute timestamp in floating point seconds.
# From:
# https://github.com/ElectricRCAircraftGuy/eRCaGuy_hello_world/blob/master/bash/timestamp_lib_WIP.sh
seconds_float() {
time_sec="$(date +"%s.%N")"
echo "$time_sec"
}
single_bracket() {
for i in $(seq 1 "$NUM_ITERATIONS"); do
if [ "$word1" = "$word2" ]; then
echo "true"
fi
done
}
double_bracket() {
for i in $(seq 1 "$NUM_ITERATIONS"); do
if [[ "$word1" == "$word2" ]]; then
echo "true"
fi
done
}
run_and_time_function() {
# the 1st arg is the function to run
func_to_time="$1"
# NB: the "information" type prints will go to stderr so they don't
# interfere with the actual timing results printed to stdout.
echo -e "== $func_to_time time test start... ==" >&2 # to stderr
time_start="$(seconds_float)"
$func_to_time
time_end="$(seconds_float)"
elapsed_time="$(bc <<< "scale=20; $time_end - $time_start")"
echo "== $func_to_time time test end. ==" >&2 # to stderr
echo "$elapsed_time" # to stdout
}
main() {
echo "Running speed tests over $NUM_ITERATIONS iterations."
single_bracket_time_sec="$(run_and_time_function "single_bracket")"
double_bracket_time_sec="$(run_and_time_function "double_bracket")"
echo "single_bracket_time_sec = $single_bracket_time_sec"
echo "double_bracket_time_sec = $double_bracket_time_sec"
# echo "Plotting the results in Python..."
python3 -c "$python_plotting_program" \
"$FULL_PATH_TO_SCRIPT" \
"$NUM_ITERATIONS" \
"$single_bracket_time_sec" \
"$double_bracket_time_sec"
}
# Determine if the script is being sourced or executed (run).
# See:
# 1. "eRCaGuy_hello_world/bash/if__name__==__main___check_if_sourced_or_executed_best.sh"
# 1. My answer: https://stackoverflow.com/a/70662116/4561887
if [ "${BASH_SOURCE[0]}" = "$0" ]; then
# This script is being run.
__name__="__main__"
else
# This script is being sourced.
__name__="__source__"
fi
# Only run `main` if this script is being **run**, NOT sourced (imported).
# - See my answer: https://stackoverflow.com/a/70662116/4561887
if [ "$__name__" = "__main__" ]; then
main "$@"
fi
Sample run and output:
eRCaGuy_hello_world$ bash/speed_tests__comparison_with_test_cmd_vs_double_square_bracket_bash_builtin.sh
Running speed tests over 2000000 iterations.
== single_bracket time test start... ==
== single_bracket time test end. ==
== double_bracket time test start... ==
== double_bracket time test end. ==
single_bracket_time_sec = 5.990248014
double_bracket_time_sec = 4.230342635
References
- My
plot_data()
function here, for how to make the bar plot with the sophisticated text above the bars: https://github.com/ElectricRCAircraftGuy/eRCaGuy_hello_world/blob/master/python/pandas_dataframe_iteration_vs_vectorization_vs_list_comprehension_speed_tests.py
- My answer with this code: How to iterate over rows in a DataFrame in Pandas
- My answer: How do I get the path and name of the python file that is currently executing?
- My answer: How to obtain the full file path, full directory, and base filename of any script being run OR sourced...
- My Bash library: get an absolute timestamp in floating point seconds: https://github.com/ElectricRCAircraftGuy/eRCaGuy_hello_world/blob/master/bash/timestamp_lib_WIP.sh
- How to make a heredoc: https://linuxize.com/post/bash-heredoc/
See also
- This answer which links to another super simple speed test
"$(id -nu)" = "$someuser"
in your code above. Unlike [[...]], $(...) substitution is part of POSIX, and only the very oldest shells don't understand it; it avoids some surprising nesting/quoting behavior that you run into with backticks.