While filtering through this JSON file I ran a benchmark and found that using jq's internal `sort` and `unique` methods is actually 25% slower than `sort --unique`!
| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| `jq "[.[].category] \| sort \| unique" channels.json` | 172.0 ± 2.6 | 167.8 | 176.8 | 1.25 ± 0.06 |
| `jq "[.[].category \| select((. != null) and (. != \"XXX\"))] \| sort \| unique" channels.json` | 151.9 ± 4.1 | 146.5 | 163.9 | 1.11 ± 0.06 |
| `jq ".[].category" channels.json \| sort -u` | 137.2 ± 6.6 | 131.8 | 156.6 | 1.00 |
Summary
'jq ".[].category" channels.json | sort -u' ran
1.11 ± 0.06 times faster than 'jq "[.[].category | select((. != null) and (. != \"XXX\"))] | sort | unique" channels.json'
1.25 ± 0.06 times faster than 'jq "[.[].category] | sort | unique" channels.json'
test command:
hyperfine --warmup 3 \
'jq "[.[].category] | sort | unique" channels.json' \
'jq "[.[].category | select((. != null) and (. != \"XXX\"))] | sort | unique" channels.json' \
'jq ".[].category" channels.json | sort -u'
If we only test sort (without uniqueness), again jq is 9% slower than sort:
| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| `jq "[.[].category] \| sort" channels.json` | 133.9 ± 1.6 | 131.1 | 138.2 | 1.09 ± 0.02 |
| `jq ".[].category" channels.json \| sort` | 123.0 ± 1.3 | 120.5 | 125.7 | 1.00 |
Summary
'jq ".[].category" channels.json | sort' ran
1.09 ± 0.02 times faster than 'jq "[.[].category] | sort" channels.json'
versions:
jq-1.5-1-a5b5cbe
sort (GNU coreutils) 8.28
I expected that using jq's internal functions would be faster than piping into an external program, which has to be spawned as a separate process. Am I using jq poorly?
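One detail that may matter for the first benchmark: as far as I can tell from the jq manual, `unique` already returns its result in sorted order, so `sort | unique` would sort the array twice. A minimal sketch on a made-up array (not the real channels.json), assuming jq is installed:

```shell
# Made-up stand-in for the category values; the real channels.json is not used here.
sample='["news","sports","news","music"]'

if command -v jq >/dev/null 2>&1; then
  # Explicit sort followed by unique, as in the benchmark above.
  sorted_then_unique=$(printf '%s' "$sample" | jq -c 'sort | unique')
  # unique on its own; per the jq manual its output is already sorted.
  unique_only=$(printf '%s' "$sample" | jq -c 'unique')
  printf '%s\n%s\n' "$sorted_then_unique" "$unique_only"
fi
```

If both lines print the same array, dropping the explicit `sort` from the first two benchmark commands should not change their output, and it would be worth checking whether it changes the timings.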
Update: I just repeated this experiment on a host with flash storage and an Arm CPU, with these versions:
jq-1.6
sort (GNU coreutils) 8.32
result:
Benchmark #1: jq "[.[].category] | sort" channels.json
Time (mean ± σ): 587.8 ms ± 3.9 ms [User: 539.5 ms, System: 44.2 ms]
Range (min … max): 582.8 ms … 594.2 ms 10 runs
Benchmark #2: jq ".[].category" channels.json | sort
Time (mean ± σ): 606.0 ms ± 8.6 ms [User: 569.5 ms, System: 49.0 ms]
Range (min … max): 589.6 ms … 616.2 ms 10 runs
Summary
'jq "[.[].category] | sort" channels.json' ran
1.03 ± 0.02 times faster than 'jq ".[].category" channels.json | sort'
Now jq sort runs 3% faster than GNU sort :D
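One caveat for this comparison: GNU sort collates according to the current locale, whereas jq, as far as I know, orders strings by Unicode codepoint, so the two commands are not necessarily doing the same work. A small sketch with made-up input showing the byte-order behaviour under `LC_ALL=C`, which is the setting I would assume for a like-for-like rerun:

```shell
# Made-up input mixing upper and lower case.
words=$(printf '%s\n' b A a B)

# In the C locale, sort compares raw bytes, so uppercase ASCII sorts before lowercase.
c_sorted=$(printf '%s\n' "$words" | LC_ALL=C sort)
printf '%s\n' "$c_sorted"

# Hypothetical rerun of the benchmark with the locale pinned (not executed here):
# hyperfine --warmup 3 \
#   'jq "[.[].category] | sort" channels.json' \
#   'jq ".[].category" channels.json | LC_ALL=C sort'
```

Whether pinning the locale changes the timings on this data is something I have not measured; it only removes one difference between the two commands.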
`jq` uses your C library's `qsort()` implementation. All your tests run in sub-second time, which indicates that memory administration overhead is likely affecting the results significantly, so we can't say much more than "okay". Test again on data that takes around 2 to 5 seconds to sort. Spawning a single process to run `sort` is quick (it's not as if you're running a shell loop, spawning `sort` in every iteration or something).

The `sort` utility with `--debug` on my OpenBSD system indicates that it's using Radix Sort.

`sort` dependency or tolerate a ~20% speed penalty

`sort` allows for specifying `--qsort`. Running `sort --qsort | uniq` (`-u` can't be used with `--qsort`), I get the same timings as with `sort | unique` in `jq`.
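To act on the suggestion of testing with data that takes a few seconds to sort, one option is to generate a larger synthetic input. The sketch below is only a stand-in under stated assumptions: the function name `gen_channels`, the `catNNN` values, and the file name `big_channels.json` are all made up, and the real channels.json surely has more fields per object.

```shell
# gen_channels N: print a JSON array of N objects, each with a pseudo-random
# "category" drawn from 1000 made-up values (hypothetical helper).
gen_channels() {
  awk -v n="$1" 'BEGIN {
    srand(1)                      # fixed seed so reruns produce the same data
    printf "["
    for (i = 1; i <= n; i++) {
      printf "%s{\"category\":\"cat%03d\"}", (i > 1 ? "," : ""), int(rand() * 1000)
    }
    print "]"
  }'
}

# Small demonstration; raise N until the sort takes around 2 to 5 seconds.
gen_channels 5

# Possible use (not executed here):
# gen_channels 5000000 > big_channels.json
# hyperfine --warmup 3 \
#   'jq "[.[].category] | sort" big_channels.json' \
#   'jq ".[].category" big_channels.json | sort'
```

The fixed seed keeps the input identical across runs, so differences in timing come from the commands rather than from the data.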