Is jq internal sort slower than GNU sort?

Asked 3 years ago

Modified 3 years ago

Viewed 532 times

While filtering this JSON file, I ran a benchmark and found that jq's built-in sort and unique are about 25% slower than piping the output to sort --unique:

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:---|---:|---:|---:|---:|
| `jq "[.[].category] \| sort \| unique" channels.json` | 172.0 ± 2.6 | 167.8 | 176.8 | 1.25 ± 0.06 |
| `jq "[.[].category \| select((. != null) and (. != \"XXX\"))] \| sort \| unique" channels.json` | 151.9 ± 4.1 | 146.5 | 163.9 | 1.11 ± 0.06 |
| `jq ".[].category" channels.json \| sort -u` | 137.2 ± 6.6 | 131.8 | 156.6 | 1.00 |

Summary
  'jq ".[].category" channels.json | sort -u' ran
    1.11 ± 0.06 times faster than 'jq "[.[].category | select((. != null) and (. != \"XXX\"))] | sort | unique" channels.json'
    1.25 ± 0.06 times faster than 'jq "[.[].category] | sort | unique" channels.json'

test command:

hyperfine --warmup 3 \
    'jq "[.[].category] | sort | unique" channels.json'  \
    'jq "[.[].category | select((. != null) and (. != \"XXX\"))] | sort | unique" channels.json' \
    'jq ".[].category" channels.json | sort -u'
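One variant that may be worth adding to the benchmark: according to the jq manual, `unique` already returns its result in sorted order, so `sort | unique` probably sorts the array twice. The sketch below uses a made-up four-element sample (the real channels.json is not reproduced here) and hypothetical helper names to check that both filters give the same result; the calls are guarded so the snippet does nothing when jq is not installed.

```shell
# Hypothetical stand-in for channels.json; the real file is not shown in the post.
sample='[{"category":"News"},{"category":"Music"},{"category":null},{"category":"News"}]'

# Form used in the benchmark: sort first, then unique (which sorts again).
sorted_then_unique() { jq -c '[.[].category] | sort | unique'; }

# Shorter form: unique alone returns a sorted, de-duplicated array.
unique_only() { jq -c '[.[].category] | unique'; }

# Print both results for comparison; skipped when jq is missing.
if command -v jq >/dev/null 2>&1; then
    printf '%s\n' "$sample" | sorted_then_unique
    printf '%s\n' "$sample" | unique_only
fi
```

If the two lines match on the real data as well, `jq "[.[].category] | unique" channels.json` could be benchmarked as a fourth command. Whether it closes the gap to `sort -u` is not something this sketch can tell.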

If we only test sorting (without uniqueness), jq is again about 9% slower than sort:

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:---|---:|---:|---:|---:|
| `jq "[.[].category] \| sort" channels.json` | 133.9 ± 1.6 | 131.1 | 138.2 | 1.09 ± 0.02 |
| `jq ".[].category" channels.json \| sort` | 123.0 ± 1.3 | 120.5 | 125.7 | 1.00 |

Summary
  'jq ".[].category" channels.json | sort' ran
    1.09 ± 0.02 times faster than 'jq "[.[].category] | sort" channels.json'

versions:

jq-1.5-1-a5b5cbe
sort (GNU coreutils) 8.28

I expected jq's built-in functions to be faster than piping into an external program, which also has to be spawned. Am I using jq poorly?

Update: I just repeated this experiment on a host with flash storage and an Arm CPU, using these versions:

jq-1.6
sort (GNU coreutils) 8.32

result:

Benchmark #1: jq "[.[].category] | sort" channels.json
  Time (mean ± σ):     587.8 ms ±   3.9 ms    [User: 539.5 ms, System: 44.2 ms]
  Range (min … max):   582.8 ms … 594.2 ms    10 runs
 
Benchmark #2: jq ".[].category" channels.json | sort
  Time (mean ± σ):     606.0 ms ±   8.6 ms    [User: 569.5 ms, System: 49.0 ms]
  Range (min … max):   589.6 ms … 616.2 ms    10 runs
 
Summary
  'jq "[.[].category] | sort" channels.json' ran
    1.03 ± 0.02 times faster than 'jq ".[].category" channels.json | sort'

Now jq sort runs 3% faster than GNU sort :D

edited Jun 26, 2021 at 20:28

asked Jun 26, 2021 at 8:39

Zeta.Investigator

2

jq uses your C library's qsort() implementation. All your tests are running in sub-second time, indicating that any memory admin overhead is likely significantly affecting the results, so we can't say more than possibly "okay". Test again on data that takes around 2 to 5 seconds to sort. Spawning a single process to run sort is quick (it's not as if you're running a shell loop, spawning sort in every iteration or something).
– Kusalananda ♦
Commented Jun 26, 2021 at 8:55
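To try the suggestion of testing on a larger input, one option is to generate a synthetic file of the same general shape. This is only a sketch: the function name `gen_channels`, the file name `big.json`, and the field values are all invented, and the real channels.json may have a different structure beyond the `category` field.

```shell
# gen_channels N: print a JSON array of N objects, each with a "category" field.
# The values (ch1..chN, cat0..cat999) are invented; only the shape matters.
gen_channels() {
    awk -v n="$1" 'BEGIN {
        srand(42)                      # fixed seed so repeated runs produce the same data
        print "["
        for (i = 1; i <= n; i++) {
            sep = (i < n) ? "," : ""   # no trailing comma after the last object
            printf "  {\"name\": \"ch%d\", \"category\": \"cat%d\"}%s\n", i, int(rand() * 1000), sep
        }
        print "]"
    }'
}

# big.json is a hypothetical file name; raise the count until one run takes a few seconds.
gen_channels 1000000 > big.json

# Then benchmark as before, pointing at the larger file, for example:
#   hyperfine --warmup 3 \
#       'jq "[.[].category] | sort" big.json' \
#       'jq ".[].category" big.json | sort'
```

With only about a thousand distinct categories the data is heavy on duplicates, which may favour one tool over the other; widening the range in `int(rand() * 1000)` changes that.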
1

jq's sort operation is not specialised to strings, so it is also fundamentally doing more work with every comparison.
– Michael Homer
Commented Jun 26, 2021 at 9:28
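A small illustration of the point about jq's sort not being specialised to strings: jq's `sort` accepts arrays of mixed JSON types and orders them by type before value (null, false, true, numbers, strings, arrays, objects), so each comparison has to inspect types first. The sample array and the helper name `sort_mixed` are made up for this sketch, and the call is guarded so the snippet does nothing when jq is not installed.

```shell
# A made-up array mixing several JSON types.
mixed='["b", 1, null, "a", true, [0], {"k": 1}]'

# jq's generic sort: ordered by type first, then by value within each type.
sort_mixed() { jq -c 'sort'; }

if command -v jq >/dev/null 2>&1; then
    printf '%s\n' "$mixed" | sort_mixed
    # prints [null,true,1,"a","b",[0],{"k":1}]
fi
```

By contrast, `sort` only ever compares lines of text, which in the benchmark above are the quoted strings printed by `jq ".[].category"`.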
I re-did your benchmarks with much more data, and I can only say that I confirmed your numbers. The difference in timing is probably due to using different sorting algorithms. Running the sort utility with --debug on my OpenBSD system indicates that it's using Radix Sort.
– Kusalananda ♦
Commented Jun 26, 2021 at 9:29
@Kusalananda Good to know. Latest version of jq? So my problem has boiled down to whether I want to add a sort dependency or tolerate a ~20% speed penalty
– Zeta.Investigator
Commented Jun 26, 2021 at 9:53
2

Just to say that the OpenBSD sort allows for specifying --qsort. Running sort --qsort | uniq (-u can't be used with --qsort), I get exactly the same timings as with sort | unique in jq.
– Kusalananda ♦
Commented Jun 26, 2021 at 10:07

Tagged: sort, jq, coreutils, benchmark, optimization
