Revisions to Are too many pipes bad for performance

added 49 characters in body

edited Aug 22, 2018 at 23:39

sort introduces blocking here: it has to wait until find has completed before it can output its results. find on a large filesystem, especially on an HDD or NFS, may take a while.

You may want to sort at the very end instead, so that md5sum can run in parallel with find, e.g.:

find ./vendor -type f -print0 | xargs -0 md5sum | grep -vf /usr/local/bin/vchecker_ignore | sort -k2 > MD5sums
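To check that moving sort to the end changes only the timing and not the result, here is a minimal sketch on a throwaway tree. The directory layout, the file names and the sort-first variant are assumptions made for illustration; they are not taken from the question.

```shell
#!/bin/sh
# Sketch: compare a sort-first pipeline with the sort-last one on a throwaway tree.
# The directory layout and file names below are made up for illustration.
export LC_ALL=C                      # byte-wise collation, so both sorts agree
tmp=$(mktemp -d)
mkdir -p "$tmp/vendor/sub"
printf 'one\n'   > "$tmp/vendor/b.txt"
printf 'two\n'   > "$tmp/vendor/a.txt"
printf 'three\n' > "$tmp/vendor/sub/c.txt"

# Sort first: sort -z must see EOF from find before xargs receives any name.
(cd "$tmp" && find ./vendor -type f -print0 | sort -z | xargs -0 md5sum) > "$tmp/sorted_first"

# Sort last: xargs can pass names on to md5sum without waiting for a sort step.
(cd "$tmp" && find ./vendor -type f -print0 | xargs -0 md5sum | sort -k2) > "$tmp/sorted_last"

# Both orders produce the same listing; only the timing differs.
cmp "$tmp/sorted_first" "$tmp/sorted_last" && echo "same result"
```

Note that without -n, xargs itself waits until it has collected a full command line of names (or sees EOF), so on a tiny tree like this one the two variants behave almost the same; the difference shows up on large trees.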

md5sum may take some time on large files. If there are many files, or the files are large, you may want to run it with GNU parallel instead of xargs.
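A rough sketch of running several md5sum processes at once. It uses the -P option of GNU xargs as a stand-in for GNU parallel, so that it does not depend on parallel being installed; the sample files, the batch size of 2 and the job count of 4 are arbitrary assumptions.

```shell
#!/bin/sh
# Sketch: hash files with several md5sum processes at once.
# Uses GNU xargs -P as a stand-in; with GNU parallel installed, the equivalent
# would be roughly:  find ./vendor -type f -print0 | parallel -0 md5sum
export LC_ALL=C
tmp=$(mktemp -d)
mkdir -p "$tmp/vendor"
for name in a b c d; do
    printf '%s\n' "$name" > "$tmp/vendor/$name.txt"   # made-up sample files
done

# -n 2: at most two files per md5sum invocation; -P 4: up to four invocations at once.
# Completion order is not deterministic, so the final sort restores a stable order.
(cd "$tmp" && find ./vendor -type f -print0 | xargs -0 -n 2 -P 4 md5sum | sort -k2) > "$tmp/parallel_sums"

# Serial reference run for comparison.
(cd "$tmp" && find ./vendor -type f -print0 | xargs -0 md5sum | sort -k2) > "$tmp/serial_sums"

cmp "$tmp/parallel_sums" "$tmp/serial_sums" && echo "parallel and serial runs agree"
```

One caveat with this substitute: xargs -P does not group the output of concurrent jobs, so with large batches or very long paths lines from different md5sum processes could in principle interleave. GNU parallel buffers each job's output by default, which is one reason to prefer it.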

You may also like to experiment with line-buffered mode. For line buffering to work, filenames have to be delimited by newlines instead of NUL characters (which rules out newlines in filenames, but those would be rather unusual). E.g.:

stdbuf -oL find ./vendor -type f | stdbuf -oL grep -vf /usr/local/bin/vchecker_ignore | xargs -n50 -d'\n' md5sum | sort -k2 > MD5sums

The above command filters each filename through that grep first and then runs md5sum on batches of 50 files. For small files you may want larger batches (and maybe remove both stdbuf -oL completely); for large files, smaller ones.
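To see how -n groups newline-delimited names into batches, here is a small sketch with a tiny sh -c command standing in for md5sum and -n2 instead of -n50. The file names are made up; -d is a GNU xargs option.

```shell
#!/bin/sh
# Sketch: how xargs -n groups newline-delimited names into batches.
# A small sh -c command stands in for md5sum, and -n2 for -n50, to keep the output short.
# -d'\n' (GNU xargs) splits on newlines only, so the space in 'b c.txt' survives.
batches=$(printf '%s\n' a.txt 'b c.txt' d.txt e.txt f.txt |
    xargs -n2 -d'\n' sh -c 'echo "$# file(s): $*"' sh)
printf '%s\n' "$batches"
```

Five names with -n2 give three invocations: two with two names each and a final one with the leftover name. A larger -n means fewer process launches; a smaller one means md5sum starts sooner on the first names.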

added 274 characters in body

edited Aug 22, 2018 at 23:33

Maxim Egorushkin

sort introduces blocking here: it has to wait until find has completed before it can output its results. find on a large filesystem, especially on an HDD or NFS, may take a while.

You may want to sort at the very end instead, so that md5sum can run in parallel with find, e.g.:

find ./vendor -type f -print0 | xargs -0 md5sum | grep -vf /usr/local/bin/vchecker_ignore | sort -k2 > MD5sums

md5sum may take some time on large files. If there are many files, or the files are large, you may want to run it with GNU parallel instead of xargs.

You may also like to experiment with line-buffered mode. For line buffering to work, filenames have to be delimited by newlines instead of NUL characters (which rules out newlines in filenames, but those would be rather unusual). E.g.:

stdbuf -oL find ./vendor -type f | stdbuf -oL grep -vf /usr/local/bin/vchecker_ignore | xargs -n50 -d'\n' md5sum | sort -k2 > MD5sums

The above command filters each filename through that grep first and then runs md5sum on batches of 50 files. For small files you may want larger batches; for large files, smaller ones.

added 389 characters in body

edited Aug 22, 2018 at 20:02

Maxim Egorushkin

sort may introduce blocking here: it has to wait for EOF from find before it can output its results. You may want to sort at the very end, e.g.:

find ./vendor -type f -print0 | xargs -0 md5sum | grep -vf /usr/local/bin/vchecker_ignore | sort -k2 > MD5sums

md5sum may take some time on large files. If there are many files, or the files are large, you may want to run it with GNU parallel instead of xargs.

You may also like to experiment with line-buffered mode. For line buffering to work, filenames have to be delimited by newlines instead of NUL characters. E.g.:

stdbuf -oL find ./vendor -type f | xargs -n50 -d'\n' md5sum | grep -vf /usr/local/bin/vchecker_ignore | sort -k2 > MD5sums

The above command runs md5sum on batches of 50 files.

added 19 characters in body

edited Aug 22, 2018 at 19:51

Maxim Egorushkin

added 173 characters in body

edited Aug 22, 2018 at 16:45

Maxim Egorushkin

created Aug 22, 2018 at 16:08

Maxim Egorushkin
