added 49 characters in body
Source Link
Maxim Egorushkin
  • 134.6k
  • 17
  • 190
  • 282

sort introduces blocking here: it cannot output anything until find has finished and closed its end of the pipe. find on a large filesystem, especially on an HDD or over NFS, may take a while.
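
To see the blocking for yourself, here is a minimal sketch. slow_producer and seconds_to_first_line are hypothetical helpers made up for this illustration, not part of any tool mentioned here; the producer merely stands in for find on a slow filesystem.

```shell
# Hypothetical stand-in for find on a slow filesystem:
# emits three unsorted names with a pause between them.
slow_producer() {
    echo c; sleep 2
    echo a; sleep 2
    echo b
}

# Hypothetical helper: print the whole seconds that pass
# until the first line arrives on stdin.
seconds_to_first_line() {
    start=$(date +%s)
    IFS= read -r _first
    echo $(( $(date +%s) - start ))
    cat >/dev/null   # drain the rest so the producer is not cut off by SIGPIPE
}
```

Running `slow_producer | seconds_to_first_line` should report roughly 0 seconds, while `slow_producer | sort | seconds_to_first_line` should report roughly 4, because sort cannot emit its first line before it has read the last one.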

You may want to sort at the very end so that md5sum can run in parallel with find, e.g.:

find ./vendor -type f -print0 | xargs -0 md5sum | grep -vf /usr/local/bin/vchecker_ignore | sort -k2 > MD5sums

md5sum may take some time for large files. You may want to run it with GNU parallel instead of xargs if there are many files or the files are large.
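
A rough sketch of what that could look like. md5_tree is a hypothetical helper name, and the default of 4 jobs and the batch size of 50 are assumptions to tune for your machine. It uses GNU parallel when that is installed and otherwise falls back to GNU xargs with -P, which also runs several md5sum processes at once.

```shell
# Hypothetical helper: checksum every file under a directory with
# several md5sum processes running concurrently.
# $1: directory to scan, $2: number of concurrent jobs (assumed default: 4)
md5_tree() {
    dir=$1
    jobs=${2:-4}
    if parallel --version 2>/dev/null | grep -q 'GNU parallel'; then
        # -0: NUL-delimited input, -m: pass many filenames to each md5sum
        find "$dir" -type f -print0 | parallel -0 -j "$jobs" -m md5sum
    else
        # Fallback: xargs -P. Line-buffer md5sum so that concurrent
        # processes write whole lines to the shared pipe.
        find "$dir" -type f -print0 \
            | xargs -0 -n 50 -P "$jobs" stdbuf -oL md5sum
    fi
}
```

It slots into the pipeline in place of the find and xargs stages, e.g. `md5_tree ./vendor 8 | grep -vf /usr/local/bin/vchecker_ignore | sort -k2 > MD5sums`. The order of the output lines is not deterministic when jobs run concurrently, which is one more reason to keep sort at the very end.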


You may also want to experiment with line-buffered mode. For line buffering to work, filenames must be delimited by newlines rather than NUL bytes (which rules out newlines in filenames, but those are rather unusual). E.g.:

stdbuf -oL find ./vendor -type f | stdbuf -oL grep -vf /usr/local/bin/vchecker_ignore | xargs -n50 -d'\n' md5sum | sort -k2 > MD5sums

The above command filters each filename through grep first and then runs md5sum on batches of 50 files. For small files you may prefer larger batches (and may be able to drop both stdbuf -oL entirely); for large files, smaller ones.
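
To make the batch size easy to experiment with, the same pipeline can be wrapped in a function. This is only a sketch: md5_filtered and its parameters are hypothetical names, and the -r flag (GNU xargs, do not run the command on empty input) is an addition that is not in the command above.

```shell
# Hypothetical wrapper around the line-buffered pipeline.
# $1: directory to scan
# $2: file with grep patterns for paths to ignore
# $3: number of files per md5sum invocation (assumed default: 50)
md5_filtered() {
    dir=$1
    ignore=$2
    batch=${3:-50}
    stdbuf -oL find "$dir" -type f \
        | stdbuf -oL grep -vf "$ignore" \
        | xargs -r -n "$batch" -d '\n' md5sum \
        | sort -k2
}
```

For example, `md5_filtered ./vendor /usr/local/bin/vchecker_ignore 200 > MD5sums` tries batches of 200 files. Without -r, GNU xargs would still run md5sum once if grep filtered out every filename, and md5sum would then hash its empty standard input.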

added 274 characters in body
Source Link
Maxim Egorushkin

sort introduces blocking here: it cannot output anything until find has finished and closed its end of the pipe. find on a large filesystem, especially on an HDD or over NFS, may take a while.

You may want to sort at the very end so that md5sum can run in parallel with find, e.g.:

find ./vendor -type f -print0 | xargs -0 md5sum | grep -vf /usr/local/bin/vchecker_ignore | sort -k2 > MD5sums

md5sum may take some time for large files. You may want to run it with GNU parallel instead of xargs if there are many files or the files are large.


You may also want to experiment with line-buffered mode. For line buffering to work, filenames must be delimited by newlines rather than NUL bytes (which rules out newlines in filenames, but those are rather unusual). E.g.:

stdbuf -oL find ./vendor -type f | stdbuf -oL grep -vf /usr/local/bin/vchecker_ignore | xargs -n50 -d'\n' md5sum | sort -k2 > MD5sums

The above command filters each filename through grep first and then runs md5sum on batches of 50 files. For small files you may prefer larger batches; for large files, smaller ones.

added 389 characters in body
Source Link
Maxim Egorushkin

sort may introduce blocking here: it has to wait for EOF from find before it can output its results. You may want to sort at the very end, e.g.:

find ./vendor -type f -print0 | xargs -0 md5sum | grep -vf /usr/local/bin/vchecker_ignore | sort -k2 > MD5sums

md5sum may take some time for large files. You may want to run it with GNU parallel instead of xargs if there are many files or the files are large.


You may also want to experiment with line-buffered mode. For line buffering to work, filenames must be delimited by newlines rather than NUL bytes. E.g.:

stdbuf -oL find ./vendor -type f | xargs -n50 -d'\n' md5sum | grep -vf /usr/local/bin/vchecker_ignore | sort -k2 > MD5sums

The above command runs md5sum on batches of 50 files.

added 19 characters in body
Source Link
Maxim Egorushkin
added 173 characters in body
Source Link
Maxim Egorushkin
Source Link
Maxim Egorushkin