Timeline for Are too many pipes bad for performance
Current License: CC BY-SA 4.0
20 events
when | what | by | license | comment
---|---|---|---|---
Sep 19, 2018 at 4:43 | audit | | | Low quality answers
Sep 19, 2018 at 4:55 | | | |
Aug 23, 2018 at 10:17 | comment added | user9903423 | | @MaximEgorushkin thank you! I think the second command with `stdbuf` is pretty fast.
Aug 23, 2018 at 7:31 | vote accept | CommunityBot | |
Aug 23, 2018 at 5:43 | comment added | hek2mgl | | "If files are large you want fewer" - why? About newlines in filenames: take into account that you can't always control the input, especially if the tool is about to be released to the public. But also read what I wrote about `md5sum` and newlines in filenames.
Aug 22, 2018 at 23:39 | history edited | Maxim Egorushkin | CC BY-SA 4.0 | added 49 characters in body
Aug 22, 2018 at 23:33 | history edited | Maxim Egorushkin | CC BY-SA 4.0 | added 274 characters in body
Aug 22, 2018 at 23:26 | comment added | Maxim Egorushkin | | @hek2mgl "You can use a buffer size of 0 instead of line buffered if you want that" - I am unlikely to want that. I want filenames without newlines (pretty reasonable) and line-buffered mode.
Aug 22, 2018 at 23:22 | comment added | Maxim Egorushkin | | @hek2mgl With `find ... -exec md5sum {} +` you have no control over how many filenames are passed to `md5sum`. If files are large you want fewer; if small, more. Also, I am not sure whether `find` blocks while it runs `-exec`, whereas with `\| xargs` it does not block.
Aug 22, 2018 at 23:21 | comment added | Maxim Egorushkin | | @hek2mgl "I wouldn't suggest using `xargs` like that; newlines in filenames would break it" - if you have newlines in filenames you may have a bigger problem.
Aug 22, 2018 at 21:53 | comment added | hek2mgl | | Given that, you can probably just use `find ... -exec md5sum {} + \| grep -vf ...` without the `stdbuf` magic.
Aug 22, 2018 at 21:52 | comment added | hek2mgl | | In the end, using `md5sum` for files which can contain newlines is just broken, because it replaces the newline with `\n`, making it impossible to tell whether that filename literally contained `\n` or a newline.
Aug 22, 2018 at 21:29 | comment added | hek2mgl | | I wouldn't suggest using `xargs` like that; newlines in filenames would break it. That said, the fact that the OP is using `grep -vf` to filter out filenames makes the solution (in the question) vulnerable to newlines in filenames anyway.
Aug 22, 2018 at 21:23 | comment added | hek2mgl | | @MaximEgorushkin You can use a buffer size of 0 instead of line buffered if you want that.
Aug 22, 2018 at 20:02 | history edited | Maxim Egorushkin | CC BY-SA 4.0 | added 389 characters in body
Aug 22, 2018 at 19:56 | comment added | Maxim Egorushkin | | @PeterA.Schneider Thinking more about buffering: the command uses 0-separators, so line-buffered mode won't affect it.
Aug 22, 2018 at 19:51 | history edited | Maxim Egorushkin | CC BY-SA 4.0 | added 19 characters in body
Aug 22, 2018 at 16:45 | history edited | Maxim Egorushkin | CC BY-SA 4.0 | added 173 characters in body
Aug 22, 2018 at 16:27 | comment added | Maxim Egorushkin | | @PeterA.Schneider Yep: line-buffered `find` into parallel `md5sum`, then collect, filter, and sort the `md5sum` outputs.
Aug 22, 2018 at 16:19 | comment added | Peter - Reinstate Monica | | One could sort after the `md5sum` call (by column 2, the file name output of `md5sum`); this would ideally allow the hash computation to start while the search is still running. One is disk-bound, the other CPU-bound, so there may be a real benefit. Perhaps one should use `unbuffer` or `stdbuf` (see unix.stackexchange.com/questions/25372/…) to let `md5sum` start immediately.
Aug 22, 2018 at 16:08 | history answered | Maxim Egorushkin | CC BY-SA 4.0 |
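The comments above reference the pipeline only in fragments, so here is a runnable sketch of the shape being discussed. It is an assumed reconstruction, not the answer's exact command: the `data` directory, the `exclude.txt` pattern file, and the `hashes.txt` output name are all hypothetical, and `stdbuf` requires GNU coreutils. NUL separators (`-print0`/`xargs -0`) keep newline-containing filenames from splitting records, and `stdbuf -oL` flushes each filename as `find` emits it, so `md5sum` can start hashing while the directory walk is still in progress.

```shell
# Hypothetical setup: two small files and an exclude-pattern file.
mkdir -p data
printf 'hello\n' > data/a.txt
printf 'world\n' > data/b.txt
printf 'NOMATCH\n' > exclude.txt   # patterns for lines to drop; matches nothing here

# Line-buffered find feeding md5sum through NUL-separated xargs,
# then filtering with grep -vf and sorting by filename (column 2),
# as suggested in the comments above.
stdbuf -oL find data -type f -print0 \
  | xargs -0 md5sum \
  | grep -v -f exclude.txt \
  | sort -k2 > hashes.txt

cat hashes.txt
```

`find data -type f -exec md5sum {} +` is the simpler variant hek2mgl proposes: `find` batches the filenames itself, with no pipe at all, but you then lose control over how many names each `md5sum` invocation receives, which is the trade-off Maxim raises.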