20 events
when what by license comment
Sep 19, 2018 at 4:43 to Sep 19, 2018 at 4:55 audit Low quality answers
Aug 23, 2018 at 10:17 comment added user9903423 @MaximEgorushkin thank you! I think the second command with stdbuf is pretty fast.
Aug 23, 2018 at 7:31 vote accept CommunityBot
Aug 23, 2018 at 5:43 comment added hek2mgl "If files are large you want fewer" - why? About newlines in filenames: take into account that you can't always control the input, especially if the tool is about to be released to the public. But also read what I wrote about md5sum and newlines in filenames.
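For reference, a NUL-delimited pipeline sidesteps the newline concern entirely; a minimal sketch, with a made-up scratch directory and filenames for illustration:

```shell
# Scratch directory with an awkward filename containing a real newline.
dir=$(mktemp -d)
printf 'data' > "$dir/plain.txt"
printf 'data' > "$dir/with
newline.txt"

# NUL-delimited find -> xargs: names with embedded newlines are
# passed to md5sum intact instead of being split on the newline.
find "$dir" -type f -print0 | xargs -0 md5sum

rm -rf "$dir"
```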
Aug 22, 2018 at 23:39 history edited Maxim Egorushkin CC BY-SA 4.0
added 49 characters in body
Aug 22, 2018 at 23:33 history edited Maxim Egorushkin CC BY-SA 4.0
added 274 characters in body
Aug 22, 2018 at 23:26 comment added Maxim Egorushkin @hek2mgl "You can use a buffer size of 0 instead of line buffered if you want that." - I am unlikely to want that. I want filenames without newlines (pretty reasonable) and line-buffered mode.
Aug 22, 2018 at 23:22 comment added Maxim Egorushkin @hek2mgl With find ... -exec md5sum {} + you do not have control over how many filenames are passed to md5sum. If files are large you want fewer; if small, more. Also, I am not sure whether find blocks while it does -exec, whereas with | xargs it does not block.
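The batch-size point can be made concrete with xargs -n; a sketch, where the file names and the limit of 2 are illustrative, not a recommendation:

```shell
dir=$(mktemp -d)
for i in 1 2 3 4; do printf '%s' "$i" > "$dir/f$i"; done

# -n 2 caps each md5sum invocation at two filenames, so four files
# are hashed by two separate md5sum processes; adding -P would
# additionally run several such invocations in parallel.
find "$dir" -type f -print0 | xargs -0 -n 2 md5sum

rm -rf "$dir"
```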
Aug 22, 2018 at 23:21 comment added Maxim Egorushkin @hek2mgl "I wouldn't suggest to use xargs like that. newlines in filenames would break it." - if you have newlines in filenames, you may have a bigger problem.
Aug 22, 2018 at 21:53 comment added hek2mgl Given that, you can probably just use find ... -exec md5sum {} + | grep -vf ... without the stdbuf magic.
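A sketch of that simpler form; the exclusion file here is hypothetical, standing in for the elided ... in the comment:

```shell
dir=$(mktemp -d)
printf x > "$dir/keep.txt"
printf x > "$dir/skip.txt"

# Hypothetical exclusion-pattern file, one grep pattern per line.
excl=$(mktemp)
printf 'skip.txt\n' > "$excl"

# -exec ... {} + batches the found names into as few md5sum
# invocations as possible; grep -vf then drops every output line
# matching a pattern from $excl.
find "$dir" -type f -exec md5sum {} + | grep -vf "$excl"

rm -rf "$dir" "$excl"
```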
Aug 22, 2018 at 21:52 comment added hek2mgl In the end, using md5sum on files whose names can contain newlines is just broken, because it replaces the newline with \n, making it impossible to tell whether the filename literally contained \n or a newline.
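The escaping behavior is easy to observe with GNU coreutils md5sum; the scratch filename below is illustrative:

```shell
dir=$(mktemp -d)
printf 'x' > "$dir/a
b"   # a filename containing a real newline

# GNU md5sum prints the name with the newline rendered as \n and
# marks the whole output line with a leading backslash.
md5sum "$dir/a
b"

rm -rf "$dir"
```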
Aug 22, 2018 at 21:29 comment added hek2mgl I wouldn't suggest using xargs like that; newlines in filenames would break it. That said, the fact that the OP is using grep -vf to filter out filenames makes the solution (in the question) vulnerable to newlines in filenames anyway.
Aug 22, 2018 at 21:23 comment added hek2mgl @MaximEgorushkin You can use a buffer size of 0 instead of line buffered if you want that.
Aug 22, 2018 at 20:02 history edited Maxim Egorushkin CC BY-SA 4.0
added 389 characters in body
Aug 22, 2018 at 19:56 comment added Maxim Egorushkin @PeterA.Schneider Thinking more about buffering, the command uses 0-separators, so that line-buffered mode won't affect it.
Aug 22, 2018 at 19:51 history edited Maxim Egorushkin CC BY-SA 4.0
added 19 characters in body
Aug 22, 2018 at 16:45 history edited Maxim Egorushkin CC BY-SA 4.0
added 173 characters in body
Aug 22, 2018 at 16:27 comment added Maxim Egorushkin @PeterA.Schneider Yep, line-buffered find into parallel md5sum and then collect, filter and sort md5sum outputs.
Aug 22, 2018 at 16:19 comment added Peter - Reinstate Monica One could sort after the md5sum call (by column 2, the filename output of md5sum); this would ideally allow the hash computation to start while the search is still running. One is disk-bound, the other CPU-bound, so there may be a real benefit. Perhaps one should use unbuffer or stdbuf (see unix.stackexchange.com/questions/25372/…) to let md5sum start immediately.
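The suggested reordering might look like the sketch below, assuming filenames without whitespace or newlines, since plain xargs and sort -k2 are used:

```shell
dir=$(mktemp -d)
for f in c a b; do printf '%s' "$f" > "$dir/$f"; done

# stdbuf -oL keeps find's output line-buffered, so md5sum can start
# hashing while the directory walk is still running; sort then orders
# the combined output by field 2, md5sum's filename column.
stdbuf -oL find "$dir" -type f | xargs md5sum | sort -k2

rm -rf "$dir"
```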
Aug 22, 2018 at 16:08 history answered Maxim Egorushkin CC BY-SA 4.0