20 events
when what by license comment
Sep 19, 2018 at 4:43 to Sep 19, 2018 at 4:55 audit Low quality answers
Aug 23, 2018 at 10:17 comment added user9903423 @MaximEgorushkin thank you! I think the second command with stdbuf is pretty fast.
Aug 23, 2018 at 7:31 vote accept CommunityBot
Aug 23, 2018 at 5:43 comment added hek2mgl "If files are large you want fewer" - why? About newlines in filenames: take into account that you can't always control the input, especially if the tool is about to be released to the public. But also read what I wrote about md5sum and newlines in filenames.
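For reference, a NUL-delimited pipeline sidesteps the newline concern entirely; a minimal sketch, with a made-up scratch directory and filenames for illustration:

```shell
# Scratch directory with an awkward filename containing a real newline.
dir=$(mktemp -d)
printf 'data' > "$dir/plain.txt"
printf 'data' > "$dir/with
newline.txt"

# NUL-delimited find -> xargs: names with embedded newlines are
# passed to md5sum intact instead of being split on the newline.
find "$dir" -type f -print0 | xargs -0 md5sum

rm -rf "$dir"
```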
Aug 22, 2018 at 23:39 history edited Maxim Egorushkin CC BY-SA 4.0
added 49 characters in body
Aug 22, 2018 at 23:33 history edited Maxim Egorushkin CC BY-SA 4.0
added 274 characters in body
Aug 22, 2018 at 23:26 comment added Maxim Egorushkin @hek2mgl "You can use a buffer size of 0 instead of line buffered if you want that." - I am unlikely to want that. I want filenames without newlines (pretty reasonable) and line-buffered mode.
Aug 22, 2018 at 23:22 comment added Maxim Egorushkin @hek2mgl With find ... -exec md5sum {} + you do not have control over how many filenames are passed to md5sum. If files are large you want fewer; if small, more. Also, I am not sure whether find blocks while it does -exec, whereas with | xargs it does not block.
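The batch-size point can be made concrete with xargs -n; a sketch, where the file names and the limit of 2 are illustrative, not a recommendation:

```shell
dir=$(mktemp -d)
for i in 1 2 3 4; do printf '%s' "$i" > "$dir/f$i"; done

# -n 2 caps each md5sum invocation at two filenames, so four files
# are hashed by two separate md5sum processes; adding -P would
# additionally run several such invocations in parallel.
find "$dir" -type f -print0 | xargs -0 -n 2 md5sum

rm -rf "$dir"
```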
Aug 22, 2018 at 23:21 comment added Maxim Egorushkin @hek2mgl "I wouldn't suggest to use xargs like that. newlines in filenames would break it." - if you have newlines in filenames, you may have a bigger problem.
Aug 22, 2018 at 21:53 comment added hek2mgl Given that, you can probably just use find ... -exec md5sum {} + | grep -vf ... without the stdbuf magic.
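A sketch of that simpler form; the exclusion file here is hypothetical, standing in for the elided ... in the comment:

```shell
dir=$(mktemp -d)
printf x > "$dir/keep.txt"
printf x > "$dir/skip.txt"

# Hypothetical exclusion-pattern file, one grep pattern per line.
excl=$(mktemp)
printf 'skip.txt\n' > "$excl"

# -exec ... {} + batches the found names into as few md5sum
# invocations as possible; grep -vf then drops every output line
# matching a pattern from $excl.
find "$dir" -type f -exec md5sum {} + | grep -vf "$excl"

rm -rf "$dir" "$excl"
```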
Aug 22, 2018 at 21:52 comment added hek2mgl In the end, using md5sum on files whose names can contain newlines is just broken, because it replaces the newline with \n, making it impossible to tell whether the filename literally contained \n or a newline.
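The escaping behavior is easy to observe with GNU coreutils md5sum; the scratch filename below is illustrative:

```shell
dir=$(mktemp -d)
printf 'x' > "$dir/a
b"   # a filename containing a real newline

# GNU md5sum prints the name with the newline rendered as \n and
# marks the whole output line with a leading backslash.
md5sum "$dir/a
b"

rm -rf "$dir"
```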
Aug 22, 2018 at 21:29 comment added hek2mgl I wouldn't suggest using xargs like that; newlines in filenames would break it. That said, the fact that the OP is using grep -vf to filter out filenames makes the solution (in the question) vulnerable to newlines in filenames anyway.
Aug 22, 2018 at 21:23 comment added hek2mgl @MaximEgorushkin You can use a buffer size of 0 instead of line buffered if you want that.
Aug 22, 2018 at 20:02 history edited Maxim Egorushkin CC BY-SA 4.0
added 389 characters in body
Aug 22, 2018 at 19:56 comment added Maxim Egorushkin @PeterA.Schneider Thinking more about buffering, the command uses 0-separators, so that line-buffered mode won't affect it.
Aug 22, 2018 at 19:51 history edited Maxim Egorushkin CC BY-SA 4.0
added 19 characters in body
Aug 22, 2018 at 16:45 history edited Maxim Egorushkin CC BY-SA 4.0
added 173 characters in body
Aug 22, 2018 at 16:27 comment added Maxim Egorushkin @PeterA.Schneider Yep, line-buffered find into parallel md5sum and then collect, filter and sort md5sum outputs.
Aug 22, 2018 at 16:19 comment added Peter - Reinstate Monica One could sort after the md5sum call (by column 2, the filename output of md5sum); this would ideally allow the hash computation to start while the search is still running. One is disk-bound, the other CPU-bound, so there may be a real benefit. Perhaps one should use unbuffer or stdbuf (see unix.stackexchange.com/questions/25372/…) to let md5sum start immediately.
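The suggested reordering might look like the sketch below, assuming filenames without whitespace or newlines, since plain xargs and sort -k2 are used:

```shell
dir=$(mktemp -d)
for f in c a b; do printf '%s' "$f" > "$dir/$f"; done

# stdbuf -oL keeps find's output line-buffered, so md5sum can start
# hashing while the directory walk is still running; sort then orders
# the combined output by field 2, md5sum's filename column.
stdbuf -oL find "$dir" -type f | xargs md5sum | sort -k2

rm -rf "$dir"
```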
Aug 22, 2018 at 16:08 history answered Maxim Egorushkin CC BY-SA 4.0