219

This question is inspired by

Why is using a shell loop to process text considered bad practice?

I see these constructs

for file in `find . -type f -name ...`; do smth with ${file}; done

and

for dir in $(find . -type d -name ...); do smth with ${dir}; done

being used here almost on a daily basis, even though some people take the time to comment on those posts explaining why this kind of thing should be avoided...
Seeing the number of such posts (and the fact that sometimes those comments are simply ignored) I thought I might as well ask a question:

Why is looping over find's output bad practice, and what's the proper way to run one or more commands for each file name/path returned by find?

5
  • 16
    I think this is sort of like "Never parse ls output!" - you can certainly do either one on a one off basis, but they're more of a quick hack than production quality. Or, more generally, definitely never be dogmatic.
    – user732
    Commented Nov 7, 2016 at 18:41
  • 10
    Because the point of find is to loop over what it finds.
    – OrangeDog
    Commented Nov 8, 2016 at 9:24
  • 4
    One ancillary point -- you may want to send the output to a file, and then process it later in the script. This way the file list is available for review if you need to debug the script.
    – user117529
    Commented Nov 11, 2016 at 17:58
  • Note that the output of find needn't be filenames (see the -print variants), so the reasoning is slightly too general. :) Commented Mar 5, 2020 at 15:52

8 Answers

230

Why is looping over find's output bad practice?

The simple answer is:

Because filenames can contain any character.

Therefore, there is no printable character you can reliably use to delimit filenames.


Newlines are often used (incorrectly) to delimit filenames, because it is unusual to include newline characters in filenames.

However, if you build your software around arbitrary assumptions, you at best simply fail to handle unusual cases, and at worst open yourself up to malicious exploits that give away control of your system. So it's a question of robustness and safety.
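To see the failure concretely, here is a minimal demonstration (run it in an empty scratch directory of your choosing):

touch 'one
two'                 # a single file whose name contains a newline
for f in $(find . -type f); do printf '<%s>\n' "$f"; done
# prints <./one> and <two>: two "names" for what is really one file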

If you can write software in two different ways, and one of them handles edge cases (unusual inputs) correctly, but the other one is easier to read, you might argue that there is a tradeoff. (I wouldn't. I prefer correct code.)

However, if the correct, robust version of the code is also easy to read, there is no excuse for writing code that fails on edge cases. This is the case with find and the need to run a command on each file found.


Let's be more specific: On a UNIX or Linux system, filenames may contain any character except for a / (which is used as a path component separator), and they may not contain a null byte.

A null byte is therefore the only correct way to delimit filenames.


Since GNU find includes a -print0 primary which will use a null byte to delimit the filenames it prints, GNU find can safely be used with GNU xargs and its -0 flag (and -r flag) to handle the output of find:

find ... -print0 | xargs -r0 ...

However, there is no good reason to use this form, because:

  1. It adds a dependency on GNU findutils which doesn't need to be there, and
  2. find is designed to be able to run commands on the files it finds.

Also, GNU xargs requires -0 and -r, whereas FreeBSD xargs only requires -0 (and has no -r option), and some xargs implementations don't support -0 at all. So it's best to just stick to POSIX features of find (see next section) and skip xargs.

As for point 2—find's ability to run commands on the files it finds—I think Mike Loukides said it best:

find's business is evaluating expressions -- not locating files. Yes, find certainly locates files; but that's really just a side effect.

--Unix Power Tools


POSIX specified uses of find

What's the proper way to run one or more commands for each of find's results?

To run a single command for each file found, use:

find dirname ... -exec somecommand {} \;

To run multiple commands in sequence for each file found, where the second command should only be run if the first command succeeds, use:

find dirname ... -exec somecommand {} \; -exec someothercommand {} \;

To run a single command on multiple files at once:

find dirname ... -exec somecommand {} +
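For instance, a hypothetical cleanup task (the path and pattern are placeholders, not from the original question): compress every .log file under /var/log/myapp, batching as many files per gzip invocation as the system allows:

find /var/log/myapp -type f -name '*.log' -exec gzip -- {} +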

find in combination with sh

If you need to use shell features in the command, such as redirecting the output or stripping an extension off the filename or something similar, you can make use of the sh -c construct. You should know a few things about this:

  • Never embed {} directly in the sh code. This allows for arbitrary code execution from maliciously crafted filenames. Also, it's actually not even specified by POSIX that it will work at all. (See next point.)

  • Don't use {} multiple times, or use it as part of a longer argument. This isn't portable. For example, don't do this:

    find ... -exec cp {} somedir/{}.bak \;

    To quote the POSIX specifications for find:

    If a utility_name or argument string contains the two characters "{}", but not just the two characters "{}", it is implementation-defined whether find replaces those two characters or uses the string without change.

    ... If more than one argument containing the two characters "{}" is present, the behavior is unspecified.

  • The arguments following the shell command string passed to the -c option are set to the shell's positional parameters, starting with $0. Not starting with $1.

    For this reason, it's good to include a "dummy" $0 value, such as find-sh, which will be used for error reporting from within the spawned shell. Also, this allows use of constructs such as "$@" when passing multiple files to the shell, whereas omitting a value for $0 would mean the first file passed would be set to $0 and thus not included in "$@".
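A quick demonstration of that positional parameter behavior (the argument values are arbitrary):

sh -c 'printf "0=%s 1=%s 2=%s\n" "$0" "$1" "$2"' find-sh a b
# prints: 0=find-sh 1=a 2=b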


To run a single shell command per file, use:

find dirname ... -exec sh -c 'somecommandwith "$1"' find-sh {} \;

However, it will usually give better performance to handle the files in a shell loop, so that you don't spawn a shell for every single file found:

find dirname ... -exec sh -c 'for f do somecommandwith "$f"; done' find-sh {} +

(Note that for f do is equivalent to for f in "$@"; do and handles each of the positional parameters in turn—in other words, it uses each of the files found by find, regardless of any special characters in their names.)
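For example, here is a sketch of the extension-stripping case mentioned above (the *.txt pattern and the mv action are assumptions for illustration); the shell is needed for the ${f%.txt} expansion:

find dirname -type f -name '*.txt' -exec sh -c '
  for f do
    mv -- "$f" "${f%.txt}"    # rename each file, dropping the .txt suffix
  done' find-sh {} +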


Further examples of correct find usage:

(Note: Feel free to extend this list.)

22
  • 6
    There's one case where I don't know of an alternative to parsing find's output -- where you need to run commands in the current shell (e.g. because you want to set variables) for each file. In this case, while IFS= read -r -u3 -d '' file; do ... done 3< <(find ... -print0) is the best idiom I know. Notes: <( ) is not portable -- use bash or zsh. Also, the -u3 and 3< are there in case anything inside the loop tries to read stdin. Commented Nov 7, 2016 at 23:42
  • 1
    @GordonDavisson, perhaps—but what do you need to set those variables for? I would argue that whatever it is should be handled inside the find ... -exec call. Or just use a shell glob, if it will handle your use case.
    – Wildcard
    Commented Nov 7, 2016 at 23:45
  • 3
    Your answer is correct. However I don't like the dogma. Even though I know better, there are many (specially interactive) use cases where it's safe and just easier to type looping over find output or even worse using ls. I'm doing this daily without problems. I know about -print0, --null, -z or -0 options of all kind of tools. But I would not waste time to use them on my interactive shell prompt unless really needed. This could be also noted in your answer.
    – rudimeier
    Commented Nov 8, 2016 at 19:48
  • 26
    @rudimeier, the argument on dogma vs. best practice has already been done to death. Not interested. If you use it interactively and it works, fine, good for you—but I'm not going to promote doing that. The percentage of script authors who bother to learn what robust code is and then do only that when writing production scripts, instead of just doing whatever they're used to doing interactively, is extremely minimal. The handling is to promote best practices all the time. People need to learn that there IS a correct way to do things.
    – Wildcard
    Commented Nov 8, 2016 at 20:07
  • 2
    @AdrianPronk no, because there is no "in." "for f in a b c do do do echo" doesn't work, obviously. But "for f do echo; done" is fine. As is "for f in a b c do do; do echo."
    – Wildcard
    Commented Aug 14, 2017 at 8:13
125

The problem

for f in $(find .)

combines two incompatible things.

find prints a list of file paths delimited by newline characters. Meanwhile, the split+glob operator, invoked when you leave that $(find .) unquoted in that list context, splits it on the characters of $IFS (which by default include newline, but also space and tab, and NUL in zsh) and performs globbing on each resulting word (except in zsh), and even brace expansion in ksh93 (even if the braceexpand option is off in older versions) and in pdksh derivatives!

Even if you make it:

IFS='
' # split on newline only
set -o noglob # disable glob (also disables brace expansion
              # done upon other expansions in ksh)
for f in $(find .) # invoke split+glob

That's still wrong as the newline character is as valid as any in a file path. The output of find -print is simply not post-processable reliably (except by using some convoluted trick, as shown here).

That also means the shell needs to store the output of find fully, and then split+glob it (which implies storing that output a second time in memory) before starting to loop over the files.

Note that find . | xargs cmd has similar problems (there, blanks, newlines, single quotes, double quotes and backslashes (and, with some xargs implementations, bytes not forming part of valid characters) are a problem).

More correct alternatives

The only way to use a for loop on the output of find would be to use zsh, which supports IFS=$'\0', and:

IFS=$'\0'
for f in $(find . -print0)

(replace -print0 with -exec printf '%s\0' {} + for find implementations that don't support the non-standard (but quite common nowadays) -print0).

Here, the correct and portable way is to use -exec:

find . -exec something with {} \;

Or if something can take more than one argument:

find . -exec something with {} +

If you do need that list of files to be handled by a shell:

find . -exec sh -c '
  for file do
    something < "$file"
  done' find-sh {} +

(beware it may start more than one sh).

On some systems, you can use:

find . -print0 | xargs -r0 something with

though that has little advantage over the standard syntax and means something's stdin is either the pipe or /dev/null.

One reason you may want to use that could be to use the -P option of GNU xargs for parallel processing. The stdin issue can also be worked around with GNU xargs with the -a option with shells supporting process substitution:

xargs -r0n 20 -P 4 -a <(find . -print0) something

for instance, to run up to 4 concurrent invocations of something each taking 20 file arguments.

With zsh or bash, another way to loop over the output of find -print0 is with:

while IFS= read -rd '' file <&3; do
  something "$file" 3<&-
done 3< <(find . -print0)

read -d '' reads NUL delimited records instead of newline delimited ones.

bash-4.4 and above can also store files returned by find -print0 in an array with:

readarray -td '' files < <(find . -print0)

The zsh equivalent (which has the advantage of preserving find's exit status):

files=(${(0)"$(find . -print0)"})

With zsh, you can translate most find expressions to a combination of recursive globbing with glob qualifiers. For instance, looping over find . -name '*.txt' -type f -mtime -1 would be:

for file (./**/*.txt(ND.m-1)) cmd $file

Or

for file (**/*.txt(ND.m-1)) cmd -- $file

(beware of the need for --: with **/*, file paths do not start with ./, so they may start with - for instance).

ksh93 and bash eventually added support for **/ (though not more advanced forms of recursive globbing), but still not the glob qualifiers, which makes the use of ** very limited there. Also beware that bash prior to 4.3 follows symlinks when descending the directory tree.
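For reference, a hedged sketch of the nearest bash equivalent of the zsh loop above (bash has no glob qualifiers, so the file-type test moves into the loop, and there is no clean analogue of the m-1 mtime qualifier; cmd is a placeholder):

shopt -s globstar dotglob nullglob    # **: recurse; dotglob ~ D; nullglob ~ N
for file in ./**/*.txt; do
  [ -f "$file" ] && [ ! -L "$file" ] || continue    # approximate the "." qualifier (regular files only)
  cmd -- "$file"
done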

As with looping over $(find .), that also means storing the whole list of files in memory¹. That may be desirable though in some cases, when you don't want your actions on the files to have an influence on the finding of files (like when you add more files that could end up being found themselves).

Other reliability/security considerations

Race conditions

Now, if we're talking of reliability, we have to mention the race conditions between the time find/zsh finds a file and checks that it meets the criteria and the time it is being used (TOCTOU race).

Even when descending a directory tree, one has to make sure not to follow symlinks and to do that without TOCTOU race. find (GNU find at least) does that by opening the directories using openat() with the right O_NOFOLLOW flags (where supported) and keeping a file descriptor open for each directory, zsh/bash/ksh don't do that. So in the face of an attacker being able to replace a directory with a symlink at the right time, you could end up descending the wrong directory.

Even if find does descend the directory properly, with -exec cmd {} \; and even more so with -exec cmd {} +, once cmd is executed, for instance as cmd ./foo/bar or cmd ./foo/bar ./foo/bar/baz, by the time cmd makes use of ./foo/bar, the attributes of bar may no longer meet the criteria matched by find, but even worse, ./foo may have been replaced by a symlink to some other place (and the race window is made a lot bigger with -exec {} + where find waits to have enough files to call cmd).

Some find implementations have a (non-standard yet) -execdir predicate to alleviate the second problem.

With:

find . -execdir cmd -- {} \;

find chdir()s into the parent directory of the file before running cmd. Instead of calling cmd -- ./foo/bar, it calls cmd -- ./bar (cmd -- bar with some implementations, hence the --), so the problem with ./foo being changed to a symlink is avoided. That makes using commands like rm safer (it could still remove a different file, but not a file in a different directory), but not commands that may modify the files unless they've been designed to not follow symlinks.
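For instance, deleting temporary files with that safer variant (the *.tmp pattern is just an example, and remember -execdir is not POSIX):

find . -name '*.tmp' -type f -execdir rm -- {} \;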

-execdir cmd -- {} + sometimes also works but with several implementations including some versions of GNU find, it is equivalent to -execdir cmd -- {} \;.

-execdir also has the benefit of working around some of the problems associated with too deep directory trees.

In:

find . -exec cmd {} \;

the size of the path given to cmd will grow with the depth of the directory the file is in. If that size gets bigger than PATH_MAX (something like 4k on Linux), then any system call that cmd does on that path will fail with an ENAMETOOLONG error.

With -execdir, only the file name (possibly prefixed with ./) is passed to cmd. File names themselves on most file systems have a much lower limit (NAME_MAX) than PATH_MAX, so the ENAMETOOLONG error is less likely to be encountered.

Bytes vs characters

Also, often overlooked when considering security around find and more generally with handling file names in general is the fact that on most Unix-like systems, file names are sequences of bytes (any byte value but 0 in a file path, and on most systems (ASCII based ones, we'll ignore the rare EBCDIC based ones for now) 0x2f is the path delimiter).

It's up to the applications to decide if they want to consider those bytes as text. They generally do, but the translation from bytes to characters is typically done based on the user's locale, as set in the environment.

What that means is that a given file name may have different text representations depending on the locale. For instance, the byte sequence 63 f4 74 e9 2e 74 78 74 would be côté.txt for an application interpreting that file name in a locale where the character set is ISO-8859-1, and cєtщ.txt in a locale where the charset is ISO-8859-5 instead.

Worse. In a locale where the charset is UTF-8 (the norm nowadays), 63 f4 74 e9 2e 74 78 74 simply couldn't be mapped to characters!

find is one such application that considers file names as text for its -name/-path predicates (and more, like -iname or -regex with some implementations).

What that means is that, for instance, with several find implementations (including GNU find on GNU systems²),

find . -name '*.txt'

would not find our 63 f4 74 e9 2e 74 78 74 file above when called in a UTF-8 locale as * (which matches 0 or more characters, not bytes) could not match those non-characters.

LC_ALL=C find... would work around the problem as the C locale implies one byte per character and (generally) guarantees that all byte values map to a character (albeit possibly undefined ones for some byte values).
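A quick way to reproduce this on a GNU system in a UTF-8 locale (printf's \364 and \351 escapes emit the raw 0xf4 and 0xe9 bytes):

touch "$(printf 'c\364t\351.txt')"    # the ISO-8859-1 encoding of côté.txt
find . -name '*.txt'                  # may miss the file: 0xf4/0xe9 are not valid UTF-8
LC_ALL=C find . -name '*.txt'         # finds it: one byte per character in the C locale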

Now, when it comes to looping over those file names from a shell, that byte vs character distinction can also become a problem. We typically see 4 main types of shells in that regard:

  1. The ones that are still not multi-byte aware, like dash. For them, one byte maps to one character. For instance, in UTF-8, côté is 4 characters, but 6 bytes. In a locale where UTF-8 is the charset, in

     find . -name '????' -exec dash -c '
       name=${1##*/}; echo "${#name}"' sh {} \;
    

    find will successfully find the files whose name consists of 4 characters encoded in UTF-8, but dash would report lengths ranging between 4 and 24.

  2. yash: the opposite. It only deals with characters. All the input it takes is internally translated to characters. It makes for the most consistent shell, but it also means it cannot cope with arbitrary byte sequences (those that don't translate to valid characters). Even in the C locale, it can't cope with byte values above 0x7f.

     find . -exec yash -c 'echo "$1"' sh {} \;
    

    in a UTF-8 locale will fail on our ISO-8859-1 côté.txt from earlier for instance.

  3. Those like bash or zsh where the multi-byte support has been progressively added. Those will fall back to considering bytes that can't be mapped to characters as if they were characters. They still have a few bugs here and there especially with less common multi-byte charsets like GBK or BIG5-HKSCS (those being quite nasty as many of their multi-byte characters contain bytes in the 0-127 range (like the ASCII characters)).

  4. Those like the sh of FreeBSD (11 at least) or mksh -o utf8-mode that support multi-bytes but only for UTF-8.

Interrupted output

Another problem with parsing the output of find or even find -print0 may arise if find is interrupted, for instance because it has triggered some limit or was killed for whatever reason.

Example:

$ (ulimit -t 1; find / -type f -print0 2> /dev/null) | xargs -r0 printf 'rm -rf "%s"\n' | tail -n 2
rm -rf "/usr/lib/x86_64-linux-gnu/guile/2.2/ccache/language/ecmascript/parse.go"
rm -rf "/usr/"
zsh: cpu limit exceeded (core dumped)  ( ulimit -t 1; find / -type f -print0 2> /dev/null; ) |
zsh: done                              xargs -r0 printf 'rm -rf "%s"\n' | tail -n 2

Here, find was interrupted because it reached the CPU time limit. Since the output is buffered (as it goes to a pipe), find had output a number of blocks to stdout and the end of the last block it had written at the time it was killed happened to be in the middle of some /usr/lib/x86_64-linux-gnu/guile... file path, here unfortunately just after the /usr/.

xargs just saw a non-delimited /usr/ record followed by EOF and passed that to printf. If the command had been rm -rf instead, it could have had severe consequences.


Notes

¹ For completeness, we could mention a hacky way in zsh to loop over files using recursive globbing without storing the whole list in memory:

process() {
  something with $REPLY
  false
}
: **/*(ND.m-1+process)

+cmd is a glob qualifier that calls cmd (typically a function) with the current file path in $REPLY. The function returns true or false to decide if the file should be selected (and may also modify $REPLY or return several files in a $reply array). Here we do the processing in that function and return false so the file is not selected.

² GNU find uses the system's fnmatch() libc function to do the pattern matching, so the behaviour there depends on how that function copes with non-text data.

1
  • Is the bottom line that the correct way is to use find . -exec something with {} \; or find . -print0 | xargs -r0 something with? And in either case, do I need to write a function/script, called in something with, that contains whatever lines of code I would have included in the for loop? Thanks.
    – Josh
    Commented Apr 11, 2020 at 20:05
13

This answer is for very large result sets and concerns performance mainly, for example when getting a list of files over a slow network. For small amounts of files (say a few 100 or maybe even 1000 on a local disk) most of this is moot.

Parallelism and memory usage

Aside from the other answers given, related to separation problems and such, there is another issue with

for file in `find . -type f -name ...`; do smth with ${file}; done

The part inside the backticks has to be evaluated fully before being split on the line breaks. This means that if you get a huge amount of files, it may choke on whatever size limits are there in the various components; you may run out of memory if there are no limits; and in any case you have to wait until the whole list has been output by find and then parsed by for before even running your first smth.

The preferred Unix way is to work with pipes, which inherently run in parallel and which generally do not need arbitrarily huge buffers. That means: you would much prefer find to run in parallel to your smth, keeping only the current file name in RAM while it hands that off to smth.

One at least partly OKish solution for that is the aforementioned find -exec smth. It removes the need to keep all the file names in memory and runs nicely in parallel. Unfortunately, it also starts one smth process per file. If smth can only work on one file, then that's the way it has to be.

If at all possible, the optimal solution would be find -print0 | smth, with smth being able to process file names on its STDIN. Then you only have one smth process no matter how many files there are, and you need to buffer only a small amount of bytes (whatever intrinsic pipe buffering is going on) between the two processes. Of course, this is rather unrealistic if smth is a standard Unix/POSIX command, but might be an approach if you are writing it yourself.
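As a sketch of such a NUL-aware consumer, here written as a bash while read loop standing in for a purpose-built smth:

find . -type f -print0 |
while IFS= read -r -d '' f; do
  printf 'processing %s\n' "$f"    # stand-in for the real per-file work
done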

If that is not possible, then find -print0 | xargs -0 smth is, likely, one of the better solutions. As @dave_thompson_085 mentioned in the comments, xargs does split up the arguments across multiple runs of smth when system limits are reached (by default, in the range of 128 KB or whatever limit is imposed by exec on the system), and has options to influence how many files are given to one call of smth, hence finding a balance between number of smth processes and initial delay.
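For example (GNU xargs assumed for -P; smth and the *.csv pattern are placeholders), to run at most 4 parallel instances of smth, each given up to 500 file names:

find . -type f -name '*.csv' -print0 | xargs -r0 -n 500 -P 4 smth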

EDIT: removed the notions of "best" - it is hard to say whether something better will crop up. ;)

3
  • 2
    find ... -exec smth {} + is the solution.
    – Wildcard
    Commented Nov 8, 2016 at 0:46
  • 1
find -print0 | xargs smth doesn't work at all, but find -print0 | xargs -0 smth (note -0), or find | xargs smth if filenames don't have whitespace, quotes or backslashes, runs one smth with as many filenames as available and fit in one argument list; if you exceed maxargs, it runs smth as many times as needed to handle all the args given (no limit). You can set smaller 'chunks' (thus somewhat earlier parallelism) with -L/--max-lines, -n/--max-args, -s/--max-chars. Commented Nov 8, 2016 at 1:19
5

One reason is that whitespace throws a spanner in the works, making file 'foo bar' get evaluated as 'foo' and 'bar'.

$ ls -l
-rw-rw-r-- 1 ec2-user ec2-user 0 Nov  7 18:24 foo bar
$ for file in `find . -type f` ; do echo filename $file ; done
filename ./foo
filename bar
$

It works OK if -exec is used instead:

$ find . -type f -exec echo filename {} \;
filename ./foo bar
$ find . -type f -exec stat {} \;
  File: ‘./foo bar’
  Size: 0               Blocks: 0          IO Block: 4096   regular empty file
Device: ca01h/51713d    Inode: 9109        Links: 1
Access: (0664/-rw-rw-r--)  Uid: (  500/ec2-user)   Gid: (  500/ec2-user)
Access: 2016-11-07 18:24:42.027554752 +0000
Modify: 2016-11-07 18:24:42.027554752 +0000
Change: 2016-11-07 18:24:42.027554752 +0000
 Birth: -
$
4
  • Especially in the case of find, since there's an option to execute a command on every file, it's easily the best option.
    – Centimane
    Commented Nov 7, 2016 at 18:35
  • 1
    Also consider -exec ... {} \; versus -exec ... {} +
    – thrig
    Commented Nov 7, 2016 at 18:51
  • 1
    if you use for file in "$(find . -type f)" and echo "${file}" then it works even with whitespaces, other special characters i guess cause more trouble though
    – magor
    Commented Nov 7, 2016 at 18:51
  • 10
    @mazs - no, quoting doesn't do what you think. In a directory with several files try for file in "$(find . -type f)";do printf '%s %s\n' name: "${file}";done which should (according to you) print each file name on a separate line preceded by name:. It doesn't. Commented Nov 7, 2016 at 19:34
2

Looping over find's output is not bad practice—what's bad practice (in this & all situations) is assuming your input is a particular format instead of knowing (testing & confirming) it's a particular format.

tldr/cbf: find | parallel stuff
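Spelled out a little (GNU parallel assumed; gzip stands in for the real command), with NUL delimiters so arbitrary file names survive the trip:

find . -type f -print0 | parallel -0 gzip -- {}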

2

Because the output of any command is a single string, but your loop needs an array of strings to loop over. The reason it "works" is that shells treacherously split the string on whitespace for you.

Secondly, unless you need a particular feature of find, be aware that your shell most likely already can expand a recursive glob pattern all by itself, and crucially, that it will expand to a proper array.

Bash example:

shopt -s nullglob globstar
for i in **
do
    echo «"$i"»
done

Same in Fish:

for i in **
    echo «$i»
end

If you do need the features of find, make sure to only split on NUL (such as the find -print0 | xargs -r0 idiom).

Fish can iterate over NUL-delimited output, so this one is actually not bad:

find -print0 | while read -z i
    echo «$i»
end

As a last little gotcha, in many shells (not Fish, of course), piping command output into a loop makes the loop body run in a subshell (meaning you can't set a variable in any way that is visible after the loop terminates), which is never what you want.
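A quick demonstration of that last gotcha in bash, where each part of a pipeline runs in a subshell by default:

count=0
find . -type f | while IFS= read -r f; do count=$((count+1)); done
echo "$count"    # still prints 0: the loop body ran in a subshell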

2
  • @don_crissti Precisely. It doesn't generally work. I was trying to be sarcastic by saying that it "works" (with quotes). Commented Nov 12, 2016 at 14:53
  • Note that recursive globbing originated in zsh in the early 90s (though you'd need **/* there). fish like earlier implementations of bash's equivalent feature follows symlinks when descending the directory tree though. See The result of ls * , ls ** and ls *** for the differences between the implementations. Commented Nov 15, 2016 at 12:31
0

What's the proper way to run one or more commands for each file name/path returned by find?

I'm on a Mac using zsh. I found this question while searching for how to get the results of the fd command (piped to fzf) into an array, specifically in a way where I did not have to worry about spaces in filenames causing names to be stored as separate elements. With this array I then send those filenames to another script, one at a time.

I tried to vote up Stephane's answer because it gave me the detail I needed to find my answer, which for me was to use this:

array_name=(${(0)"$(fd "\.tar\.gz$|\.tgz$|\.zip$|\.tar.bz2|\.tar$" | fzf --print0)"})

I should probably better understand what the (${(0)...}) piece is doing. Maybe it's tied to the point above about being able to use IFS=$'\0'. A quick search turned up the relevant flag in the zsh documentation on Expansion, which mentions:

14.3.1 Parameter Expansion Flags

If the opening brace is directly followed by an opening parenthesis, the string up to the matching closing parenthesis will be taken as a list of flags. In cases where repeating a flag is meaningful, the repetitions need not be consecutive; for example, ‘(q%q%q)’ means the same thing as the more readable ‘(%%qqq)’. The following flags are supported:

[...]
0
Split the result of the expansion on null bytes. This is a shorthand for ‘ps:\0:’.
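A tiny zsh demonstration of that flag on its own (the string here is made up):

s=$'a\0b c\0d'
arr=(${(0)s})
print -l -- $arr    # three elements: a, "b c", d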

Why is looping over find's output bad practice?

I'm the last person that should try to give you advice on that, even after a few years of bash scripting. :) But what I have been trying to do less of is putting the return code from a command into a variable and then testing the variable. Instead I try to use these kinds of if statements:

if echo "${something}" | grep -q '^s'; then {
    :
} elif [[ $(dirname "${file}") == "." ]]; then {
    :
} fi

I hope this helps you/someone.

-3

Looping over find's output is often bad practice, especially in general purpose scripts, where you don't know much about the circumstances in which they will be used.

In my daily work, I often have enough knowledge about the files below one subdirectory to exclude the possibility that a toxic filename might be hit. And I never use blanks, newlines, tabs and such in filenames myself. It's imho a bad idea in the first place, begging for problems.

For example, I have a script to shrink images from my camera; the names are generic (IMG00001.JPG) and will never change, and neither the path nor the filenames contain blanks. If I get a new camera, I will need to change the script, though.

For ad hoc commands, issued and forgotten, it's not a problem. For single-user machines, if you don't share your code, it shouldn't be a problem either.

When giving advice on SO, not communicating the pitfalls and the assumptions you make is often a problem, because other people might have a toxic filename.

Considering Linux's GNU find, it has four options to perform commands on files, like

find . -type f -name ... -exec do-smth-with {} ";"

and besides -exec, there are -execdir, -ok and -okdir for similar purposes. The -dir versions perform the action from the directory containing the file, in contrast to your current dir, which is recommended. The -ok versions ask for confirmation before performing the command.

So in almost all cases, you don't need a for loop. find iterates over the results by itself and handles blanks and the like in filenames on its own.

The ";" terminating the command can be replaced by a plus sign (without quotes) if the called command gracefully handles a big bunch of files as parameters.

After the action, you may even pass more expressions to find, like:

find -name "*.html" -execdir wc {} + -ls 

to give a simple example.

I can't speak for other implementations of find.
