
Consider a shell like Bash or sh. The basic difference between > and >> manifests itself when the target file exists:

  • > truncates the file to zero size, then writes;
  • >> doesn't truncate, it writes (appends) to the end of the file.

If the file does not exist, it is created with zero size and then written to. This is true for both operators. It may therefore seem that the operators are equivalent when the target file doesn't exist yet.
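The following POSIX sh sketch illustrates both points. The file names are arbitrary, and the script runs in a throwaway directory created with mktemp -d. On an existing file, > replaces the content while >> keeps it. On a missing file, both operators produce byte-identical results.

```sh
set -e
cd "$(mktemp -d)"                    # work in a scratch directory

# Existing file: > truncates first, >> appends.
printf 'old\n' > existing_gt.txt
printf 'old\n' > existing_gtgt.txt
printf 'new\n' > existing_gt.txt     # content is now just "new"
printf 'new\n' >> existing_gtgt.txt  # content is now "old" followed by "new"

# File that does not exist yet: both operators create it and write the same bytes.
printf 'new\n' > fresh_gt.txt
printf 'new\n' >> fresh_gtgt.txt

cat existing_gt.txt                  # prints: new
cat existing_gtgt.txt                # prints: old, then new
cmp fresh_gt.txt fresh_gtgt.txt && echo "fresh files are identical"
```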

Are they really?

1 Answer


tl;dr

No. >> is essentially "always seek to end of file" while > maintains a pointer to the last written location.


Full answer

(Note: all my tests were done on Debian GNU/Linux 9.)

Another difference

No, they are not equivalent. There is another difference, and it may manifest itself regardless of whether the target file existed beforehand.

To observe it, run a process that generates data and redirect its output to a file with > or >> (e.g. pv -L 10k /dev/urandom > blob). While it runs, change the size of the file (e.g. with truncate). You will see that > keeps writing at its own (growing) offset, while >> always appends to the current end of the file.

  • If you truncate the file to a smaller size (possibly zero):
    • > won't care; it will keep writing at its own offset as if nothing happened. Right after the truncation that offset lies beyond the end of the file, so the next write makes the file regain its old size and grow further; the gap will be filled with zeros (sparsely, if possible);
    • >> will append to the new end, so the file will grow from its truncated size.
  • If you enlarge the file:
    • > won't care; it will keep writing at its own offset as if nothing happened. Right after the change that offset lies somewhere inside the file, so the file stops growing for a while (the writes land in the added region) until the offset reaches the new end; then the file grows normally;
    • >> will append to the new end, so the file will grow from its enlarged size.
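The pv/truncate experiment depends on timing. Below is a deterministic sketch that assumes a POSIX sh. Instead of running a real generator, it keeps file descriptors open with exec: fd 3 is opened like >, and fd 4 is opened like >>. The descriptor numbers and file names are arbitrary choices. Here `: > file` stands in for `truncate -s 0 file`.

```sh
set -e
cd "$(mktemp -d)"

exec 3> blob_gt       # like "generator > blob_gt": truncating open, own offset
exec 4>> blob_gtgt    # like "generator >> blob_gtgt": O_APPEND

printf 'AAAA' >&3     # fd 3 offset is now 4
printf 'AAAA' >&4

: > blob_gt           # truncate both files to zero size behind the writers' backs
: > blob_gtgt

printf 'BBBB' >&3     # written at offset 4, so bytes 0-3 become zeros (a hole)
printf 'BBBB' >&4     # written at the new end, i.e. offset 0

exec 3>&- 4>&-        # close both descriptors

wc -c < blob_gt       # 8 bytes: four zeros followed by BBBB
wc -c < blob_gtgt     # 4 bytes: just BBBB
od -c blob_gt         # shows the leading \0 bytes
```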

Another example is to append something extra (with a separate >>) while the data-generating process is running and writing to the file. This is similar to enlarging the file.

  • The generating process with > will write at its desired offset and overwrite the extra data eventually.
  • The generating process with >> will skip the new data and append past it (a race condition may occur and the two streams may get interleaved, but still no data should be overwritten).
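The "separate >>" case can be reproduced without any race in the same way, using fds held open with exec. This is again a sketch, and fd 3, fd 4 and the file names are arbitrary. The writer holding a > descriptor overwrites the extra bytes. The writer holding a >> descriptor writes past them.

```sh
set -e
cd "$(mktemp -d)"

exec 3> log_gt            # generator opened like >
exec 4>> log_gtgt         # generator opened like >>

printf 'AAAA' >&3
printf 'AAAA' >&4

printf 'XX' >> log_gt     # a separate >> appends extra data to each file
printf 'XX' >> log_gtgt

printf 'BBBB' >&3         # fd 3 still writes at offset 4, over "XX"
printf 'BBBB' >&4         # fd 4 writes after "XX"

exec 3>&- 4>&-

cat log_gt; echo          # AAAABBBB   (the extra data is gone)
cat log_gtgt; echo        # AAAAXXBBBB (the extra data is kept)
```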

Example

Does it matter in practice? There is this question:

I'm running a process which produces a lot of output on stdout. Sending it all to a file [...] Can I use some kind of log rotation program?

This answer says the solution is logrotate with the copytruncate option, which acts like this:

Truncate the original log file in place after creating a copy, instead of moving the old log file and optionally creating a new one.

According to what I wrote above, redirecting with > will make the truncated log large again in no time. Sparseness should save the day, so no significant disk space should be wasted. Nevertheless, each consecutive log will contain more and more leading zeros that are completely unnecessary.

But if logrotate creates copies without preserving sparseness, these leading zeros will need more and more disk space every time a copy is made. I haven't investigated the tool's behavior; it may be smart enough to handle sparseness, or to compress on the fly (if compression is enabled). Still, the zeros can only cause trouble or be neutral at best; there is nothing good in them.

In this case using >> instead of > is significantly better, even if the target file does not exist yet.
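Here is a rough simulation of copytruncate. It assumes the option boils down to "copy, then truncate in place", as the quoted description says. This is not logrotate itself, only cp and `: >`, and the file names are hypothetical. The script counts the NUL bytes that end up in the freshly truncated log.

```sh
set -e
cd "$(mktemp -d)"

exec 3> app_gt.log       # writer started with >
exec 4>> app_gtgt.log    # writer started with >>

printf 'line1\nline2\n' >&3   # 12 bytes
printf 'line1\nline2\n' >&4

# Roughly what copytruncate does: copy the log, then truncate the original in place.
cp app_gt.log app_gt.log.1 && : > app_gt.log
cp app_gtgt.log app_gtgt.log.1 && : > app_gtgt.log

printf 'line3\n' >&3     # lands at offset 12, after a 12-byte hole
printf 'line3\n' >&4     # lands at offset 0
exec 3>&- 4>&-

# Count NUL bytes in each fresh log (with > they all sit at the start).
tr -cd '\000' < app_gt.log | wc -c     # 12
tr -cd '\000' < app_gtgt.log | wc -c   # 0
```

Every further rotation makes the hole in the > case larger, because the writer's offset never goes back. Starting the long-running process as `some_command >> logfile` avoids this.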


Performance

As we can see, the two operators act differently not only when they start but also later on. This may cause some (subtle?) performance difference. For now I have no meaningful test results to support or refute it, but I think you shouldn't automatically assume their performance is the same in general.

  • So >> is essentially "always seek to end of file" while > maintains a pointer to the last written location. Seems that there might be some subtle performance difference in the way they work as well...
    – Mokubai
    Commented Jul 23, 2018 at 8:53
  • On the system call level, >> uses the O_APPEND flag to open(). And actually, > uses O_TRUNC, while >> doesn't. The combination of O_TRUNC | O_APPEND would also be possible, the shell language just doesn't provide that feature.
    – ilkkachu
    Commented Jul 23, 2018 at 10:51
  • @jjmontes, the standard source would be POSIX: pubs.opengroup.org/onlinepubs/9699919799.2018edition/utilities/… but of course Bash's manual also has descriptions of the redirection operators, including the non-standard ones it supports: gnu.org/software/bash/manual/html_node/Redirections.html
    – ilkkachu
    Commented Jul 23, 2018 at 10:53
  • @ilkkachu I found this to be of interest, as it explains details about O_APPEND which I was wondering about after your comment :): stackoverflow.com/questions/1154446/…
    – jjmontes
    Commented Jul 23, 2018 at 11:10
  • @Mokubai, Any sane OS would have the file length at hand when it's open, and checking a flag and moving the offset to the end should just disappear in all the other bookkeeping. Trying to emulate O_APPEND with an lseek() before each write() would be different though, there'd be the extra system call overhead. (And of course it wouldn't work, since another process could write() in between.)
    – ilkkachu
    Commented Jul 23, 2018 at 12:49
