Is redirection with `>>` equivalent to `>` when the target file doesn't yet exist?

Question

Consider a shell like Bash or sh. The basic difference between `>` and `>>` shows up when the target file already exists:

- `>` truncates the file to zero size, then writes;
- `>>` doesn't truncate; it writes (appends) to the end of the file.
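
A minimal sketch of that difference for an existing target, assuming a POSIX shell. The scratch directory from `mktemp -d` and the file names are made up for illustration:

```shell
# Work in a scratch directory so no real file is touched.
dir=$(mktemp -d)

printf 'old\n' > "$dir/trunc.txt"
printf 'new\n' > "$dir/trunc.txt"      # '>' discards "old" before writing

printf 'old\n' > "$dir/append.txt"
printf 'new\n' >> "$dir/append.txt"    # '>>' keeps "old" and writes after it

# $(( ... )) strips the padding some wc implementations print.
trunc_lines=$(( $(wc -l < "$dir/trunc.txt") ))
append_lines=$(( $(wc -l < "$dir/append.txt") ))
echo "after '>':  $trunc_lines line(s)"
echo "after '>>': $append_lines line(s)"

rm -rf "$dir"
```

The first file ends up with one line, the second with two.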

If the file does not exist, it is created with zero size and then written to. This is true for both operators, so it may seem that they are equivalent when the target file doesn't yet exist.

Are they really?

Kamil Maciorowski · Accepted Answer · 2018-07-24T18:59:03.223

tl;dr

No. `>>` is essentially "always seek to end of file", while `>` maintains a pointer to the last written location.

Full answer

(Note: all my tests were done on Debian GNU/Linux 9.)

Another difference

No, they are not equivalent. There is another difference. It may manifest itself regardless of whether the target file existed before or not.

To observe it, run a process that generates data and redirect its output to a file with `>` or `>>` (e.g. `pv -L 10k /dev/urandom > blob`). Let it run and change the size of the file (e.g. with `truncate`). You will see that `>` keeps its own (growing) offset, while `>>` always appends to the end.

If you truncate the file to a smaller size (it can be zero size):
- `>` won't care; it will write at its own offset as if nothing happened. Right after the truncation that offset is beyond the end of the file, so the file regains its old size and grows further; the missing data is filled with zeros (sparsely, if possible);
- `>>` will append to the new end; the file will grow from its truncated size.

If you enlarge the file:
- `>` won't care; it will write at its own offset as if nothing happened. Right after the size change that offset is somewhere inside the file, so the file stops growing for a while until the offset reaches the new end; then it grows normally;
- `>>` will append to the new end; the file will grow from its enlarged size.
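
The shrinking case can be reproduced deterministically without `pv`. This is only a rough sketch, assuming a POSIX shell: a brace group keeps one descriptor open across several commands, and `: > file` stands in for `truncate`. The scratch directory and file names are made up for illustration.

```shell
dir=$(mktemp -d)

# Writer opened with '>': the descriptor keeps its own offset.
{
    printf 'AAAA'              # the writer's offset is now 4
    : > "$dir/plain"           # someone truncates the file to zero size
    printf 'BB'                # written at offset 4, past the new end
} > "$dir/plain"

# Writer opened with '>>': every write goes to the current end of the file.
{
    printf 'AAAA'
    : > "$dir/append"          # the same truncation
    printf 'BB'                # written at the new end, offset 0
} >> "$dir/append"

plain_size=$(( $(wc -c < "$dir/plain") ))
plain_zeros=$(( $(tr -cd '\000' < "$dir/plain" | wc -c) ))   # count NUL bytes
append_size=$(( $(wc -c < "$dir/append") ))

echo "'>':  $plain_size bytes, $plain_zeros of them zero"
echo "'>>': $append_size bytes"

rm -rf "$dir"
```

With `>` the file ends up 6 bytes long (four zero bytes followed by `BB`); with `>>` it holds only the 2 bytes of `BB`.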

Another example is to append something extra (with a separate `>>`) while the data-generating process is running and writing to the file. This is similar to enlarging the file.

- The generating process with `>` will write at its own offset and eventually overwrite the extra data.
- The generating process with `>>` will skip the new data and append past it (a race condition may occur and the two streams may get interleaved, but still no data should be overwritten).
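
A small sketch of this second experiment, again assuming a POSIX shell and made-up file names. The inner `printf 'extra' >> ...` plays the role of the separate appender, and the brace group plays the role of the long-running generator:

```shell
dir=$(mktemp -d)

# Generator opened with '>'.
{
    printf 'AA'
    printf 'extra' >> "$dir/plain"     # a separate appender adds data at the end
    printf 'BBB'                       # lands at offset 2, on top of "ext"
} > "$dir/plain"

# Generator opened with '>>'.
{
    printf 'AA'
    printf 'extra' >> "$dir/append"
    printf 'BBB'                       # lands after "extra"
} >> "$dir/append"

plain_content=$(cat "$dir/plain")
append_content=$(cat "$dir/append")
echo "'>':  $plain_content"
echo "'>>': $append_content"

rm -rf "$dir"
```

The `>` variant leaves `AABBBra` (part of the extra data is overwritten), while the `>>` variant leaves `AAextraBBB`.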

Example

Does it matter in practice? There is this question:

I'm running a process which produces a lot of output on stdout. Sending it all to a file [...] Can I use some kind of log rotation program?

This answer says the solution is `logrotate` with the `copytruncate` option, which acts like this:

Truncate the original log file in place after creating a copy, instead of moving the old log file and optionally creating a new one.

According to what I wrote above, redirecting with `>` will make the truncated log large again in no time. Sparseness will save the day, so no significant disk space should be wasted. Nevertheless, each consecutive log will have more and more completely unnecessary leading zeros in it.

But if logrotate creates copies without preserving sparseness, these leading zeros will need more and more disk space every time a copy is made. I haven't investigated the tool's behavior; it may be smart enough with sparseness or compress on the fly (if compression is enabled). Still, the zeros can only cause trouble or be neutral at best; there is nothing good in them.

In this case using `>>` instead of `>` is significantly better, even if the target file does not exist yet.
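
The effect on rotated logs can be imitated in a few lines. This is a sketch under stated assumptions, not logrotate's actual implementation: `copytruncate` below is a hypothetical helper that only mimics the quoted description (copy, then truncate in place), and the log names are made up.

```shell
dir=$(mktemp -d)

# Hypothetical stand-in for the copytruncate behavior:
# copy the log, then truncate the original in place.
copytruncate() {
    cp "$1" "$1.$2" && : > "$1"
}

size_of() {
    echo $(( $(wc -c < "$1") ))
}

# Writer started with '>': its offset survives every rotation.
{
    for i in 1 2 3; do
        printf 'entry %s\n' "$i"               # 8 bytes per entry
        copytruncate "$dir/plain.log" "$i"
    done
} > "$dir/plain.log"

# Writer started with '>>': each entry lands at the current end.
{
    for i in 1 2 3; do
        printf 'entry %s\n' "$i"
        copytruncate "$dir/append.log" "$i"
    done
} >> "$dir/append.log"

plain_sizes="$(size_of "$dir/plain.log.1") $(size_of "$dir/plain.log.2") $(size_of "$dir/plain.log.3")"
append_sizes="$(size_of "$dir/append.log.1") $(size_of "$dir/append.log.2") $(size_of "$dir/append.log.3")"
echo "rotated copies with '>':  $plain_sizes"
echo "rotated copies with '>>': $append_sizes"

rm -rf "$dir"
```

Each entry is 8 bytes, so the copies made from the `>` writer measure 8, 16 and 24 bytes (everything before the latest entry is zeros), while the copies made from the `>>` writer stay at 8 bytes each.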

Performance

As we can see, the two operators act differently not only when they begin but also later. This may cause some (subtle?) performance difference. For now I have no meaningful test results to support or disprove it, but I think you shouldn't automatically assume their performance is the same in general.

edited Jul 24 '18 at 18:59

answered Jul 23 '18 at 08:42

So `>>` is essentially "always seek to end of file" while `>` maintains a pointer to the last written location. Seems that there might be some subtle performance difference in the way they work as well... – Mokubai Jul 23 '18 at 08:53
@Mokubai Well said. I'm going to use your first sentence as tl;dr, if you don't mind. – Kamil Maciorowski Jul 23 '18 at 08:57
No worries, go for it. ;) – Mokubai Jul 23 '18 at 08:58
Where can we find some documented reference about how both `>` and `>>` differ? – jjmontes Jul 23 '18 at 10:51
On the system call level, `>>` uses the [`O_APPEND` flag to `open()`](http://man7.org/linux/man-pages/man2/open.2.html). And actually, `>` uses `O_TRUNC`, while `>>` doesn't. The combination of `O_TRUNC | O_APPEND` would also be possible, the shell language just doesn't provide that feature. – ilkkachu Jul 23 '18 at 10:51
@jjmontes, the standard source would be POSIX: http://pubs.opengroup.org/onlinepubs/9699919799.2018edition/utilities/V3_chap02.html#tag_18_07 but of course Bash's manual also has descriptions on the redirection operators, including the non-standard ones it supports: https://www.gnu.org/software/bash/manual/html_node/Redirections.html – ilkkachu Jul 23 '18 at 10:53
@ilkkachu I found this to be of interest, as it explains details about O_APPEND which I was wondering about after your comment :): https://stackoverflow.com/questions/1154446/is-file-append-atomic-in-unix – jjmontes Jul 23 '18 at 11:10
@KamilMaciorowski the reason I suspect there might be some small performance difference is that `always seek to EOF` implies a `checkLengthOfFile()` followed by a `writeAtLocation(endOfFile)`, while `>` simply does a `writeAtLocation(currentLocation)`. Both would trigger the same checks at the filesystem level to make sure that the file is long enough or needs a sparse expansion, but one needs an extra step to check the length first. I would expect with good disk caching that the difference would be somewhere between trivial and non-existent though... – Mokubai Jul 23 '18 at 12:40
@Mokubai, Any sane OS would have the file length at hand when it's open, and checking a flag and moving the offset to the end should just disappear in all the other bookkeeping. Trying to emulate `O_APPEND` with an `lseek()` before each `write()` would be different though, there'd be the extra system call overhead. (And of course it wouldn't work, since another process could `write()` in between.) – ilkkachu Jul 23 '18 at 12:49
@ilkkachu I get you. Reading your other comment this is handled all within the filesystem level bits and bobs and is essentially free due to the way the file handle is opened. I admit my comment was from a position of slightly naive and out of date knowledge of reading about how things *used* to be done... – Mokubai Jul 23 '18 at 12:57
@Mokubai Unix/Linux virtual file systems usually implement a `vnode` structure for every open file. BSD's can be seen at [https://man.openbsd.org/vnode.9](https://man.openbsd.org/vnode.9). I haven't examined BSD closely, but I suspect the `void *v_data; /* private data for fs */ ` field in the structure contains the relevant metadata such as current file length and current file offset. Since `write()` operations involve locking the structures, figuring out where to write is a simple "look up the file open flags, then use the current length or offset depending on the value of the flags". – Andrew Henle Jul 23 '18 at 16:28
Of course, [logrotate is not the answer _this_ century](http://jdebp.info./FGA/do-not-use-logrotate.html). https://superuser.com/a/868519/38062 https://superuser.com/a/291397/38062 https://unix.stackexchange.com/a/392924/5132 – JdeBP Jul 25 '18 at 14:10
@AndrewHenle: I believe that you are (at least partly) looking at this backwards. On the one hand, I’m surprised that the file size isn’t featured clearly in the `vnode` structure; it would seem it’s an attribute common to all filesystem types. But perhaps it is buried under `v_data`. On the other hand, remember that multiple processes can have the same file open simultaneously, *with different file offsets.* The file offset can’t be in the `inode`/`vnode` structure, but must be in the `file` (description) structure; there can be multiple `file` table entries for a single file. … (Cont’d) – Scott - Слава Україні Jul 30 '18 at 02:51
(Cont’d) … I could find only a very old copy of `file.h` [here](https://pages.lip6.fr/Pierre.Sens/srcv6/file.h.html). [This course handout](http://www.cs.rpi.edu/academics/courses/fall04/os/c18/) shows the relationship between the structures. [This code](https://minnie.tuhs.org//cgi-bin/utree.pl?file=4BSD/usr/src/sys/sys/sys2.c) from BSD 4 shows `f_offset` being copied from the `file` structure to `u.u_offset`, and [this](https://minnie.tuhs.org//cgi-bin/utree.pl?file=4BSD/usr/src/sys/sys/rdwri.c) shows ``readi`` and ``writei`` using `u.u_offset` as the offset into the file. … (Cont’d) – Scott - Слава Україні Jul 30 '18 at 03:03
(Cont’d) … P.S. I couldn’t find where `O_APPEND` is handled. – Scott - Слава Україні Jul 30 '18 at 03:03