I was reading several guides on how to combine btrfs snapshots with rsync to make an efficient backup solution with history. However, it all depends on whether rsync --inplace modifies only those portions of files that actually changed, or overwrites the whole file sequentially. If it writes the whole file, it seems that btrfs will always create a new copy of the file, which would make the idea much less efficient.
How would it even *know* if it can avoid writing to the entire file? Doesn't it need to *read* the entire file first, to figure out what has changed? – user541686 Apr 01 '13 at 03:16
@Mehrdad Yes, it does, but reading the whole file isn't a problem. If `rsync` reads the whole file and then seeks to and updates only those parts that need it, btrfs will copy only those updated blocks. But if `rsync` reads _and_ writes the whole file, then it'll be a problem. – Petr Apr 01 '13 at 14:36
@Mehrdad `rsync` not only knows that it may avoid writing the entire file, it manages to do so _without_ copying it completely over the network. Clever little program. – Gunther Piez May 02 '13 at 11:24
@PetrPudlák It does not "read" the file; that would be inefficient. It splits the files into chunks, applies a quick hash, compares the hashes and transmits what's different. There is also a second, more in-depth comparison, and the server keeps track of the chunks, but it's not really "reading" as in loading the whole thing into memory: https://rsync.samba.org/~tridge/phd_thesis.pdf. So, as Gunther Piez commented, it does know exactly what to copy. – runlevel0 Feb 14 '20 at 12:38
Hopefully this info is useful for the historical record; it doesn't feel like a full answer. I had the question formulated as: Can `rsync` copy only the changed regions in files? Here is a quote from Andrew Tridgell, the co-developer of `rsync`, writing about the `rsync` algorithm: "The end result is that β gets a copy of _A_, but only the pieces of _A_ that are not found in _B_ (plus a small amount of data for checksums and block indexes) are sent over the link." Cite: [The rsync algorithm](https://rsync.samba.org/tech_report/node2.html) – Kyle Feb 26 '22 at 23:25
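To make the "split into chunks, hash, compare" idea from these comments concrete, here is a minimal bash sketch that hashes fixed-size blocks of two local files and prints the indices of the blocks that differ. It is a simplification, not rsync's algorithm: rsync pairs a rolling weak checksum with a strong hash so it can match blocks at any byte offset, while this sketch only compares blocks at the same aligned offset. The helper name `changed_blocks` is made up for the example, and GNU coreutils (`stat -c`, `md5sum`) are assumed.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the index of every fixed-size block that
# differs between two files. Only blocks at the same offset are compared;
# rsync's real algorithm can also match blocks that moved elsewhere.
changed_blocks() {
  local a=$1 b=$2 bs=$3
  local size_a size_b max n i ha hb
  size_a=$(stat -c %s "$a")
  size_b=$(stat -c %s "$b")
  max=$(( size_a > size_b ? size_a : size_b ))
  n=$(( (max + bs - 1) / bs ))          # number of blocks, rounded up
  for (( i = 0; i < n; i++ )); do
    ha=$(dd if="$a" bs="$bs" skip="$i" count=1 2>/dev/null | md5sum)
    hb=$(dd if="$b" bs="$bs" skip="$i" count=1 2>/dev/null | md5sum)
    if [ "$ha" != "$hb" ]; then
      echo "$i"
    fi
  done
}

# Demo: 64 KiB of zeros, then overwrite 10 bytes at offset 9000.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
head -c 65536 /dev/zero > "$tmp/old"
cp "$tmp/old" "$tmp/new"
printf 'XXXXXXXXXX' | dd of="$tmp/new" bs=1 seek=9000 conv=notrunc 2>/dev/null
changed_blocks "$tmp/old" "$tmp/new" 4096   # prints 2 (bytes 8192-12287)
```

Only one of the 16 blocks is reported, which is the kind of information a delta transfer uses to avoid sending (and, with `--inplace`, rewriting) the other 15.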
Unless heuristics (e.g. timestamp, file size) skip the whole file, the local file is read fully by `rsync` every time. The remote end will also read the whole file, and the local and remote `rsync` processes synchronize information about the results on both ends and transfer only the changes. See `man rsync` for details: https://linux.die.net/man/1/rsync – if something is inserted in the middle of the source file compared to the local target, using `--inplace` will *reduce* synchronization performance, because `rsync` is not smart enough to reuse the end of the file before overwriting it. – Mikko Rantalainen Sep 16 '22 at 07:23
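The cost of an insertion can be seen without rsync at all. This bash sketch (assuming GNU coreutils; the file names and the offset 50000 are arbitrary) changes a file in two ways, overwriting 5 bytes in the middle versus inserting 5 bytes at the same place, and counts how many byte positions then differ from the original at the same offset. The overwrite changes exactly 5 positions; the insertion shifts the whole tail, so most bytes after the insertion point differ. Any in-place update would have to rewrite that entire tail, and on btrfs each rewritten block stops being shared with the snapshot.

```bash
#!/usr/bin/env bash
# Compare a same-size overwrite with an insertion in the middle of a file.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

seq 1 20000 > "$tmp/old"   # ~106 KiB of distinct lines of text

# Case 1: overwrite 5 bytes at offset 50000; the file size is unchanged.
cp "$tmp/old" "$tmp/overwrite"
printf 'ZZZZZ' | dd of="$tmp/overwrite" bs=1 seek=50000 conv=notrunc 2>/dev/null

# Case 2: insert 5 bytes at offset 50000; everything after it shifts by 5.
{ head -c 50000 "$tmp/old"; printf 'ZZZZZ'; tail -c +50001 "$tmp/old"; } > "$tmp/insert"

# Count byte positions that differ at the same offset. cmp -l lists them,
# stops at the end of the shorter file and reports EOF on stderr.
count_diff() { cmp -l "$1" "$2" 2>/dev/null | wc -l | tr -d ' '; }

overwrite_diff=$(count_diff "$tmp/old" "$tmp/overwrite")
insert_diff=$(count_diff "$tmp/old" "$tmp/insert")
echo "overwrite: $overwrite_diff bytes differ at the same offset"   # 5
echo "insert:    $insert_diff bytes differ at the same offset"
```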
7 Answers
TL;DR - You need option --inplace and if copying between local filesystems you also need --no-whole-file.
If you pass rsync two local paths, it will default to using "--whole-file", and not delta-transfer. Rsync assumes that it can write a completely new file and unlink the old one faster than reading both and calculating the changed blocks. If it doesn't calculate the changes, there won't be block-level changes for btrfs to observe. So, what you're looking for is --no-whole-file, in addition to --inplace. You also get delta-transfer if you requested '-c'.
Here's how you can verify:
```
$ mkdir a b
$ dd if=/dev/zero of=a/1 bs=1k count=64
$ dd if=/dev/zero of=a/2 bs=1k count=64
$ dd if=/dev/zero of=a/3 bs=1k count=64
$ rsync -av a/ b/
sending incremental file list
./
1
2
3
sent 196831 bytes received 72 bytes 393806.00 bytes/sec
total size is 196608 speedup is 1.00
```
Then touch a file and re-sync:
```
$ touch a/1
$ rsync -av --inplace a/ b/
sending incremental file list
1
sent 65662 bytes received 31 bytes 131386.00 bytes/sec
total size is 196608 speedup is 2.99
```
You can verify it re-used the inode with `ls -li`, but notice that it sent a whole 64K bytes. Try again with --no-whole-file:
```
$ touch a/1
$ rsync -av --inplace --no-whole-file a/ b/
sending incremental file list
1
sent 494 bytes received 595 bytes 2178.00 bytes/sec
total size is 196608 speedup is 180.54
```
Now you've only sent 494 bytes. You could use strace to further verify if any of the file was written, but this shows it at least used delta-transfer.
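Rather than eyeballing `ls -li`, the inode comparison can be scripted. The bash sketch below repeats a cut-down version of the experiment above for three option sets and reports whether the destination file kept its inode. It assumes rsync and GNU coreutils (`stat -c %i`) are installed; `inode_kept` is a hypothetical helper name. It uses `touch -d` with a fixed past date so the modification time certainly differs, which keeps rsync's quick check from skipping the file when both steps happen within the same second.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run one local rsync update with the given extra
# options and print "yes" if the destination file kept its inode.
inode_kept() {
  local tmp before after
  tmp=$(mktemp -d)
  mkdir "$tmp/a" "$tmp/b"
  dd if=/dev/zero of="$tmp/a/1" bs=1k count=64 2>/dev/null
  rsync -a "$tmp/a/" "$tmp/b/"            # initial full copy
  touch -d '2000-01-01' "$tmp/a/1"        # change only the mtime
  before=$(stat -c %i "$tmp/b/1")
  rsync -a "$@" "$tmp/a/" "$tmp/b/"       # re-sync with the options under test
  after=$(stat -c %i "$tmp/b/1")
  rm -rf "$tmp"
  if [ "$before" = "$after" ]; then echo yes; else echo no; fi
}

for opts in "" "--inplace" "--inplace --no-whole-file"; do
  # $opts is left unquoted on purpose so it splits into separate options.
  printf '%-35s inode kept: %s\n' "rsync -a $opts" "$(inode_kept $opts)"
done
```

Going by this answer and the comments below it, the expected report is `no` for plain `-a` and `yes` for both `--inplace` variants. A kept inode only shows that the file was updated in place, not how much of it was rewritten.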
Note (see comments) that for local filesystems, --whole-file is assumed (see the man page for rsync). On the other hand, across a network --no-whole-file is assumed, so --inplace on its own will behave as --inplace --no-whole-file.
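Putting both options together with btrfs snapshots, one backup-with-history run could look like the bash sketch below: update a working copy with `--inplace --no-whole-file`, then freeze it as a read-only snapshot. The paths, the `SRC`/`DEST`/`SNAPDIR` variables, the timestamp naming, the use of `--delete` and the `DRY_RUN` switch are all assumptions made for illustration. It expects `/backup/current` to already be a btrfs subvolume and needs sufficient privileges for `btrfs subvolume snapshot`.

```bash
#!/usr/bin/env bash
# Sketch of one backup run with history. All paths are hypothetical;
# DEST must be an existing btrfs subvolume, SNAPDIR a directory on the
# same btrfs filesystem. Set DRY_RUN=1 to print commands instead.
SRC=${SRC:-/data/}
DEST=${DEST:-/backup/current}
SNAPDIR=${SNAPDIR:-/backup/snapshots}

run() {
  if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi
}

backup() {
  local stamp
  stamp=$(date +%Y-%m-%d_%H%M%S)
  # 1. Update the working copy. --inplace writes into existing files
  #    instead of replacing them; --no-whole-file forces the delta
  #    algorithm even though both paths are local. --delete (an assumption)
  #    removes files that vanished from SRC; older snapshots keep them.
  run rsync -a --delete --inplace --no-whole-file "$SRC" "$DEST/" || return
  # 2. Freeze the result; the new snapshot initially shares all its data
  #    with the working copy.
  run btrfs subvolume snapshot -r "$DEST" "$SNAPDIR/$stamp"
}
```

In bash, `DRY_RUN=1 backup` prints the two commands without running them; plain `backup` (as root, with real paths) performs the run.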
@Geremia Not if both paths are local. And my example shows that `--inplace` does not imply `--no-whole-file` for the version of rsync I was using in 2013, but you're welcome to repeat this experiment with your own version of rsync. – dataless Sep 09 '16 at 17:28
Well, `inplace` is not about "scanning for same/differing blocks"; it's just about overwriting the existing file right away, from offset 0. (Otherwise a temporary copy is built up, and only then is the old target file deleted and the temporary copy renamed. It's probably deemed "safer" to keep the old file as long as possible, in case the process gets interrupted. Of course this is worse for performance, peak storage consumption (think large files), possibly fragmentation...)... – Frank N Feb 27 '17 at 09:47
I would assume that it's the other way round: `--no-whole-file` always implies `--inplace`, otherwise most of its performance gain would be gone. Couldn't find this documented, though... – Frank N Feb 27 '17 at 09:48
When I try it across two different btrfs filesystems on the same machine, I find that `--no-whole-file` and `--inplace --no-whole-file` transfer as quickly, but the former uses a new inode whereas the latter does not. `--inplace` on its own uses the same inode, but copies a whole 64kB. – Diagon Dec 10 '17 at 03:15
When I try it across a network from a btrfs filesystem to a remote xfs filesystem, all three options copy only a couple of hundred bytes, and `--no-whole-file` changes the inode, but the other two options that include `--inplace` do not. – Diagon Dec 10 '17 at 03:28
This answer is not related to block level change tracking in snapshots, as per the question – Patrick Jun 19 '19 at 02:07
The `--no-whole-file` only forces fully local execution to use the same logic as over the network. If you assume that the local filesystem has about the same performance for reading and writing, this doesn't make sense, because `rsync` would need to fully read both the source and target files and only then write the changes. It's assumed that it will be faster to read one version and write another version unless this flag is used. If write performance is a lot worse (e.g. `btrfs`), then using `--no-whole-file` may make sense even for local copies. The `--inplace` only changes how writing is done. – Mikko Rantalainen Sep 16 '22 at 07:28
@Diagon Yes, --inplace is the most important option, and --no-whole-file makes it work for local filesystem transfers as well. – dataless Sep 16 '22 at 21:45
@Patrick This answer is exactly related to block level change tracking because if rsync decides to use whole-file transfer, a completely new file is written and block-level changes do not occur. – dataless Sep 16 '22 at 21:46
@MikkoRantalainen My answer was in the context of the original poster saying they already used --inplace. Both options are required for local transfers to perform block-level changes to the existing file. I have updated my answer to clarify this. – dataless Sep 16 '22 at 21:49
Here is the definitive answer, I guess, citing the relevant part of the manual:
> --inplace
>
> [...]
>
> This option is useful for transferring large files with block-based changes or appended data, and also on systems that are disk bound, not network bound. It can also help keep a **copy-on-write filesystem snapshot** from diverging the entire contents of a file that only has minor changes.
Note that whether this flag helps or hurts performance depends entirely on the copy-on-write filesystem implementation. Also note that if the source file has been modified by *inserting* data in the middle, using `--inplace` will *reduce* the performance of `rsync`. – Mikko Rantalainen Sep 16 '22 at 07:30
rsync's delta-transfer algorithm determines whether the entire file is transmitted or just the parts that differ. Delta transfer is the default behavior when rsyncing a file between two machines, to save bandwidth. It can be overridden with --whole-file (or -W) to force rsync to transmit the entire file.
--inplace deals with whether rsync, during the transfer, will create a temporary file or not. The default behavior is to create a temporary file. This gives a measure of safety in that if the transfer is interrupted, the existing file in the destination machine remains intact/untouched. --inplace overrides this behavior and tells rsync to update the existing file directly. With this, you run the risk of having an inconsistent file in the destination machine if the transfer is interrupted.
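The two write strategies described in this answer can be modelled with ordinary shell tools. This is a model, not rsync's code: the default is imitated by writing a temporary file in the same directory and renaming it over the target with `mv` (new inode; a reader sees either the old or the new file), and `--inplace` by overwriting bytes of the existing file with `dd conv=notrunc` (same inode; an interrupted write could leave a mix of old and new data). File names are arbitrary and GNU `stat -c %i` is assumed.

```bash
#!/usr/bin/env bash
# Model of the two update strategies (not rsync itself).
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf 'old contents\n' > "$tmp/target"

# Strategy 1 (like rsync's default): build a temporary copy next to the
# target, then rename it over the target. The rename is atomic within a
# filesystem, and the target ends up with the temporary file's inode.
ino0=$(stat -c %i "$tmp/target")
tmpfile=$(mktemp "$tmp/.target.XXXXXX")
printf 'new contents\n' > "$tmpfile"
mv -f "$tmpfile" "$tmp/target"
ino1=$(stat -c %i "$tmp/target")

# Strategy 2 (like --inplace): overwrite bytes of the existing file directly.
printf 'NEW' | dd of="$tmp/target" bs=1 seek=0 conv=notrunc 2>/dev/null
ino2=$(stat -c %i "$tmp/target")

echo "rename:   inode $ino0 -> $ino1"
echo "in place: inode $ino1 -> $ino2, contents: $(cat "$tmp/target")"
```

The rename gives the target a different inode, while the in-place write keeps it and leaves `NEW contents` in the file.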
--inplace overwrites only regions that have changed. Always use it when writing to Btrfs.
And do you have any evidence that it doesn't overwrite other parts of files? – Petr Oct 25 '13 at 13:00
From the man page:
> This option changes how rsync transfers a file when its data needs to be updated: instead of the default method of creating a new copy of the file and moving it into place when it is complete, rsync instead writes the updated data directly to the destination file.
This leads me to believe that it writes over the file in its entirety; I imagine it would be nearly impossible for rsync to work any other way.
After determining what parts need updating, it could just [seek](https://en.wikipedia.org/wiki/Fseek#fseek) to those parts and update them, instead of writing the entire file. – Petr Apr 01 '13 at 14:39
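Petr's suggestion can be sketched in bash: once it is known which blocks differ, each one can be written at its own offset with `dd seek=... conv=notrunc`, leaving the rest of the file untouched. This is a simplified stand-in for what the answers describe (rsync computes the deltas and, with `--inplace`, writes into the existing file); the helper `patch_blocks`, the 4 KiB block size and the file names are made up for the example, and the list of changed blocks is given by hand rather than computed.

```bash
#!/usr/bin/env bash
# Hypothetical helper: copy only the listed blocks from SRC into DST,
# each at its original offset, without truncating or replacing DST.
patch_blocks() {
  local src=$1 dst=$2 bs=$3
  shift 3
  local i
  for i in "$@"; do
    dd if="$src" of="$dst" bs="$bs" skip="$i" seek="$i" count=1 \
       conv=notrunc 2>/dev/null
  done
}

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
head -c 65536 /dev/zero > "$tmp/dst"        # old version: 16 blocks of 4 KiB
cp "$tmp/dst" "$tmp/src"
printf 'changed' | dd of="$tmp/src" bs=1 seek=20000 conv=notrunc 2>/dev/null

ino_before=$(stat -c %i "$tmp/dst")
patch_blocks "$tmp/src" "$tmp/dst" 4096 4   # offset 20000 lies in block 4
ino_after=$(stat -c %i "$tmp/dst")

cmp -s "$tmp/src" "$tmp/dst" && echo "identical after writing 1 of 16 blocks"
```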
I believe btrfs-sync could be what you need; here's an article explaining it.
In short, it is a bash script to sync BTRFS snapshots, locally or through SSH*.
The syntax is similar to that of scp.
```
Usage:
btrfs-sync [options] <src> [<src>...] [[user@]host:]<dir>
-k|--keep NUM keep only last <NUM> sync'ed snapshots
-d|--delete delete snapshots in <dst> that don't exist in <src>
-z|--xz use xz compression. Saves bandwidth, but uses one CPU
-Z|--pbzip2 use pbzip2 compression. Saves bandwidth, but uses all CPUs
-q|--quiet don't display progress
-v|--verbose display more information
-h|--help show usage
<src> can either be a single snapshot, or a folder containing snapshots
<user> requires privileged permissions at <host> for the 'btrfs' command
```
The theoretical work on in-place rsync is described in this paper.
Paper reference: D. Rasch and R. Burns. In-Place Rsync: File Synchronization for Mobile and Wireless Devices. USENIX Annual Technical Conference, FREENIX track, 91-100, USENIX, 2003.
From the link:
... We modified the existing rsync implementation to support in-place reconstruction.
Abstract: [...] We have modified rsync so that it operates on space constrained devices. Files on the target host are updated in the same storage the current version of the file occupies. Space-constrained devices cannot use traditional rsync because it requires memory or storage for both the old and new version of the file. Examples include synchronizing files on cellular phones and handheld PCs, which have small memories. The in-place rsync algorithm encodes the compressed representation of a file in a graph, which is then topologically sorted to achieve the in-place property. [...]
So this appears to be the technical details of what rsync --inplace is doing. According to the beginning of the paper:
We have modified rsync so that it performs file synchronization tasks with in-place reconstruction. [...] Instead of using temporary space, the changes to the target file take place in the space already occupied by the current version. This tool can be used to synchronize devices where space is limited.
As @dataless's answer makes clear, --inplace means rsync reuses the same storage space, but it may still copy the whole file into that space. Specifically, for copies between local filesystems rsync assumes the --whole-file option, whereas across a network it assumes the --no-whole-file option.
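The paper's central idea, ordering the copy commands so that no data is overwritten before it has been read, can be illustrated with a toy bash example using the coreutils `tsort` command. This is not the paper's or rsync's implementation. It assumes a made-up command list for a file `AAAABBBB` that gains four bytes `ZZZZ` at the front, and the simple conflict rule that command u must run before command v whenever u reads bytes that v overwrites. Cycles, which a real implementation has to break somehow, are not handled.

```bash
#!/usr/bin/env bash
# Toy in-place reconstruction: old "AAAABBBB" -> new "ZZZZAAAABBBB".
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
f="$tmp/file"
printf 'AAAABBBB' > "$f"

# Commands: name read_offset length write_offset [literal]
# read_offset -1 marks a literal "add" that reads nothing from the file.
cmds=(
  "lit -1 4 0 ZZZZ"
  "c1 0 4 4"
  "c2 4 4 8"
)

# Emit "u v" when u reads a range that v writes (so u must run first),
# plus "u u" so that every command appears in tsort's output.
edges() {
  local u v un ur ul uw vn vr vl vw
  for u in "${cmds[@]}"; do
    read -r un ur ul uw _ <<<"$u"
    echo "$un $un"
    if [ "$ur" -lt 0 ]; then continue; fi
    for v in "${cmds[@]}"; do
      read -r vn vr vl vw _ <<<"$v"
      if [ "$un" != "$vn" ] && [ "$ur" -lt $((vw + vl)) ] && [ "$vw" -lt $((ur + ul)) ]; then
        echo "$un $vn"
      fi
    done
  done
}

order=$(edges | tsort)
for name in $order; do
  for c in "${cmds[@]}"; do
    read -r n r l w lit <<<"$c"
    if [ "$n" != "$name" ]; then continue; fi
    if [ "$r" -lt 0 ]; then
      printf '%s' "$lit" | dd of="$f" bs=1 seek="$w" conv=notrunc 2>/dev/null
    else
      dd if="$f" of="$f" bs=1 skip="$r" seek="$w" count="$l" conv=notrunc 2>/dev/null
    fi
  done
done

echo "order:  $(echo $order)"   # c2 c1 lit
echo "result: $(cat "$f")"      # ZZZZAAAABBBB
```

Running the same commands front to back instead (lit, c1, c2) would overwrite `AAAA` before c1 reads it and leave `ZZZZZZZZZZZZ`, which is why an insertion forces the tail to be moved first.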
My apologies. I wasn't paying sufficient attention. With @dataless's answer, this should clear things up. – Diagon Dec 10 '17 at 03:53