It seems silly to make a blog post about this, but I keep on forgetting the answer to "what if I really want to just transfer EVERYTHING with rsync?". Since the rsync(1) manpage is 28,000 words, I basically never go there to find the answer and instead grep around this wiki and find other instances, which are never quite as good as what I've worked out with the help of my (new) colleague weasel.

The common answer is "just use -av":

rsync -av A/ B/

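... but that has a few limitations: as the flag listing further down shows, -a leaves out -H, -A and -X, so hard links, ACLs and extended attributes are not preserved.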

The full one liner

The answer, of course, is instead the very intuitive:

rsync -PaSHAXx --numeric-ids --info=progress2 A/ B/

If you don't trust the filesystem time and file sizes, also throw in -c to do a (MD5!?) checksum of the files instead, but that's much slower. (A better hashing algorithm could be SHA-2 or Meow, obviously.)

What does it do?

Those flags mean:

    -P                          same as --partial --progress
    -a, --archive               archive mode; equals -rlptgoD (no -H,-A,-X)
    -S, --sparse                turn sequences of nulls into sparse blocks
    -H, --hard-links            preserve hard links
    -A, --acls                  preserve ACLs (implies -p)
    -X, --xattrs                preserve extended attributes
    -x, --one-file-system       don't cross filesystem boundaries
        --numeric-ids           don't map uid/gid values by user/group name
    -c, --checksum              skip based on checksum, not mod-time & size

Keep in mind that -H is expensive, which is why it's not included in -a by default, as the manpage explains.
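
To make it concrete what -H is protecting, here is a minimal sketch of a hard link, assuming GNU coreutils (`stat -c`) and a scratch directory from `mktemp`; the file names are made up for the illustration. Without -H, rsync treats the two names as two separate files.

```shell
# Scratch directory for the experiment (hypothetical file names "a" and "b").
tmp=$(mktemp -d)
echo data > "$tmp/a"

# "b" becomes a second name for the same inode, not a copy.
ln "$tmp/a" "$tmp/b"

# Both names report the same inode number and a link count of 2.
stat -c '%n: inode %i, %h links' "$tmp/a" "$tmp/b"
```

To preserve that relationship, rsync has to keep track of the inodes it has seen in the transfer, which is where the extra cost comes from.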

Also be careful around --sparse: it does what it says it does: if it finds a file with nulls in it, it will write those as sparse blocks, which means you might create sparse blocks where there weren't any before. There doesn't seem to be a sane way to deal with this.
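
The difference between a file with holes and a file full of explicit nulls can be seen with a small experiment. This is only a sketch, assuming GNU coreutils (`truncate`, `stat -c`) on a filesystem that supports sparse files; the file names are invented for the example.

```shell
tmp=$(mktemp -d)

# A 1 MiB file that is one big hole: no data is ever written.
truncate -s 1M "$tmp/sparse"

# A 1 MiB file made of explicitly written zero bytes.
dd if=/dev/zero of="$tmp/zeros" bs=1024 count=1024 2>/dev/null

# %s is the apparent size in bytes, %b the number of blocks allocated.
# Both files have the same apparent size; the allocated blocks usually differ.
stat -c '%n: %s bytes, %b blocks' "$tmp/sparse" "$tmp/zeros"
```

With --sparse, a file like `zeros` may end up looking like `sparse` on the destination, which is exactly the surprise described above.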

Unrolling some of those, this actually means:

    -r, --recursive             recurse into directories
    -l, --links                 copy symlinks as symlinks
    -p, --perms                 preserve permissions
    -t, --times                 preserve modification times
    -g, --group                 preserve group
    -o, --owner                 preserve owner (super-user only)
    -D                          same as --devices --specials
        --partial               keep partially transferred files
        --progress              show progress during transfer

And yes, we need to unroll this again:

        --devices               preserve device files (super-user only)
        --specials              preserve special files

The --numeric-ids parameter is really relevant only when you archive files across servers that might not share the same UID space. This is especially important when restoring from backups because you might be creating /etc/passwd along the way (!).

What's with progress2?

The last bit, --info=progress2, is not directly documented in the manpage, at least not in the --info section. Strangely, there's some information in the -P flag where it says:

outputs statistics based on the whole transfer, rather than
individual files.

I found this was extremely useful during large transfers because, by default, -P (or, more specifically, --progress) shows progress for each individual file (only). That's fine if you transfer large files, but for large transfers (with a large number of files), that's much less useful and possibly incredibly noisy. --info=progress2, according to --info=help, does instead:

PROGRESS   Mention 1) per-file progress or 2) total transfer progress

... which I admit is not much clearer, but basically, it gives you an overview of the entire transfer. Of course, --progress and --info=progress2 overlap with each other, so you will want to remove the -P option (and re-add --partial) to get the clean, one-line-only output. It looks something like this:

        542,054   0%   23.48kB/s    0:00:22 (xfr#4, to-chk=1000/867646)   

In the above, you have the following space-separated fields:

  1. size of the files transferred so far (in bytes, above is around 500KiB)
  2. the percentage of the known files that represents (zero percent)
  3. the current transfer rate (23.48 kilobytes per second)
  4. the time passed so far (22 seconds)
  5. the number of files transferred so far (4 files)
  6. the number of files remaining to be checked over the number of files found so far

The last pair of numbers is confusing: the left side is the number of files remaining to be checked, and the right side is the number of files found so far. Both numbers can rise as rsync works incrementally. When the transfer is complete, this will show 0/N, where N is the total number of files found. All this is well explained in this StackExchange post.
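
If you want those counters in a script, the fields can be pulled out with sed. This is a rough sketch that only handles the sample line above and assumes the counter is spelled `to-chk=` as shown there; the variable names are mine, not rsync's.

```shell
# The sample progress line from above.
line='        542,054   0%   23.48kB/s    0:00:22 (xfr#4, to-chk=1000/867646)'

# Number of files transferred so far (the xfr# counter).
xfr=$(printf '%s\n' "$line" | sed -n 's/.*xfr#\([0-9]*\),.*/\1/p')

# Left side of to-chk: files remaining to be checked.
remaining=$(printf '%s\n' "$line" | sed -n 's/.*to-chk=\([0-9]*\)\/[0-9]*).*/\1/p')

# Right side of to-chk: files found so far.
found=$(printf '%s\n' "$line" | sed -n 's/.*to-chk=[0-9]*\/\([0-9]*\)).*/\1/p')

echo "transferred=$xfr remaining=$remaining found=$found"
```

Subtracting `remaining` from `found` gives the number of files rsync has already checked, which is a more honest progress figure than the percentage while the file list is still growing.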

Other similar uses

Note that this is similar to how at least one backup system runs its test suite, against, interestingly, rsync. Indeed, bup uses rsync to check that the files it restores are identical to the original. They use the also super-intuitive -niaHAX (maybe with -c), which I find slightly less intuitive than my ordering, which sounds like "pacha" in French.

Conclusion

So there you go. -PaSHAX is now your new best friend. And don't forget the obvious --numeric-ids (and not uids, they talk about groups too) and --info=progress2 (grrr) and maybe --checksum if you're nostalgic about the good old MD5 days.
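
For the clean one-line display discussed earlier, -P can be swapped for --partial so that --progress does not get in the way of --info=progress2. Here is one way to wrap that up, as a sketch only: the function name `rsync_all` and the `RSYNC` override variable are my own inventions, not part of rsync.

```shell
# Hypothetical wrapper around the one-liner. Setting RSYNC=echo prints the
# arguments instead of running rsync, which is handy for a quick review.
rsync_all() {
    # --partial replaces -P: it keeps partially transferred files without
    # turning on the per-file --progress output.
    "${RSYNC:-rsync}" -aSHAXx --partial --numeric-ids --info=progress2 "$@"
}

# Preview the arguments without transferring anything.
RSYNC=echo rsync_all A/ B/
```

Calling `rsync_all A/ B/` without the override runs the real transfer; extra flags such as -c can be added before the paths since everything is passed through.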

Notice the trailing slashes at the end of A/ and B/. Those, stupidly, matter to rsync. This is one of the most confusing things about rsync and I have gotten around that problem by always specifying a trailing slash to both arguments, which gives a consistent experience all the time. But, if you want to know all the nasty details, try to figure out this bit:

A trailing slash on the source changes this behavior to avoid creating an additional directory level at the destination. You can think of a trailing / on a source as meaning "copy the contents of this directory" as opposed to "copy the directory by name", but in both cases the attributes of the containing directory are transferred to the containing directory on the destination. In other words, each of the following commands copies the files in the same way, including their setting of the attributes of /dest/foo:

rsync -av /src/foo /dest
rsync -av /src/foo/ /dest/foo

They omitted, obviously, that this is also identical:

rsync -av /src/foo/ /dest/foo/

At this point, I would understand if you want to throw the "fine manual" out the window and yell like crazy.

update: added -S
On pabs's recommendation, I also added -S, changing the acronym from "fax" (-PHaAX) to "pacha(x)" (-PaSHAX) which still sounds good and is a better mapping to the transliteration...
Comment by anarcat
one more note

I updated the post to add more details about --info=progress2. I didn't realize this at first, but it kind of conflicts with the --progress argument as the latter kind of jumbles up the output of the former.

I also added a warning about --sparse, which still confuses the hell out of me.

Oh, and I added -x to avoid crossing filesystems. I generally do that as I often sync filesystems with rsync and don't want to descend in /proc and so on. You will, of course, want to be careful around that as well if you want to transfer multiple filesystems: just call rsync multiple times.

Comment by anarcat