rsync oneliner: a study of a complex commandline
It seems silly to make a blog post about this, but I keep on forgetting the answer to "what if I really want to just transfer EVERYTHING with rsync?". Since the rsync(1) manpage is 28,000 words, I basically never go there to find the answer and instead grep around this wiki and find other instances, which are never quite as good as what I've come up with with the help of my (new) colleague weasel.
The common answer is "just use -av":
rsync -av A/ B/
... but that has a few limitations:
- it shows every file transferred, which can overwhelm the terminal for large transfers
- it won't transfer hardlinks, ACLs and other extended attributes
- it might break if /etc/passwd is not synchronized across hosts
The full one liner
The answer, of course, is instead the very intuitive:
rsync -PaSHAXx --numeric-ids --info=progress2 A/ B/
You can also add -c to do a (MD5!?) checksum of the files instead, but that's much slower. (A better hashing algorithm could be SHA-2 or Meow, obviously.)
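To avoid retyping the one-liner above, it can be wrapped in a small shell function. This is only a sketch: the function name rsync_everything and the RSYNC_DRY_RUN variable are invented for this example and are not rsync features. It also forces a trailing slash on both arguments, so that the contents of the source land directly in the destination.

```shell
#!/bin/sh
# Hypothetical wrapper around the one-liner above. rsync_everything and
# RSYNC_DRY_RUN are names made up for this sketch.
rsync_everything() {
    if [ "$#" -ne 2 ]; then
        echo "usage: rsync_everything SRC DST" >&2
        return 2
    fi
    # Strip one trailing slash (if any), then add exactly one back.
    src="${1%/}/"
    dst="${2%/}/"
    # When RSYNC_DRY_RUN is set to a non-empty value, print the command
    # instead of running it.
    ${RSYNC_DRY_RUN:+echo} rsync -PaSHAXx --numeric-ids --info=progress2 "$src" "$dst"
}
```

Setting RSYNC_DRY_RUN=1 before calling rsync_everything A B prints the command that would run, which is a cheap way to check the slashes before touching any data.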
What does it do?
Those flags mean:
-P same as --partial --progress
-a, --archive archive mode; equals -rlptgoD (no -H,-A,-X)
-S, --sparse turn sequences of nulls into sparse blocks
-H, --hard-links preserve hard links
-A, --acls preserve ACLs (implies -p)
-X, --xattrs preserve extended attributes
-x, --one-file-system don't cross filesystem boundaries
--numeric-ids don't map uid/gid values by user/group name
-c, --checksum skip based on checksum, not mod-time & size
-H is expensive, which is why it's not included in -a by default, as the manpage explains.
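To see what -H changes in practice, here is a small experiment in a throwaway directory. It assumes rsync is installed (it does nothing otherwise); the directory names plain and linked and the inode helper are made up for the example.

```shell
#!/bin/sh
# Sketch: compare how -a and -aH treat two hard-linked names.
# Print the inode number of a path.
inode() { ls -di "$1" | awk '{print $1}'; }

if command -v rsync >/dev/null 2>&1; then
    tmp=$(mktemp -d)
    mkdir "$tmp/src"
    echo data > "$tmp/src/a"
    ln "$tmp/src/a" "$tmp/src/b"          # a and b now share one inode

    rsync -a  "$tmp/src/" "$tmp/plain/"   # without -H
    rsync -aH "$tmp/src/" "$tmp/linked/"  # with -H

    [ "$(inode "$tmp/plain/a")" = "$(inode "$tmp/plain/b")" ] \
        && plain=linked || plain=separate
    [ "$(inode "$tmp/linked/a")" = "$(inode "$tmp/linked/b")" ] \
        && hard=linked || hard=separate
    echo "-a:  a and b are $plain"
    echo "-aH: a and b are $hard"
    rm -rf "$tmp"
fi
```

Without -H the two names end up as two independent copies of the same data, which is easy to miss until the destination runs out of space.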
A warning about --sparse: it does exactly what it says it does. If it finds a file with nulls in it, it will write those as sparse blocks, which means you might create sparse blocks where there weren't any before. There doesn't seem to be a sane way to deal with this.
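The effect can be observed with a file made of real, fully written zeros. The sketch below assumes rsync is installed and uses made-up directory names (dense, sparse). The disk usage reported by du depends heavily on the filesystem, so the numbers are illustrative only.

```shell
#!/bin/sh
# Sketch: copy a file of explicitly written zeros with and without -S.
if command -v rsync >/dev/null 2>&1; then
    tmp=$(mktemp -d)
    mkdir "$tmp/src"
    # 1 MiB of zeros written out block by block, so the source has no holes.
    dd if=/dev/zero of="$tmp/src/zeros" bs=1024 count=1024 2>/dev/null

    rsync -a  "$tmp/src/" "$tmp/dense/"
    rsync -aS "$tmp/src/" "$tmp/sparse/"

    # du -k reports allocated space in KiB, not the apparent file size.
    du -k "$tmp/src/zeros" "$tmp/dense/zeros" "$tmp/sparse/zeros"

    # The contents stay byte-for-byte identical either way.
    cmp -s "$tmp/src/zeros" "$tmp/sparse/zeros" && same=yes || same=no
    echo "identical contents: $same"
    rm -rf "$tmp"
fi
```

On many filesystems the copy made with -S reports much less allocated space than the source, even though the source never had any holes.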
Unrolling some of those, this actually means:
-r, --recursive recurse into directories
-l, --links copy symlinks as symlinks
-p, --perms preserve permissions
-t, --times preserve modification times
-g, --group preserve group
-o, --owner preserve owner (super-user only)
-D same as --devices --specials
--partial keep partially transferred files
--progress show progress during transfer
And yes, we need to unroll this again:
--devices preserve device files (super-user only)
--specials preserve special files
The --numeric-ids parameter is really relevant only when you archive files across servers that might not share the same UID space. This is especially important when restoring from backups because you might be creating /etc/passwd along the way (!).
What's with progress2?
The last bit, --info=progress2, is not directly documented in the manpage, at least not in the --info section. Strangely, there's some information in the -P flag where it says:
outputs statistics based on the whole transfer, rather than
individual files.
I found this extremely useful during large transfers because, by default, -P (or, more specifically, --progress) shows progress for each individual file (only). That's fine if you transfer a few large files, but for transfers with a large number of files, that's much less useful and possibly incredibly noisy. --info=progress2, according to --info=help, does instead:
PROGRESS Mention 1) per-file progress or 2) total transfer progress
... which I admit is not much clearer, but basically, it gives you an overview of the entire transfer. Of course, --progress and --info=progress2 overlap with each other, so you will want to remove the -P option (and re-add --partial) to get the clean, one-line-only output. It looks something like this:
542,054 0% 23.48kB/s 0:00:22 (xfr#4, to-chk=1000/867646)
In the above, you have the following space-separated fields:
- size of the files transferred so far (in bytes, above is around 500KiB)
- the percentage of the known files that represents (zero percent)
- the current transfer rate (23.48 kilobytes per second)
- the time passed so far (22 seconds)
- the number of files transferred so far (4 files)
- the number of files still to be checked over the number of files found so far
The last pair of numbers is confusing: the left side is the number of files remaining to be checked, and the right side is the number of files found so far. Both numbers can rise as rsync works incrementally. When the transfer is complete, this will show 0/N, where N is the total number of files found. All this is well explained in this StackExchange post.
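If you ever want to feed that status line to a script, the fields listed above can be pulled apart with plain shell parameter expansion. This is a sketch against the sample line only; the variable names are mine, and the exact format may differ between rsync versions. In particular, rsync may print ir-chk instead of to-chk while it is still scanning, so the sketch strips everything up to the = sign.

```shell
#!/bin/sh
# Sketch: split one --info=progress2 status line into named fields.
# The sample is the line shown above.
line='542,054 0% 23.48kB/s 0:00:22 (xfr#4, to-chk=1000/867646)'

set -f            # no globbing while word-splitting the line
set -- $line      # $1..$6 now hold the space-separated fields
set +f

bytes=$(printf '%s' "$1" | tr -d ',')   # drop the thousands separators
percent=${2%\%}                         # "0%" -> "0"
rate=$3
time_field=$4
xfr=${5#*xfr#}; xfr=${xfr%,}            # "(xfr#4," -> "4"
chk=${6#*=}; chk=${chk%\)}              # "to-chk=1000/867646)" -> "1000/867646"
remaining=${chk%/*}
found=${chk#*/}

echo "bytes=$bytes percent=$percent rate=$rate time=$time_field"
echo "transferred=$xfr remaining=$remaining found=$found"
```

Note that rsync redraws this line in place with carriage returns, so a real consumer would first have to split the stream on those (for example with tr '\r' '\n').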
Other similar uses
Note that this is similar to how at least one backup system runs its test suite, against, interestingly, rsync. Indeed, bup uses rsync to check that the files it restores are identical to the original. They use the also super-intuitive -niaHAX (maybe with -c), which I find slightly less intuitive than my ordering, which sounds like "pacha" in French.
Conclusion
So there you go. -PaSHAX is now your new best friend. And don't forget the obvious --numeric-ids (and not uids, they talk about groups too) and --info=progress2 (grrr) and maybe --checksum if you're nostalgic about the good old MD5 days.
Also note the trailing slashes on A/ and B/. Those, stupidly, matter to rsync. This is one of the most confusing things about rsync and I have gotten around that problem by always specifying a trailing slash on both arguments, which gives a consistent experience all the time. But, if you want to know all the nasty details, try to figure out this bit:
A trailing slash on the source changes this behavior to avoid creating an additional directory level at the destination. You can think of a trailing / on a source as meaning "copy the contents of this directory" as opposed to "copy the directory by name", but in both cases the attributes of the containing directory are transferred to the containing directory on the destination. In other words, each of the following commands copies the files in the same way, including their setting of the attributes of /dest/foo:
rsync -av /src/foo /dest
rsync -av /src/foo/ /dest/foo
They omitted, obviously, that this is also identical:
rsync -av /src/foo/ /dest/foo/
At this point, I would understand if you want to throw the "fine manual" out the window and yell like crazy.
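Before throwing anything, it may help to watch the rule at work in a scratch directory. The sketch below assumes rsync is installed (it does nothing otherwise). The names d1 to d4 and the layout helper are made up for the example, and the fourth command is my addition to show the one case that behaves differently.

```shell
#!/bin/sh
# Sketch: three equivalent invocations, plus one that is not.
# List the regular files below a directory, relative to it.
layout() { (cd "$1" && find . -type f); }

if command -v rsync >/dev/null 2>&1; then
    tmp=$(mktemp -d)
    mkdir -p "$tmp/src/foo"
    echo hello > "$tmp/src/foo/file"
    mkdir "$tmp/d1" "$tmp/d2" "$tmp/d3" "$tmp/d4"

    rsync -a "$tmp/src/foo"  "$tmp/d1"        # copy the directory by name
    rsync -a "$tmp/src/foo/" "$tmp/d2/foo"    # copy its contents into d2/foo
    rsync -a "$tmp/src/foo/" "$tmp/d3/foo/"   # same, trailing slash on the dest
    rsync -a "$tmp/src/foo/" "$tmp/d4"        # contents land directly in d4

    d1=$(layout "$tmp/d1")   # ./foo/file
    d2=$(layout "$tmp/d2")   # ./foo/file
    d3=$(layout "$tmp/d3")   # ./foo/file
    d4=$(layout "$tmp/d4")   # ./file
    echo "d1: $d1"; echo "d2: $d2"; echo "d3: $d3"; echo "d4: $d4"
    rm -rf "$tmp"
fi
```

The trailing slash on the destination never changes anything here. Only the one on the source does.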
Other flags
I'll document other exotic flags that I may use sometimes, which may be hard to find.
sync files
This one is particularly hard to find, either on the web or in the manual page, because both tend to treat the string sync as basically being irrelevant (at best, in a web search) or being synonymous with rsync (at worst, in the manual page).
The answer is hinted at with:
rsync --help | grep sync
... which has less of a tendency of "repeating the string rsync" everywhere and gives us the obvious:
--fsync fsync every written file
Useful when you write to an external thumb drive or SD card and know that Linux will lie to you by queuing up a bunch of writes until unmount. An alternative is to mount those partitions with -o sync, but who ever does that?
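Only fairly recent rsync releases ship --fsync, so a script that has to run on older machines needs a fallback. Here is a minimal sketch, assuming that grepping the --help output (as above) is a good enough feature test. copy_and_flush is a name invented for this example, and the fallback uses the plain sync command, which flushes everything and not just the copied files.

```shell
#!/bin/sh
# Sketch: use --fsync when this rsync knows about it, else fall back to sync.
copy_and_flush() {
    if rsync --help 2>&1 | grep -q -e '--fsync'; then
        rsync -a --fsync "$1" "$2"
    else
        # Older rsync: copy, then ask the kernel to flush all pending writes.
        rsync -a "$1" "$2" && sync
    fi
}
```

Something like copy_and_flush photos/ /media/sdcard/photos/ should then only return once the data is actually on the card (both paths are placeholders).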
I changed the flag ordering from "fax" (-PHaAX) to "pacha(x)" (-PaSHAX), which still sounds good and is a better mapping to the transliteration...

I updated the post to add more details about --info=progress2. I didn't realize this at first, but it kind of conflicts with the --progress argument, as the latter kind of jumbles up the output of the former.

I also added a warning about --sparse, which still confuses the hell out of me.

Oh, and I added -x to avoid crossing filesystems. I generally do that as I often sync filesystems with rsync and don't want to descend into /proc and so on. You will, of course, want to be careful around that as well if you want to transfer multiple filesystems: just call rsync multiple times.
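"Call rsync multiple times" can be as simple as a loop over the mount points you care about. This is only a sketch: sync_filesystems, RSYNC_DRY_RUN, the /mnt/backup target and the list of mount points are all placeholders invented for this example. It uses --partial instead of -P to keep the progress2 output clean.

```shell
#!/bin/sh
# Sketch: one rsync run per filesystem, since -x stops at mount points.
# Usage: sync_filesystems TARGET MOUNTPOINT...
sync_filesystems() {
    target=$1; shift
    for fs in "$@"; do
        # "${fs%/}/" yields "/" for the root and "/home/" for /home.
        ${RSYNC_DRY_RUN:+echo} rsync -aSHAXx --partial --numeric-ids \
            --info=progress2 "${fs%/}/" "${target%/}${fs%/}/" || return
    done
}
```

With RSYNC_DRY_RUN=1 set, calling sync_filesystems /mnt/backup / /home only prints the two rsync commands, which is handy for checking the source and destination pairs. Nested mount points need their parent directory to exist on the target, so list parents first.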