
I would like to back up my mail directory on a USB key. However, my IMAP mail uses a strange naming convention that sometimes includes a colon (:) character. Since the USB key is formatted with a Windows filesystem, rsync fails to create those files. Is there a way to replace the colon character with an underscore when running rsync? (Or to do the same synchronization with another tool?)

A few points that I clarified in the comments:

  • This is a worst-case scenario backup; I would like to be able to read it on a Windows machine without installing anything.
  • I have a lot of data that stays constant, so I save a lot of time with a tool that copies only the newer files.
  • I am not looking for a rewrite of rsync. I am looking for an existing tool that can be used out of the box.

Thanks

Guillaume Coté

4 Answers


Use rdiff-backup instead of plain rsync. It will automatically detect and substitute for characters that aren't supported on the destination disk, and also put them back as they were when you restore to a unix filesystem. It produces an unpacked directory that looks just like the origin plus one extra metadata directory.
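
A minimal usage sketch, assuming the classic `rdiff-backup SOURCE DESTINATION` syntax (recent releases also accept an action-based form such as `rdiff-backup backup SOURCE DESTINATION`). The function names backup_mail and restore_mail and the /media/usb99 paths are placeholders, not anything rdiff-backup defines:

```sh
# Hypothetical wrappers around rdiff-backup's classic command-line form;
# the default paths are placeholders to adjust.
backup_mail() {
  src=${1:-"$HOME/mail"}
  dest=${2:-/media/usb99/mail}
  # Mirror SRC into DEST; rdiff-backup quotes characters that DEST's
  # filesystem rejects (such as ':') and keeps its metadata and
  # increments in DEST/rdiff-backup-data.
  rdiff-backup "$src" "$dest"
}

restore_mail() {
  dest=${1:-/media/usb99/mail}
  restore_to=${2:-"$HOME/mail.restored"}
  # "-r now" restores the latest state; on a Unix filesystem the quoted
  # characters are turned back into the original ones.
  rdiff-backup -r now "$dest" "$restore_to"
}
```

Running backup_mail again later only transfers what changed, which is the incremental behaviour asked for in the question.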

poolie

The most straightforward approach is to leverage the filesystem layer to transform the file names. Since Ubuntu 12.04, there is a FUSE filesystem that transforms file names into names that Windows's VFAT supports: fuse-posixovl.

```sh
sudo mount.posixovl /media/sdb1
sudo chown guillaume /media/sdb1
rsync -au ~/mail /media/sdb1/
```

Or to avoid requiring root access:

```sh
mkdir ~/mnt
/sbin/mount.posixovl -S /media/sdb1 ~/mnt
rsync -au ~/mail ~/mnt/
```
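
The non-root variant leaves the overlay mounted afterwards. A sketch that also unmounts it, even when rsync fails, assuming a FUSE mount that the invoking user can detach with fusermount -u. The function name backup_via_posixovl and the MOUNT_POSIXOVL override are illustrative, not part of POSIXovl:

```sh
# Hypothetical helper: mount the VFAT partition through POSIXovl on a
# private mount point, rsync into it, then always unmount.
MOUNT_POSIXOVL=${MOUNT_POSIXOVL:-/sbin/mount.posixovl}

backup_via_posixovl() {
  vfat_dir=$1   # where the USB key is mounted, e.g. /media/sdb1
  overlay=$2    # an empty directory you own, e.g. ~/mnt
  src=$3        # what to copy, e.g. ~/mail
  mkdir -p -- "$overlay" || return
  "$MOUNT_POSIXOVL" -S "$vfat_dir" "$overlay" || return
  # -a preserves times and permissions (POSIXovl keeps the permissions
  # in its .pxovl.* files); -u skips files that are newer on the key.
  rsync -au "$src" "$overlay/"
  status=$?
  # Unmount even if rsync failed, then report rsync's exit status.
  fusermount -u "$overlay"
  return "$status"
}
```

Typical use would be `backup_via_posixovl /media/sdb1 ~/mnt ~/mail`.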

Characters in file names that VFAT doesn't accept are encoded as %(XX) where XX are hexadecimal digits. As of POSIXovl 1.2.20120215, beware that a file name like %(3A) is encoded as itself, and will be decoded as :, so there is a risk of collision if you have file names containing substrings of the form %(XX).
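
To make the %(XX) scheme and its collision risk concrete, here is an imitation of the escaping with sed. It is not POSIXovl's code: the escaped character set (the VFAT-forbidden characters : * ? " < > | and backslash) and the upper-case hex digits are assumptions, and vfat_encode/vfat_decode are hypothetical names:

```sh
# Illustration only: a sed-based imitation of %(XX) escaping, ASSUMING
# the escaped set is : * ? " < > | \ with upper-case hex digits.
# Note that '%' itself is not escaped, which is what allows collisions.
vfat_encode() {
  printf '%s\n' "$1" | sed -e 's/:/%(3A)/g' -e 's/\*/%(2A)/g' \
    -e 's/?/%(3F)/g' -e 's/"/%(22)/g' -e 's/</%(3C)/g' \
    -e 's/>/%(3E)/g' -e 's/|/%(7C)/g' -e 's/\\/%(5C)/g'
}

vfat_decode() {
  printf '%s\n' "$1" | sed -e 's/%(3A)/:/g' -e 's/%(2A)/*/g' \
    -e 's/%(3F)/?/g' -e 's/%(22)/"/g' -e 's/%(3C)/</g' \
    -e 's/%(3E)/>/g' -e 's/%(7C)/|/g' -e 's/%(5C)/\\/g'
}
```

With these definitions, `vfat_encode 'foo:bar'` and `vfat_encode 'foo%(3A)bar'` both print `foo%(3A)bar`, so two different source files would land on the same name on the key.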

Beware that POSIXovl does not cope with file names that are too long. If the encoded name doesn't fit in 255 characters, the file can't be stored.
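
Because the limit applies to the encoded name, a name can be fine on the Unix side and still fail on the key. A pre-flight sketch that lists such paths before running rsync, assuming each of : * ? " < > | and backslash grows from 1 to 5 characters when written as %(XX), and that names are ASCII so that ${#name} counts characters the way VFAT does (too_long_for_posixovl is a hypothetical name):

```sh
# Hypothetical pre-flight check: print paths under $1 whose last
# component would exceed 255 characters once escaped as %(XX).
# Assumes path names contain no newlines.
too_long_for_posixovl() {
  find "$1" -print | while IFS= read -r path; do
    name=${path##*/}
    # Count the characters assumed to need escaping.
    n=$(printf '%s' "$name" | tr -cd ':*?"<>|\\' | wc -c)
    # Each escaped character adds 4 characters (1 becomes 5).
    len=$(( ${#name} + 4 * $n ))
    if [ "$len" -gt 255 ]; then
      printf '%s\n' "$path"
    fi
  done
}
```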

POSIXovl stores unix permissions and ownership in files called .pxovl.FILENAME.


The following bash ≥4 script copies ~/mail/foo:bar to /media/usb99/mail/foo_bar, and similarly for all files under ~/mail. Files that already exist in the destination tree and that are not older than the source are skipped.

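```bash
#!/bin/bash
set -e
shopt -s dotglob globstar  # ** recurses into subdirectories; * also matches dot files
mkdir -p /media/usb99/mail # needed for files directly under ~/mail
for source in "$HOME"/mail/**/*; do
  # ~/mail/a:b/c:d becomes /media/usb99/mail/a_b/c_d
  target=/media/usb99/${source#"$HOME"/}
  target=${target//:/_}
  if [[ -d $source ]]; then
    mkdir -p -- "$target"
  elif [[ $target -ot $source ]]; then
    # bash's -ot is also true when $target does not exist yet
    cp -p -- "$source" "$target"
  fi
done
```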

This script works under zsh with minor modifications: replace shopt -s dotglob globstar by setopt dot_glob and [[ $target -ot $source ]] by [[ ! -e $target || $target -ot $source ]].
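
Replacing : with _ is not reversible: if the mail tree contains both foo:bar and foo_bar, the script above maps both to the same target and one can silently replace the other. A minimal check to run before the first copy (find_colon_collisions is a hypothetical name; it assumes path names without embedded newlines):

```sh
# Hypothetical pre-flight check: report pairs of source paths that
# would collide once every ':' is replaced by '_'.
find_colon_collisions() {
  find "$1" -print | awk '
    {
      mapped = $0
      gsub(/:/, "_", mapped)
      if (mapped in first) {
        # A second source path maps to an already-seen target path.
        print first[mapped] " <-> " $0
      } else {
        first[mapped] = $0
      }
    }'
}
```

An empty output from `find_colon_collisions ~/mail` means the renaming is safe for the current tree.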


Here's a zsh two-liner (three if you count the autoloads). It's shorter, but fairly advanced and not very readable.

```zsh
autoload zargs zmv
zargs -- ~/mail/**/*(/e\''REPLY=/media/usb99/${${REPLY#$HOME/}//:/_}'\') -- mkdir -p --
zmv -C -Q -o -pu '~/mail/(**/)(*)(.)' '/media/usb99/mail/${1//:/_}${2//:/_}'
```

  • The zargs line is equivalent to mkdir -p -- ~/mail/**/*(…), except that it won't fail if the combined length of the directory names is too long for a single command line. That line creates the target directories as necessary.
  • ~/mail/**/*(/) expands to all the directories under ~/mail (directories only due to the (/) at the end).
  • (/e\''…'\') selects only directories and further executes the code within '…' to transform each file name, which is stored in the REPLY variable.
  • ${${REPLY#$HOME/}//:/_} removes the prefix corresponding with the source directory and changes : into _.
  • zmv -C copies each file matching its first operand (a zsh pattern) to the file name obtained by expanding its second operand.
  • -o -pu says to pass -pu to the cp utility, so as to preserve permissions and copy only updated files. (We could tell zsh to perform the update check; it would be a little faster but even more cryptic.)
  • (.) selects only regular files. -Q says that this is to be parsed as a glob qualifier and not as a . with parentheses around it indicating a subexpression.
  • $1 and $2 in the replacement text match the parenthesized expressions (**/) and *. (** loses its special meaning as zero or more subdirectory levels if it's in parentheses, unless the parentheses contain exactly **/.)

I initially thought to use pax, which is an archiving tool (here intended to be used in pass-through mode) that has a file renaming feature (its -s option). However, the -s and -u options do not work together (the POSIX definition of pax literally says that -u must check a file of the same name in the destination tree, rather than the file name transformed by -s; the pax implementation in Ubuntu follows the spec literally rather than usefully). It's still possible to make use of it to make renamed hard links, and then copy the hard links (with rsync -au or pax -rw -pp -u) to the other media, but it feels more trouble than it's worth.

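```sh
cd ~/mail
# pax -rw refuses to write into a destination directory that does not exist.
mkdir -p ../mail.colonless /media/usb99/mail
# Hard-link (-l) every file into ../mail.colonless, renaming ':' to '_'.
pax -rw -l -pp -s '!:!_!g' . ../mail.colonless
# Incremental copy of the renamed links to the USB key.
rsync -au ../mail.colonless/ /media/usb99/mail/
```
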
Gilles 'SO- stop being evil'
  • I am going to try to use pax. Maybe you could improve your answer by specifying the package needed on Ubuntu. It does not seem to be part of the normal installation. – Guillaume Coté Nov 07 '10 at 03:19
  • Just installing a single package called pax makes it work. – Guillaume Coté Nov 07 '10 at 03:31
  • It doesn't output anything, so I tried adding a -v option to see what is going on. It seems to be copying everything over every time. The point of rsync is to be incremental, which saves me a lot of time when only a few files have changed among a lot of files. – Guillaume Coté Nov 07 '10 at 03:37
  • It seems to be creating a 'home' directory under the path I specified. Since it's recommended to avoid changing directories in scripts, is there another way to avoid the creation of all those directories? – Guillaume Coté Nov 07 '10 at 04:04
  • It took five minutes on a second pass where everything was identical; rsync would have taken only a few seconds. Is there an option that needs to be specified to make it incremental? – Guillaume Coté Nov 07 '10 at 04:07
  • @Guillaume: `-u` should have done that, but it turns out that it doesn't work with `-s`. At this point, a shell script looks like the easiest option. See my revised answer. – Gilles 'SO- stop being evil' Nov 07 '10 at 15:41
  • Thanks for your help. Your solution is a few orders of magnitude more complex than what I was expecting. I might try to adapt your script as a last resort, but I was hoping for something on a single line, just as I do for the other rsync. – Guillaume Coté Nov 08 '10 at 22:38
  • @Guillaume: I've added a two-liner using zsh. But it's not simpler than the bash script. I don't think there is anything simpler using existing tools. – Gilles 'SO- stop being evil' Nov 08 '10 at 23:24
  • What is bugging me is not the number of lines, but the complexity associated with it. I don't think having the same complexity in fewer lines is any improvement. If there is no tool that supports it, I could code it myself. I am also concerned about the performance if you end up calling cp thousands of times. I thought process creation was slow on Linux. – Guillaume Coté Nov 09 '10 at 00:36
  • @Guillaume: I understand your concern about complexity. Fixing pax to usefully support `-u` and `-s` together would give you a simple, one-liner command. Though if you're willing to code, I think the most worthwhile contribution (more generally useful) would be a renaming FUSE filesystem. It would be no more than a few hundred lines of, say, Python, but I wouldn't trust it without significant testing. You'd get a three-line (mount, rsync, umount) but conceptually simple solution. I wouldn't worry too much about process creation time; it's pretty fast on Linux, and your task is IO-bound anyway. – Gilles 'SO- stop being evil' Nov 09 '10 at 00:56
  • @GuillaumeCoté Nowadays, you can use posixovl. See my edit. – Gilles 'SO- stop being evil' Jan 12 '14 at 01:29

What I do with my USB memory stick and mobile USB disk is split them into two partitions: a FAT32 one and an ext4 one. I use the first one to exchange data with non-Linux users, and the second one for my personal use with my Ubuntu systems (and maybe for exchanging with other Linux users). On an ext4 partition, you won't have the ":" problem.

JanC
  • I would like my backup to be readable anywhere in case I need some information on a Windows computer. Otherwise, I would have reformatted the USB key with a Unix file system. That's why I am asking about substitution. – Guillaume Coté Nov 06 '10 at 05:41
  • Well, it *is* possible to read at least ext2/ext3 on Windows if you install some tools or filesystem drivers. Do you want to be able to read it on every Windows system, or only on your own systems (where you could install the necessary tools if you needed them)? – JanC Nov 06 '10 at 06:11
  • BTW: in theory it should be possible to store it on an NTFS system too, but most Windows applications (including most from Microsoft) don't support NTFS correctly... :P – JanC Nov 06 '10 at 06:13
  • It is a worst-case recovery backup, so I want to be prepared for the case where I need something quickly on a computer where I don't have the right to install anything. – Guillaume Coté Nov 07 '10 at 03:15

You could use tar to create an archive. This way you don't have to change the names and can save it to whatever filesystem you want.
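
A minimal sketch of the tar approach, assuming GNU tar, whose --listed-incremental option keeps a snapshot file between runs so that each later archive holds only what changed since the previous one; this addresses the rewrite concern raised in the comments. tar_backup_mail, the mail-N.tar names and the mail.snar snapshot name are hypothetical. Restoring means extracting mail-0.tar and then each later archive in order (GNU tar documents passing --listed-incremental=/dev/null when extracting).

```sh
# Hypothetical incremental tar backup (requires GNU tar).
# $1: directory containing the mail folder, $2: destination directory.
tar_backup_mail() {
  parent=$1
  dest=$2
  snapshot=$dest/mail.snar   # GNU tar snapshot metadata, updated each run
  # Pick the next free archive number: mail-0.tar is the full backup,
  # later ones hold only what changed since the previous run.
  n=0
  while [ -e "$dest/mail-$n.tar" ]; do n=$((n + 1)); done
  # Names containing ':' are stored inside the archive, so the FAT
  # filesystem only ever sees the archive's own colon-free name.
  tar --create --file="$dest/mail-$n.tar" \
      --listed-incremental="$snapshot" -C "$parent" mail
}
```

Calling `tar_backup_mail "$HOME" /media/usb99` after each change writes one new, usually small, archive instead of rewriting everything.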

david
  • I could do the same with a zip file or a tgz, but it would rewrite the files that have not changed each time. Since the media has a limited number of writes and I have several GiB of data, I would like to avoid rewriting the whole thing just because a new 1 KB file was added. – Guillaume Coté Nov 05 '10 at 21:40
  • The rewrites on a recent flash memory are "limited" to a number of millions or at least hundreds of thousands usually. I doubt you plan to make that many backups. ;) – JanC Nov 05 '10 at 22:13
  • -1: -azv does not create an archive; it does an archive-mode copy, which means it preserves file attributes. – João Pinto Nov 05 '10 at 22:42
  • Sorry, I really thought rsync could do that. I changed the answer to tar, but I don't know whether tar can do incremental backups. However, JanC is right and rewrites shouldn't be a problem. – david Nov 06 '10 at 12:47
  • Regarding JanC's comments about rewrites, it's not just that there is a limit (last time I checked, it was closer to a thousand than a million), but also that I don't like to wait several hours for something that should take less than a minute. – Guillaume Coté Nov 07 '10 at 03:13