I currently have a FreeNAS box for storing my personal files. I'd like to have an offsite backup, but I'm not willing to spend the money on a second computer capable of running ZFS properly. Therefore I was planning to take the remote backups using `rsync`.
I would like all the files in the backup to be consistent, which I thought I could achieve by taking a recursive snapshot first and then transferring that using `rsync`. It turns out, however, that a separate snapshot is taken for each dataset.
Now I'm wondering if there is any way to view a recursive snapshot, including all the datasets, or whether there is some other recommended way to `rsync` an entire zpool. I don't think simply symlinking to the `.zfs` folders in the datasets will work, as I'd like `rsync` to keep any symlinks that are present in the datasets themselves.
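One workaround might look like the sketch below: take a single recursive snapshot, then walk the datasets and `rsync` each one's read-only copy of that snapshot from its hidden `.zfs` directory. The pool name `tank` and the target `backup:/mnt/backup` are placeholders, not from my setup.

```shell
#!/bin/sh
# Sketch: one atomic recursive snapshot, then rsync each dataset's
# read-only view of it. "tank" and "backup:/mnt/backup" are placeholders.
SNAP="offsite-$(date +%Y%m%d)"

# Map a dataset mountpoint and a snapshot name to the hidden path where
# ZFS exposes the snapshot's contents.
snap_path() {
    printf '%s/.zfs/snapshot/%s' "$1" "$2"
}

if command -v zfs >/dev/null 2>&1; then
    # A recursive snapshot is per-dataset, but all taken atomically.
    zfs snapshot -r "tank@${SNAP}"

    # Walk every dataset in the pool and rsync its view of the snapshot.
    # -a copies symlinks as symlinks, so links inside the data survive.
    zfs list -H -r -o name,mountpoint tank |
    while IFS="$(printf '\t')" read -r ds mp; do
        [ "$mp" = "none" ] && continue
        rsync -a "$(snap_path "$mp" "$SNAP")/" "backup:/mnt/backup/${ds}/"
    done
fi
```

Because every per-dataset snapshot comes from the same `zfs snapshot -r`, the transferred files should be mutually consistent even though they are sent dataset by dataset.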
Based on the comments I received, I think some details on my desired configuration are in order. I'm looking to have a NAS at home that I can comfortably put data on, knowing that it's unlikely I'll ever lose it. For me this means having multiple copies on-site, multiple copies offsite, an offline copy in case things go really bad, periodic snapshots of the data in case of accidental deletion, and a means to prevent data errors (e.g. bit rot). The less likely an event is to occur, the more relaxed I am about not having multiple copies of the data after that catastrophe, and the less I care about snapshots. I also care about old data more than new data, as I usually have a copy of new data on another device. Finally, I should note that most files do not get updated often; most of the transfers will be new files.
My previous setup was two Raspberry Pis with attached 4 TB external hard drives. I lost trust in this strategy, but had the hardware readily available. After some research, it seemed that the only way to prevent errors from sneaking in over time was to go with a checksumming file system such as ZFS, combined with server-grade components such as ECC RAM and a UPS. For my local copy I went this route. I use two 4 TB disks in a mirror and make regular snapshots here.
This machine should cover all cases except for the offsite and offline backups. Since I most likely won't need these backups, I'm not willing to invest too much in it. I therefore figured I could go with the Raspberry Pi's and external disks I already had lying around. I could make it such that one of the disks is always offline, while the other is receiving the backups. Changing the disks at regular intervals would then allow me to have an offline backup of my older data.
The straightforward route would be to use `zfs send` and `receive` to two pools, one on each disk. However, the Raspberry Pi, combined with the USB connection to the hard drive, would not provide `zfs` (or any filesystem, for that matter) with a very reliable environment to operate in. I therefore expect errors to occur fairly regularly in this setup. Since I'll only be using one disk at a time, `zfs` would not have any reliable means to recover from failure.
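For comparison, the send/receive route I'm rejecting would look roughly like the sketch below. `tank`, `pi`, and `backuppool` are placeholder names; the `run` helper prints each command instead of executing it, so this is a dry run unless `DRYRUN=0` is set.

```shell
#!/bin/sh
# Dry-run sketch of the zfs send/receive route, for comparison.
# "tank", "pi" and "backuppool" are placeholders. run() prints each
# command by default; set DRYRUN=0 to execute for real.
run() {
    if [ "${DRYRUN:-1}" = "1" ]; then printf '%s\n' "$*"; else eval "$*"; fi
}

# Initial full replication: -R sends the whole dataset tree from one
# recursive snapshot, child datasets and properties included.
run "zfs snapshot -r tank@offsite-1"
run "zfs send -R tank@offsite-1 | ssh pi zfs receive -F backuppool"

# Later runs only send the delta between two recursive snapshots.
run "zfs snapshot -r tank@offsite-2"
run "zfs send -R -i tank@offsite-1 tank@offsite-2 | ssh pi zfs receive -F backuppool"
```

The catch, as described above, is that the receiving pool would live on a single unreliable disk, leaving ZFS nothing to repair from when its checksums do catch an error.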
That is the reason I would like to go with `ext3` or `ext4` combined with `rsync`. Sure, some bad bits might be written to disk. In the case of metadata, there are tools to fix most of these issues. In the case of data blocks, a bad bit would result in the loss of a single file. That file could then be recovered using `rsync -c`, as that would find the incorrect checksum and transfer the file again from the known-good copy on the local machine. Given the less-than-ideal hardware, this seems like the best solution possible.
That is my reasoning for using `rsync`, which led me to the original question of how to `rsync` a recursive ZFS snapshot. If I did not address any of your advice, please let me know, as I am really open to alternatives; I just do not currently see how they would provide any advantage for me.