
I'm reading about whether ZFS on Linux can be used for the root file system itself, and there's an interesting wiki article available. But the partition layout in the example is not clear to me.

# sgdisk     -n1:0:0      -t1:BF01 /dev/disk/by-id/scsi-SATA_disk1
Run this if you need legacy (BIOS) booting:
# sgdisk -a1 -n2:34:2047  -t2:EF02 /dev/disk/by-id/scsi-SATA_disk1
Run this for UEFI booting (for use now or in the future):
# sgdisk     -n3:1M:+512M -t3:EF00 /dev/disk/by-id/scsi-SATA_disk1
Run this in all cases:
# sgdisk     -n9:-8M:0    -t9:BF07 /dev/disk/by-id/scsi-SATA_disk1

What I don't understand in this example is the first partition. Reading through the sgdisk man page, it's not clear to me whether the first partition is only one block in size or whether it covers essentially the whole device. In the latter case, would that mean that all the other partitions are created within the first one? What puzzles me is the following sentence in the man page:

A start or end value of 0 specifies the default value, which is the start of the largest available block for the start sector and the end of the same block for the end sector.

"The same block" sounds like one block of the physical device, so either e.g. 512 Byte or 4k Byte of data? Or does it mean some contiguous amount of storage, spanning multiple logical/physical blocks in the end?

It's important for me to understand this because of the following sentence in the wiki:

The root pool does not have to be a single disk; it can have a mirror or raidz topology. In that case, repeat the partitioning commands for all the disks which will be part of the pool. Then, create the pool using zpool create ... rpool mirror /dev/disk/by-id/scsi-SATA_disk1-part1 /dev/disk/by-id/scsi-SATA_disk2-part1 (or replace mirror with raidz, raidz2, or raidz3 and list the partitions from additional disks).

As you can read, only the first partition is put into the pool; the others created above are not mentioned at all. What I don't understand is whether the other partitions are part of part1, and therefore implicitly available in the pool as well, or not used for the pool at all. For example, GRUB is installed into one of the other partitions with no reference to the ZFS pool:

# mkdosfs -F 32 -n EFI /dev/disk/by-id/scsi-SATA_disk1-part3
[...]
# echo PARTUUID=$(blkid -s PARTUUID -o value \
      /dev/disk/by-id/scsi-SATA_disk1-part3) \
      /boot/efi vfat nofail,x-systemd.device-timeout=1 0 1 >> /etc/fstab

That reads to me like part3 is not part of the ZFS pool and is therefore used on its own. But how does that fit with the size of part1? A partition of only one block doesn't seem to make much sense, while if part3 were somehow contained in part1, that would mean it is part of the ZFS pool but is also addressed outside of it?

I need to understand how redundancy is provided for the boot partitions in a setup with multiple devices and partitions like in the above example. Compared to mdadm, the overall goal is a RAID10 setup, where pairs of devices are mirrored and the data is then striped over all mirrors. In the end, with 6 disks one would get 3 times the storage of a single disk for the boot partitions and the root pool. ZFS is capable of doing that in general; I'm just not sure what is part of the pool and whether the redundancy covers the boot partitions as well.
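
To be concrete, what I imagine on the ZFS side is something like the following (my own sketch rather than a command from the wiki, using the same example disk naming):

# zpool create ... rpool \
      mirror /dev/disk/by-id/scsi-SATA_disk1-part1 /dev/disk/by-id/scsi-SATA_disk2-part1 \
      mirror /dev/disk/by-id/scsi-SATA_disk3-part1 /dev/disk/by-id/scsi-SATA_disk4-part1 \
      mirror /dev/disk/by-id/scsi-SATA_disk5-part1 /dev/disk/by-id/scsi-SATA_disk6-part1

As far as I understand, ZFS stripes across the three mirror vdevs, which would be the RAID10 equivalent, but it only covers part1 of each disk, which brings me back to the question about the other partitions.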

My current feeling is that it doesn't, and that I would need mdadm to make the boot partitions redundant, simply because the GRUB example above accesses a specific disk and partition, a physical device, not some logical mirror created by ZFS. Additionally, that partition is formatted with FAT32 instead of ZFS. It doesn't read like ZFS cares about any of those partitions at all.
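
If mdadm really is required, I assume the boot partition side would look something like this (again my own sketch, not from the wiki; --metadata=1.0 keeps the RAID superblock at the end of the partition so the firmware still sees a plain FAT file system):

# mdadm --create /dev/md0 --level=1 --raid-devices=2 --metadata=1.0 \
      /dev/disk/by-id/scsi-SATA_disk1-part3 /dev/disk/by-id/scsi-SATA_disk2-part3
# mkdosfs -F 32 -n EFI /dev/md0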

So in the end, does ZFS support a RAID10 setup comparable to what is possible with mdadm, including all boot partitions, root and so on, really everything?

1 Answer


Each disk (at least two, anyway) will need to have a boot partition that is not part of the ZFS pool to provide redundancy in the event of hardware failure.

In the instructions above, an EFI boot partition is created as a kind of proactive measure against future changes (EFI boot partitions are small-ish FAT file systems that chainload boot loaders and drivers, basically).

In any case, none of those first three partitions is going to belong to the zpool - just the last (largest) one.

This HOWTO for ZFS on Root for FreeBSD explains it in more detail. (But the different commands may just make it more confusing...)

Consider the following:

  • Your firmware (BIOS, EFI, whatever) knows nothing but how to find a boot loader
  • There is nothing but JBOD (Just a Bunch Of Disks)

You can't boot from ZFS directly because your firmware doesn't know what ZFS is. So there needs to be a non-ZFS partition the firmware can boot from, and since this won't be protected by ZFS redundancy, it makes sense to have copies of it in many places.
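
As a rough sketch of what "copies in many places" can look like in practice (not taken from the wiki above; the disk names are the same example names), you install the boot loader on every disk and keep the EFI partitions in sync yourself.

Install GRUB on every disk for legacy (BIOS) booting:

# grub-install /dev/disk/by-id/scsi-SATA_disk1
# grub-install /dev/disk/by-id/scsi-SATA_disk2

For UEFI booting, create and populate an ESP on every disk:

# mkdosfs -F 32 -n EFI /dev/disk/by-id/scsi-SATA_disk2-part3
# mount /dev/disk/by-id/scsi-SATA_disk2-part3 /mnt
# cp -a /boot/efi/. /mnt/
# umount /mnt

ZFS never sees these partitions; the redundancy for them is simply that each disk carries its own working copy.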

  • If none of the first three partitions belong to the zpool, why is -part1 added to it? Shouldn't that be -part9 instead, and -part1 is simply an error? That's what's confusing me; otherwise your explanation fits perfectly with what I've thought already. Thanks. Commented Jun 18, 2017 at 12:16

