
First: I'm perfectly okay with accepting that this is the case for now and am not looking for an immediate solution; rather, I'm trying to understand the technical reason behind this constraint.

I'm working primarily with ZFS on Linux, but my understanding is that all FOSS ZFS development is rooted in OpenZFS by now, so information about any/all of its variants is appreciated.

The man page of zpool remove states:

Top-level vdevs can only be removed if the primary pool storage does not contain a top-level raidz vdev, all top-level vdevs have the same sector size, and the keys for all encrypted datasets are loaded.

I understand and/or can guess the reasons for most of these restrictions, but I don't really understand why the mere presence of a raidz vdev prevents removal of any (even a mirrored or non-redundant) vdev.

It was my understanding/assumption that, from the pool's perspective, each vdev acts as a "dumb block device", with the actual redundancy/mirroring happening at the vdev level (as suggested by the repeated warning that there is no redundancy at the pool level: all redundancy must exist at the vdev level, and a single vdev going bad takes the whole pool down).

Under that assumption, it shouldn't matter which specific data vdev is removed, and the mere presence of a "bad" (raidz) vdev elsewhere in the pool should matter even less.

Clearly that assumption (or some other one I can't think of) is wrong. Can someone enlighten me as to which one, and why?

The only guess I have left, which I haven't been able to verify, is that there is no fundamental reason why raidz vdevs would prevent vdev removal, but that the interaction between some raidz-specific operation and device removal is simply not implemented/tested/verified at this point.

1 Answer

Data inside a RAIDZ vdev is striped differently than on a single-disk or mirror vdev. Removing a single-disk (or mirror) vdev really means creating a hidden indirect vdev which contains a table remapping (redirecting) the old DVA addresses to new ones, but this requires the metadata layout to be the same between the removed device and the new one. This simply is not the case when the data is copied to a RAIDZ vdev.
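To make the remapping idea concrete, here is a minimal, hypothetical sketch in Python (not actual OpenZFS code; the class and field names are invented for illustration) of what such a redirect table amounts to when the removed data lands on a plain or mirror vdev: each remapped range needs nothing more than a linear offset shift.

```python
# Simplified, illustrative model of the remap table kept by an "indirect" vdev
# after device removal. Not OpenZFS code; names and structures are invented.
from dataclasses import dataclass

@dataclass
class Segment:
    old_offset: int   # start of the remapped range on the removed vdev
    length: int       # length of the range, in bytes
    new_vdev: int     # top-level vdev the data was copied to
    new_offset: int   # start of the copy on that vdev

class IndirectVdev:
    """Stands in for the removed top-level vdev and redirects reads."""
    def __init__(self, segments):
        self.segments = sorted(segments, key=lambda s: s.old_offset)

    def remap(self, old_offset):
        """Translate an offset on the removed vdev to (vdev, offset)."""
        for seg in self.segments:
            if seg.old_offset <= old_offset < seg.old_offset + seg.length:
                # A plain linear shift is enough: the copied data keeps the
                # same layout, only its location changes.
                return seg.new_vdev, seg.new_offset + (old_offset - seg.old_offset)
        raise KeyError("offset was never allocated on the removed vdev")

# Example: two ranges of the removed vdev now live on vdevs 0 and 2.
ind = IndirectVdev([
    Segment(old_offset=0,       length=1 << 20, new_vdev=0, new_offset=4 << 20),
    Segment(old_offset=1 << 20, length=1 << 20, new_vdev=2, new_offset=8 << 20),
])
print(ind.remap(0x1800))  # -> (0, 4200448), i.e. 0x401800
```

The real mechanism in OpenZFS (the indirect vdev code) is of course far more involved, but the key property this sketch relies on is the same: one contiguous range on the removed vdev maps to one contiguous range somewhere else.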

  • Let me see if I understand that by trying to rephrase: for a single-disk or mirror vdev the mapping table would basically just need a linear offset, but for a raidz it would need a more complex mapping (roughly describing how each of the relevant blocks is distributed across the various parts of the raidz vdev)? And does that mean that (at least in theory) it would be possible to allow removing raidz vdevs from a raidz-only pool by creating that more complex mapping? Commented Aug 27, 2023 at 21:59
  • Theoretically, yes: by using a more complex remapping, relocation could succeed even with raidz vdevs in the pool. However, in practice this means changing the metadata blocks, which in turn means changing their checksums stored in other blocks, and so on. So it would be a much harder task than the "simple" redirect table currently used.
    – shodanshok
    Commented Aug 27, 2023 at 22:06
  • Oh, and I think I can finally formulate the vague understanding that I got from your answer: the problem is that data that used to be on the "simple" vdev could (and likely would) end up on the complex raidz vdev, which means there's no simple "map this block to that block" mapping, but rather a "this block is now stored across these blocks on these devices". I think this was the step that takes your answer from "yeah, that sounds reasonable" to me actually understanding what you're saying. Commented Aug 28, 2023 at 10:00
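To put the contrast discussed in the comments above into code: below is a deliberately oversimplified sketch (Python, invented names, not the real raidz column/parity math) of why one logical block stays a single contiguous extent on a plain or mirror destination, but becomes several per-disk pieces plus parity on a raidz destination, so a single "old DVA → new DVA" redirect entry can no longer describe where the data went.

```python
# Illustrative contrast only -- not OpenZFS code, and the raidz layout below
# is a toy approximation of how a block is split into columns.

def place_on_plain_vdev(offset, size):
    """One logical block -> one contiguous extent: a single redirect entry suffices."""
    return [("disk0", offset, size, "data")]

def place_on_raidz(offset, size, ndisks=4, nparity=1, colsize=4096):
    """One logical block -> data columns spread across disks, plus parity columns."""
    ndata = ndisks - nparity
    pieces = []
    # Parity columns (disk 0 .. nparity-1 in this toy layout) have no
    # counterpart at all on the original single-disk/mirror vdev.
    for p in range(nparity):
        pieces.append((f"disk{p}", offset, colsize, "parity"))
    # The data itself is split into columns, one per remaining disk,
    # wrapping to the next "row" once every data disk has a column.
    remaining, col = size, 0
    while remaining > 0:
        chunk = min(colsize, remaining)
        disk = nparity + (col % ndata)
        row = col // ndata
        pieces.append((f"disk{disk}", offset + row * colsize, chunk, "data"))
        remaining -= chunk
        col += 1
    return pieces

print(place_on_plain_vdev(0x1000, 16384))  # one entry: trivially remappable
print(place_on_raidz(0x1000, 16384))       # several entries + parity: no single redirect
```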
