0

For RAID 6 in wiki, the extra priority check bit Q is calculated by shifting of bits.

https://en.wikipedia.org/wiki/Standard_RAID_levels#RAID_6

Suppose we would like to distribute our data over n {\displaystyle n} n chunks. Our goal is to define two parity values P {\displaystyle \mathbf {P} } \mathbf {P} and Q {\displaystyle \mathbf {Q} } \mathbf {Q} , known as syndromes, resulting in a system of n + 2 {\displaystyle n+2} n+2 physical drives that is resilient to the loss of any two of them. In order to generate more than a single independent syndrome, we will need to perform our parity calculations on data chunks of size k > 1. {\displaystyle k>1.} {\displaystyle k>1.} A typical choice in practice is a chunk size k = 8 {\displaystyle k=8} {\displaystyle k=8}, i.e. striping the data per-byte. We will denote the base-2 representation of a data chunk D {\displaystyle D} D as d 0 d 1 . . . d k − 1 {\displaystyle d_{0}d_{1}...d_{k-1}} {\displaystyle d_{0}d_{1}...d_{k-1}}, where each d i {\displaystyle d_{i}} d_{i} is either 0 or 1.

...

This system will no longer work applied to a larger number of drives n > k {\displaystyle n>k} {\displaystyle n>k}. This is because if we repeatedly apply the shift operator k {\displaystyle k} k times to a chunk of length k {\displaystyle k} k, we end up back where we started.

In ZFS RAID-Z2 implementation (summarised in a developer blog post, the algorithm is based on H. Peter Anvin's paper, as well as the ZFA RAIDZ source code), should setting data chunk D size to 256 bits, k would be 265 thus allowing 255 different shifts. Then the theoretical maximum number of data drives supported should be 256 drives (no shift + 255 shifts), isn't it?

A maximum is mentioned in H. Peter Anvin's paper, but it says 255, which seems that he left out the no shift case.

Galois field, GF(28), allows for a maximum of 257 drives, 255 (28 − 1) of which can be data drives; the reason for this is shown below.

In addition I do not understand why in shift-2, shift-3, and shift-4 they need to by exclusive-or by x7, while the rest five do not.

1 Answer 1

0

After studying the materials for a few more times, I can confirm that the theoretical maximum of supported vdevs in RAID-Z2 is 255+2 vdevs.

Further reading to the following section in the Wikipedia,

General parity system

It is possible to support a far greater number of drives by choosing the parity function more carefully.

...

The effect of g i {\displaystyle g^{i}} g^{i} can be thought of as the action of a carefully chosen linear feedback shift register on the data chunk.[30] Unlike the bit shift in the simplified example, which could only be applied k {\displaystyle k} k times before the encoding began to repeat, applying the operator g {\displaystyle g} g multiple times is guaranteed to produce m = 2 k − 1 {\displaystyle m=2^{k}-1} {\displaystyle m=2^{k}-1} unique invertible functions, which will allow a chunk length of k {\displaystyle k} k to support up to 2 k − 1 {\displaystyle 2^{k}-1} 2^{k}-1 data pieces.

With a carefully chosen LFSR algorithm, among k-bits, it allows us to produce far more than k unique sequences.

Back to the H. Peter Anvin's paper, the x's in the paper are bits (of a byte) while shifting and XOR 0001 1100 (of x7) produces the next shifted byte. From point 6 through point 12 should be proving that this shifting and XOR 0001 1100 algorithm would produce 255 unique deviations. (However mathematics is not my expertise that I could not understand them.)

The algorithm used in the ZFS RAID source code is based on the H. Peter Anvin's paper, thus RAID-Z2 is sharing the same theoretical maximum of 255 unique sequences.

In C, multiplying by 2 is therefore ((a << 1) ^ ((a & 0x80) ? 0x1d : 0)).

We provide a mechanism to perform the field multiplication operation on a * 64-bit value all at once rather than a byte at a time. This works by * creating a mask from the top bit in each byte and using that to * conditionally apply the XOR of 0x1d.

However neither VDEV_RAIDZ_MUL_2(x) nor VDEV_RAIDZ_MUL_4(x) is used in the source code. Instead it uses a less trivial VDEV_RAIDZ_64MUL_2. This would require further calculation to prove and understand this part.

You must log in to answer this question.

Not the answer you're looking for? Browse other questions tagged .