This famous paper derives the quantum accuracy threshold theorem for concatenated codes.
I have questions about the case where errors are assumed to be local and Markovian.
In the so-called level-1 simulation for concatenated codes, we basically insert an error-correction step between each pair of encoded gates.
Some useful concepts are the notions of Rec and exRec. A 1-Rec is an encoded gate followed by quantum error correction, and a 1-exRec is an error-correction step, followed by an encoded gate, followed again by quantum error correction.
It can be shown that, because of quantum error correction, an algorithm can output a wrong answer only if there are two faults within an exRec. A fault is defined as a physical gate that did not work as expected.
The probability of such an event is upper bounded by $p=A \epsilon^2$, where $A$ is the number of pairs of fault locations and $\epsilon$ is the probability that the noisiest physical gate fails.
From that, after further calculation, it is possible to prove the accuracy threshold theorem (introducing the notion of concatenation).
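For concreteness, the recursion behind the theorem, as I understand it (my paraphrase; the paper's constants may differ), is:
$$\epsilon^{(1)} \le A\epsilon^2, \qquad \epsilon^{(k)} \le A \left(\epsilon^{(k-1)}\right)^2 \;\Longrightarrow\; \epsilon^{(k)} \le \frac{(A\epsilon)^{2^k}}{A},$$
which goes to zero doubly exponentially in the number of concatenation levels $k$, provided $\epsilon < 1/A \equiv \epsilon_{\text{th}}$.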
My question
All the reasoning is based on the fact that it is possible to assign a probability of failure to any given gate in the circuit. Coming from the classical world, I would totally understand the argument and the construction. But quantum noise is more complicated than being purely probabilistic, and I would like to understand whether this is a restrictive assumption or whether all local Markovian quantum noise can be understood with this reasoning.
In the case of "standard" error correction, if you take an initial quantum state and send it through some noise channel, then because of error discretization we can make sense of the notion of an error probability. Basically, after the syndrome measurement we will have some probability of bit-flip or phase-flip errors due to the noise channel.
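Here is a minimal single-qubit sketch of what I mean by discretization (the angle and the projection onto the $I$/$X$ branches are my own toy choices):

```python
import numpy as np

# A coherent over-rotation R_x(theta) expanded in the Pauli basis:
#   R_x(theta) = cos(theta/2) I - i sin(theta/2) X.
# A syndrome measurement (mimicked here by projecting onto the
# unflipped / flipped branches) collapses this coherent error into a
# discrete one: "no error" with probability cos^2(theta/2),
# "bit flip" with probability sin^2(theta/2).

I = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)

theta = 0.2  # small coherent over-rotation angle (toy value)
R = np.cos(theta / 2) * I - 1j * np.sin(theta / 2) * X

psi = np.array([1, 0], dtype=complex)  # |0>
out = R @ psi

a_I = psi.conj() @ out          # amplitude of the "identity" branch
a_X = (X @ psi).conj() @ out    # amplitude of the "X error" branch

print("P(no error) =", abs(a_I) ** 2)  # cos^2(theta/2)
print("P(bit flip) =", abs(a_X) ** 2)  # sin^2(theta/2)
```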
In the fault-tolerance scenario, if the error correction happened to be perfect (which it is not), I could group together all the gates before the error correction and define one "big" noise channel from them. In this scenario it would be just like the previous paragraph, and it would make sense to define a probability of failure induced by this noise channel.
However, relating the properties of this "big" noise channel to the properties of the noise channels of the individual gates might not be easy. Indeed, suppose I define the noise channel of a CPTP map $\mathcal{E}$ that tries to implement the ideal unitary $\mathcal{U}$ as the map $\mathcal{N}$ verifying:
$$\mathcal{E}=\mathcal{N} \circ \mathcal{U}$$
Then, if I have $N$ gates before QEC, I would have:
$$\mathcal{E}_N \circ \cdots \circ \mathcal{E}_1=\mathcal{N}_N \circ \mathcal{U}_N \circ \cdots \circ \mathcal{N}_1 \circ \mathcal{U}_1$$
And the noise channel "relevant" for QEC is defined as the map $\mathcal{N}_{\text{tot}}$ which verifies:
$$ \mathcal{E}_N \circ \cdots \circ \mathcal{E}_1 = \mathcal{N}_{\text{tot}} \circ \mathcal{U}_N \circ \cdots \circ \mathcal{U}_1 $$
Relating $\mathcal{N}_{\text{tot}}$ to the $\{\mathcal{N}_{i}\}$ is not an easy task because of, among other things, commutation issues.
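To spell out where the commutation issue comes from: formally one can always push each $\mathcal{N}_i$ through the later ideal gates and write
$$\mathcal{N}_{\text{tot}} = \mathcal{N}_N \circ \widetilde{\mathcal{N}}_{N-1} \circ \cdots \circ \widetilde{\mathcal{N}}_1, \qquad \widetilde{\mathcal{N}}_i = \mathcal{V}_i \circ \mathcal{N}_i \circ \mathcal{V}_i^{-1}, \qquad \mathcal{V}_i = \mathcal{U}_N \circ \cdots \circ \mathcal{U}_{i+1},$$
but the conjugated channels $\widetilde{\mathcal{N}}_i$ need not stay simple (for instance, a Pauli channel conjugated by a non-Clifford unitary is in general no longer a Pauli channel), so the structure of the individual $\mathcal{N}_i$ is not straightforwardly inherited by $\mathcal{N}_{\text{tot}}$.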
And this is basically my issue here. The fault-tolerance construction rests on a kind of probabilistic reasoning: if one gate "fails" with probability $\epsilon$, then before QEC I will have an error (which I can correct). But quantum noise is not probabilistic; there are all the commutation issues I just described, for instance. Thus I really don't understand the reasoning if I "think quantum" instead of thinking in terms of classical error probabilities. I could, for instance, expect "quantum amplitude" effects, or "crossed" contributions between all those gates, which cannot be captured by classical probabilities.
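As an example of the kind of effect I am worried about, here is a toy numerical check (my own construction, not from the paper) where two coherent over-rotations interfere constructively, and the failure probability comes out roughly twice what two independent stochastic faults would give:

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)

def rx(theta):
    """Single-qubit rotation exp(-i * theta * X / 2)."""
    return np.cos(theta / 2) * np.eye(2) - 1j * np.sin(theta / 2) * X

theta = 0.1
psi = np.array([1, 0], dtype=complex)  # |0>

# Coherent case: the two rotations compose into R_x(2*theta), so the
# X-error amplitudes add *before* squaring: P = sin^2(theta).
p_coherent = abs((rx(theta) @ rx(theta) @ psi)[1]) ** 2

# Probabilistic model: each gate independently applies X with
# probability p = sin^2(theta/2); the net-flip probability after two
# gates is the probability of exactly one flip, 2p(1-p).
p = np.sin(theta / 2) ** 2
p_stochastic = 2 * p * (1 - p)

print(f"coherent:   {p_coherent:.6f}")   # ~ theta^2
print(f"stochastic: {p_stochastic:.6f}") # ~ theta^2 / 2
```

For $N$ such gates the coherent failure probability can grow like $N^2\theta^2/4$ instead of the stochastic $N\theta^2/4$, and it is exactly this kind of amplitude effect that I don't see how to map onto a per-gate failure probability.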
Thus: how should I understand why it is fine to assume that each gate fails with a given probability in the FT construction? And how is this probability precisely defined?