The statement made by Tom Wong is based on the following reasoning, discussed in other answers on this site (for instance, this one): if there is no entanglement, then the $n$-qubit state is a tensor product of $n$ single-qubit states. A classical computer then only has to keep the $2n$ complex amplitudes in memory to simulate the quantum computer, which is efficient. Thus, whatever the quantum computer solves, the classical computer can solve too, with the same complexity.
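To make this concrete, here's a minimal sketch (names and gate choices are my own, purely illustrative) of why a product state is cheap to simulate: storing $n$ single-qubit vectors ($2n$ amplitudes) instead of the full $2^n$-dimensional state, with single-qubit gates acting on just one pair of amplitudes at a time.

```python
import numpy as np

# Illustrative sketch: an n-qubit product (unentangled pure) state is fully
# described by n single-qubit vectors, i.e. 2n complex amplitudes, so a
# single-qubit gate costs O(1) instead of O(2^n).

n = 4
# Each qubit starts in |0> = (1, 0); the full state is their tensor product.
qubits = [np.array([1.0, 0.0], dtype=complex) for _ in range(n)]

H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)  # Hadamard gate
X = np.array([[0, 1], [1, 0]], dtype=complex)                # Pauli-X gate

# Applying a single-qubit gate only touches that qubit's 2 amplitudes.
qubits[0] = H @ qubits[0]
qubits[2] = X @ qubits[2]

def amplitude(bits):
    """Amplitude of the basis state |b_0 b_1 ... b_{n-1}>: it factorises."""
    amp = 1.0 + 0.0j
    for q, b in zip(qubits, bits):
        amp *= q[b]
    return amp

print(abs(amplitude((0, 0, 1, 0))) ** 2)  # probability of measuring |0010>
```

As soon as an entangling two-qubit gate is applied, this factorised representation breaks down, which is exactly where the exponential cost of classical simulation comes from.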
First of all, note that this reasoning only works as long as we're dealing with pure states. As Norbert Schuch mentioned in the comments and as explained in this answer, it is not yet known how to efficiently simulate mixed separable states. This answer gives examples of problems that are considered hard for a classical computer but that can be tackled using a single pure qubit and a completely mixed state.
The paper you've linked exploits this and challenges the claim that "if there is no entanglement, then the $n$-qubit state is a tensor product of $n$ single-qubit states" using the following mixed state:
$$\varepsilon|\psi\rangle\langle\psi|+\frac{1-\varepsilon}{2^n}I_{2^n}$$
with $\varepsilon\in[0,1]$ and $I_{2^n}$ the identity matrix of size $2^n\times2^n$. In particular, there is a theorem stating that for sufficiently small $\varepsilon$, namely for $\varepsilon$ such that:
$$\varepsilon<\frac{1}{1+2^{2n-1}}$$
then no entanglement appears in the state, even if $|\psi\rangle$ is entangled. The authors then show that for the Deutsch-Jozsa problem, this state yields a positive (though small) amount of information from a single query, which is impossible with a single classical call to the function. Thus, they managed to do something (gain information) that classical computers can't do.
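To get a feel for how restrictive the separability condition is, here is a quick numerical look (my own illustration, not from the paper) at how fast the threshold $\varepsilon<\frac{1}{1+2^{2n-1}}$ shrinks with the number of qubits:

```python
# The separability threshold epsilon < 1 / (1 + 2^(2n - 1)) decays
# exponentially in the number of qubits n.
for n in (2, 5, 10, 20):
    bound = 1 / (1 + 2 ** (2 * n - 1))
    print(f"n = {n:2d}:  epsilon must be below {bound:.3e}")
```

Already at $n=20$, $\varepsilon$ has to be below roughly $10^{-12}$, which foreshadows why the advantage discussed below is negligible in practice.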
Now comes the part where I give my opinion on this. If someone disagrees, please feel free to comment or edit the answer accordingly.
There are two things that bother me in this method.
- Algorithms using this method will never have any advantage (computationally speaking) w.r.t. classical ones.
- The fact that there's no entanglement comes more from wordplay than from a physical property IMO.
Concerning the first point, using this method the final state of an algorithm using $n$ qubits can always be written as:
$$\varepsilon|\psi\rangle\langle\psi|+\frac{1-\varepsilon}{2^n}I_{2^n}$$
with $|\psi\rangle$ the pure state one would have obtained when starting from a pure state (i.e. the state one would have obtained when using the full potential of quantum computers). Performing a measurement in the computational basis (which is, without loss of generality, what one would do) yields a basis state $|s\rangle$. Note that each basis state $|s\rangle$ has a probability of at least $\frac{1-\varepsilon}{2^n}$ of being measured. The maximum probability of measuring a desired state $|s\rangle$ is thus $\varepsilon+\frac{1-\varepsilon}{2^n}$. If no entanglement is to appear in the state, the theorem above forces $\varepsilon$ to be exponentially small, and intuitively the amount of information gained is then also exponentially small. Thus, the algorithm isn't useful in practice: it cannot solve a problem in sub-exponential time (or cannot do so with a non-negligible advantage).
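The measurement statistics above can be checked directly. The following sketch (my own, with an arbitrary choice of $|\psi\rangle$ as a basis state) computes the computational-basis distribution of the pseudo-pure state and confirms that the best outcome probability is $\varepsilon+\frac{1-\varepsilon}{2^n}$:

```python
import numpy as np

# Measurement statistics of eps * |psi><psi| + (1 - eps)/2^n * I in the
# computational basis. For each basis state |s>,
#   Pr[s] = eps * |<s|psi>|^2 + (1 - eps)/2^n,
# so the maximum is eps + (1 - eps)/2^n, attained when |psi> = |s>.

n = 3
dim = 2 ** n
eps = 0.05

psi = np.zeros(dim, dtype=complex)
psi[5] = 1.0  # take |psi> = |101>, standing in for the "desired" outcome

probs = eps * np.abs(psi) ** 2 + (1 - eps) / dim
print(probs.max())                   # equals eps + (1 - eps)/2^n
print(np.isclose(probs.sum(), 1.0))  # sanity check: a valid distribution
```

For an entangled $|\psi\rangle$ the overlaps $|\langle s|\psi\rangle|^2$ are spread out, but the maximum can never exceed $\varepsilon+\frac{1-\varepsilon}{2^n}$.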
For instance, for the Deutsch-Jozsa problem, the probability of winning using this method (assuming that the probability that $f$ is balanced is $\frac12$) is:
$$\frac12\left(\varepsilon+\frac{1-\varepsilon}{2^n}\right)+\frac12\left(\varepsilon+\frac{2^n-1}{2^n}(1-\varepsilon)\right)=\frac12+\frac{\varepsilon}{2}$$
with $\varepsilon$ being exponentially small. Recall that a classical computer wins with probability $\frac12$ using a single query. Thus, though this algorithm can do something without entanglement that classical computers can't, the advantage is exponentially small: no polynomial-time algorithm can distinguish it from random guessing.
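The winning probability above can be verified exactly. The sketch below (my own framing of the strategy: answer "constant" iff the measured outcome is $|0\dots0\rangle$, using the fact that the pure-state branch ends in $|0\dots0\rangle$ exactly when $f$ is constant) shows the success probability is independent of $n$:

```python
# Exact success probability of the "guess constant iff outcome is |0...0>"
# strategy on the pseudo-pure state for the Deutsch-Jozsa problem.
# For constant f the pure-state branch is |0...0>; for balanced f it has
# zero overlap with |0...0>. Priors: constant / balanced each with prob 1/2.

def success_probability(n, eps):
    dim = 2 ** n
    p_zero_constant = eps * 1.0 + (1 - eps) / dim   # Pr[0...0 | constant]
    p_zero_balanced = eps * 0.0 + (1 - eps) / dim   # Pr[0...0 | balanced]
    # Win if (constant and saw 0...0) or (balanced and saw anything else).
    return 0.5 * p_zero_constant + 0.5 * (1 - p_zero_balanced)

for n in (2, 5, 10):
    eps = 0.01
    print(n, success_probability(n, eps), 0.5 + eps / 2)
```

The $\frac{1-\varepsilon}{2^n}$ contributions from the two cases cancel exactly, leaving $\frac12+\frac\varepsilon2$ for every $n$.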
Concerning the second point, it comes in my opinion from the way we define entanglement of mixed states. The more intuitive way to think about the state we're dealing with is:
- With probability $\varepsilon$, one works with the desired, entangled state.
- Or with probability $1-\varepsilon$, one works with a state on which one has no information whatsoever.
Note that this uncertainty is purely classical: the state you're working with is either one of these states, not a superposition of them. Thus, with probability $\varepsilon$, the state is entangled. Note also that the small advantage comes precisely from the entangled part; the reason this advantage is so small is that $\varepsilon$ must be exponentially small for the state not to be considered "entangled".
Thus, even though the resulting mixed state is not, by definition, entangled, I'd still argue that the small difference one can observe between classical and quantum computers comes from the entanglement of $|\psi\rangle$.