
"double-double" / "compensated" arithmetic uses unevaluated sums of floating point numbers to obtain higher precision.

One of the basic algorithms is TWOSUM(a,b) = (x,y), where $x$ and $y$ do not overlap each other (while $a$ and $b$ might), with all operations in machine-precision floating point:

x := a + b
z := x - a
y := (a - (x - z)) + (b - z)
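
A minimal runnable sketch of this in Python (whose floats are IEEE-754 doubles), checking the "error-free" claim against exact rational arithmetic; the test values here are arbitrary choices for illustration:

from fractions import Fraction

def two_sum(a: float, b: float) -> tuple[float, float]:
    x = a + b                      # rounded sum
    z = x - a                      # approximate "b part" of x
    y = (a - (x - z)) + (b - z)    # reconstructed rounding error
    return x, y

a, b = 1.0, 1e-17                  # b is entirely lost in the rounded sum a + b
x, y = two_sum(a, b)
print(x, y)                        # 1.0 1e-17: y recovers the discarded part
assert Fraction(x) + Fraction(y) == Fraction(a) + Fraction(b)  # exact equality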

I have seen it claimed that this is an "error-free transformation". I have not been able to prove it: my attempt, multiplying the result of each floating-point operation by $(1 + \epsilon_k)$ in (wx)Maxima, yields:

x : (a + b) * (1 + e[1])$
z : (x - a) * (1 + e[2])$
y : ((a - (x - z) * (1 + e[3])) * (1 + e[4]) + (b - z) * (1 + e[5])) * (1 + e[6])$
expand((x + y) - (a + b));
-> -e[3]*a, plus terms each containing at least two factors e[k]

So I think it is error-free only if e[3] is zero, i.e. when x - z is exactly representable. Is this always the case? Why?

  • "Error free" here means that $(x,y)$ contains all the information for the exact sum $a+b$: $y$ reconstructs the floating-point error committed in the calculation of $x$. (Commented Nov 7, 2023 at 17:31)

1 Answer


I found a reference that states this fact without proof:

We shall not give the proof of this fact but point at the leading idea namely that $v_1$ and $u_1$ have been truncated precisely to have only zeros to the right of what are the significant positions of $s$.

-- Møller, O. Quasi double-precision in floating point addition. BIT 5, 37–50 (1965). https://doi.org/10.1007/BF01975722

$s$ corresponds to $x$, $v_1$ corresponds to $z$, and $u_1$ corresponds to $x - z$ in the notation of my question. That $z$ is truncated relative to $x$ means that $x - z$ is exactly representable, so e[3] is indeed zero, and the difference between $x + y$ and $a + b$ is at most of order $\epsilon^2$ relative to the value itself.
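
As a quick empirical probe of that claim (a sketch, not a proof), one can compare the floating-point result of x - z against exact rational arithmetic over random inputs in Python; the sampling ranges are arbitrary:

import random
from fractions import Fraction

random.seed(0)
for _ in range(10_000):
    a = random.uniform(-1.0, 1.0) * 10.0 ** random.randint(-30, 30)
    b = random.uniform(-1.0, 1.0) * 10.0 ** random.randint(-30, 30)
    x = a + b
    z = x - a
    # If x - z is exactly representable, this subtraction commits no error:
    assert Fraction(x - z) == Fraction(x) - Fraction(z)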

$\endgroup$
