In computing, there is a standard called IEEE 754. Unlike in mathematics, computers have to fit numbers into a finite amount of space; to do this, they use a format a bit like scientific notation, called a double (or `binary64`, in later versions of the specification). It looks a little like this:
$$(-1)^a \times \left(1 + \frac{c}{2^{52}}\right) \times 2^{b-1023}$$
where $a \in \{0,1\}$, $0\le b<2^{11}$, and $0\le c < 2^{52}$.
$a$ is the sign, $b$ is the (biased) exponent, and $c$ is the significand. It's a bit more complicated than that formula suggests – the extreme exponent values are special: $b = 0$ gives a subnormal representation for numbers close to 0, so that 0 can be represented exactly, and $b = 2^{11}-1$ encodes infinities and NaNs – but there's not much more to it.
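To make that layout concrete, here's a quick Python sketch (the `decompose` helper is just for illustration) that pulls $a$, $b$ and $c$ out of a double's bits and reconstructs the value from the formula above:

```python
import struct

def decompose(x):
    # Reinterpret the 64 bits of a double as an unsigned integer.
    bits = struct.unpack('<Q', struct.pack('<d', x))[0]
    a = bits >> 63                 # 1 sign bit
    b = (bits >> 52) & 0x7FF       # 11-bit biased exponent field
    c = bits & ((1 << 52) - 1)     # 52-bit significand field
    return a, b, c

a, b, c = decompose(0.1)
# Reconstruct per the formula; every step here is exact in double arithmetic.
print((-1)**a * (1 + c / 2**52) * 2**(b - 1023))  # -> 0.1
```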
Integers are easy; you just consider all your arithmetic $\bmod\ 2^n$, and it's a finite ring. Floating-point arithmetic is harder; there's no guarantee that `a + b - b` equals `a`. All of the standard arithmetic operations on doubles have well-defined, exact values (for a given rounding mode), so it should be possible to prove things about floating-point calculations, but I can't see how floats and doubles would map nicely to the sorts of elegant concepts that proof assistants use.
How do proof assistants represent floating point numbers, and how would I go about proving something like this?
$$ a = 0.1 + 0.3 + 0.3 + 0.3 \\ b = 0.3 + 0.3 + 0.3 + 0.1 \\ a - b = 2^{-53} $$
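For what it's worth, evaluating this with Python floats (binary64 doubles) does confirm the claim; what I'm after is how to *prove* it rather than just run it:

```python
a = 0.1 + 0.3 + 0.3 + 0.3   # each sum rounds; the total rounds to exactly 1.0
b = 0.3 + 0.3 + 0.3 + 0.1   # these roundings land on exactly 1 - 2**-53
print(a - b == 2**-53)  # -> True
```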