35

I wrote a program to demonstrate floating point error in Go:

package main

import "fmt"

func main() {
    a := float64(0.2)
    a += 0.1
    a -= 0.3
    var i int
    for i = 0; a < 1.0; i++ {
        a += a
    }
    fmt.Printf("After %d iterations, a = %e\n", i, a)
}

It prints:

After 54 iterations, a = 1.000000e+00

This matches the behaviour of the same program written in C (using the double type).

However, if float32 is used instead, the program gets stuck in an infinite loop! If you modify the C program to use a float instead of a double, it prints

After 27 iterations, a = 1.600000e+00

Why doesn't the Go program have the same output as the C program when using float32?

5
  • I'm not seeing a problem... 0.2 + 0.1 = 0.3, 0.3 - 0.3 = 0.0, and looping 0.0 + 0.0 would never rise above 1.0. What I'm confused about is how you got it to break out of the loop with the float64?
    – Verran
    Commented Mar 11, 2014 at 22:23
  • 4
    floating point numbers are not perfectly accurate. In particular, the numbers 0.1 and 0.3 cannot be represented exactly. This causes a to have a non-zero (albeit very small) value before entering the loop. Wikipedia has an explanation: en.wikipedia.org/wiki/Guard_digit
    Commented Mar 11, 2014 at 22:27
  • I started playing with this playground play.golang.org/p/Im6OFfTFPY, and I kind of see what you mean, but it looks like in Go float32s are represented exactly, while float64s are not.
    – Verran
    Commented Mar 11, 2014 at 22:29
  • 2
    If you check the ASM of the code with go tool 6g -S main.go you will see the reason. The calculation for float32 is as follows: 2.00000002980232230e-01 + 1.00000001490116120e-01 - 3.00000011920928950e-01 which is a negative value and will never sum up to 1. Why Go does this, I do not know.
    – ANisus
    Commented Mar 11, 2014 at 22:47
  • Played around with another playground (play.golang.org/p/FZxCQTS9yG) a little longer and found that when you print the float64 up to 20 decimal places, you get a lot more digits than just 0.30...04; you get 0.30000000000000004440892098500626161694526672363281 and the rest gets cut off. I'm guessing that with a float32, a lot more gets cut off and it gets rounded to an even 0.3. This could explain the arithmetic, but right now it's just a theory (see the sketch after these comments).
    – Verran
    Commented Mar 11, 2014 at 22:50
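
To see what these comments describe, here is a minimal sketch (mine, not from the original thread) that runs the question's three operations in both precisions and prints the starting value of a. The exact float32 result depends on how the intermediate sums are rounded (see the answers below), but it is zero or negative either way, so that loop can never terminate; the float64 result is exactly 2^-54, which 54 doublings turn into exactly 1.0.

package main

import "fmt"

func main() {
    // The same three operations as in the question, in float32 and in float64.
    a32 := float32(0.2)
    a32 += 0.1
    a32 -= 0.3

    a64 := float64(0.2)
    a64 += 0.1
    a64 -= 0.3

    // float32: zero or slightly negative, depending on how the intermediate
    // results are rounded; doubling it can never reach 1.0.
    fmt.Printf("float32 start value: %g\n", a32)
    // float64: 5.551115123125783e-17, which is exactly 2^-54, so 54
    // doublings land exactly on 1.0.
    fmt.Printf("float64 start value: %g\n", a64)
}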

2 Answers

34

Using math.Float32bits and math.Float64bits, you can see how Go represents the different decimal values as IEEE 754 binary values:

Playground: https://play.golang.org/p/ZqzdCZLfvC

Result:

float32(0.1): 00111101110011001100110011001101
float32(0.2): 00111110010011001100110011001101
float32(0.3): 00111110100110011001100110011010
float64(0.1): 0011111110111001100110011001100110011001100110011001100110011010
float64(0.2): 0011111111001001100110011001100110011001100110011001100110011010
float64(0.3): 0011111111010011001100110011001100110011001100110011001100110011
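
A minimal sketch of how output like this can be produced (my reconstruction; the linked playground may differ in details), using math.Float32bits / math.Float64bits and the %b verb of fmt.Printf:

package main

import (
    "fmt"
    "math"
)

func main() {
    // Float32bits / Float64bits expose the raw IEEE 754 bit patterns;
    // %032b / %064b print them zero-padded to their full width.
    for _, v := range []float64{0.1, 0.2, 0.3} {
        fmt.Printf("float32(%v): %032b\n", v, math.Float32bits(float32(v)))
    }
    for _, v := range []float64{0.1, 0.2, 0.3} {
        fmt.Printf("float64(%v): %064b\n", v, math.Float64bits(v))
    }
}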

If you convert these binary representations to decimal values and run your loop, you can see that for float32, the initial value of a will be:

0.20000000298023224
+ 0.10000000149011612
- 0.30000001192092896
= -7.4505806e-9

a negative value that can never sum up to 1.

So, why does C behave differently?

If you look at the binary patterns (and know a little about how binary values are represented), you can see that Go rounds the last bit, while I assume C just truncates it instead.

So, in a sense, while neither Go nor C can represent 0.1 exactly in a float, Go uses the value closest to 0.1:

Go:   00111101110011001100110011001101 => 0.10000000149011612
C(?): 00111101110011001100110011001100 => 0.09999999403953552
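
A quick way to check both interpretations is to map the bit patterns back to decimal with math.Float32frombits (a sketch I am adding; the second pattern is the hypothetical truncated one, since the answer only guesses that C truncates):

package main

import (
    "fmt"
    "math"
)

func main() {
    // Go's rounded bit pattern for float32(0.1), taken from the table above.
    rounded := math.Float32frombits(0b00111101110011001100110011001101)
    // The hypothetical truncated pattern (same bits, last bit cleared).
    truncated := math.Float32frombits(0b00111101110011001100110011001100)

    fmt.Printf("rounded:   %.17g\n", rounded)   // 0.10000000149011612
    fmt.Printf("truncated: %.17g\n", truncated) // 0.099999994039535522
}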

Edit:

I posted a question about how C handles float constants, and from the answer it seems that any implementation of the C standard is allowed to do either. The implementation you tried it with just did it differently than Go.

1
  • 4
    No need for strconv.FormatUint(x, 2); fmt.Printf has a "%b" format. No need for unsafe; there are math.Float32bits and math.Float64bits. A better version is: play.golang.org/p/ZqzdCZLfvC
    – Dave C
    Commented Nov 2, 2017 at 15:36
17

Agree with ANisus; Go is doing the right thing. Concerning C, I'm not convinced by his guess.

The C standard does not dictate it, but most implementations of libc will convert the decimal representation to the nearest float (at least to comply with IEEE 754-2008 or ISO 10967), so I don't think this is the most probable explanation.

There are several reasons why the C program's behavior might differ... In particular, some intermediate computations might be performed with excess precision (double or long double).

The most probable cause I can think of is that you wrote 0.1 instead of 0.1f in C.
In that case, you might have caused excess precision in the initialization
(you sum float a + double 0.1 => the float is converted to double, then the result is converted back to float).

If I emulate these operations

float32(float32(float32(0.2) + float64(0.1)) - float64(0.3))

Then I find something near 1.1920929e-8f

After 27 iterations, this sums to 1.6f
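
Go does not allow float32 and float64 to be mixed in one expression, so a runnable version of that emulation needs explicit conversions. A minimal sketch (mine), following the promotion sequence described above:

package main

import "fmt"

func main() {
    // Emulate a C program where a is a float but the constants 0.1 and 0.3
    // are doubles: each operation promotes a to double, and the result is
    // converted back to float on assignment.
    a := float32(0.2)
    a = float32(float64(a) + 0.1)
    a = float32(float64(a) - 0.3)
    fmt.Printf("start value: %g\n", a) // about 1.1920929e-08, positive

    var i int
    for i = 0; a < 1.0; i++ {
        a += a
    }
    fmt.Printf("After %d iterations, a = %e\n", i, a) // After 27 iterations, a = 1.600000e+00
}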

1
  • I changed the C program to declare all constants with f and now it also stalls. I got the code from wikipedia originally (en.wikipedia.org/wiki/Guard_digit) so I'll go update that code as well. Commented Mar 12, 2014 at 3:49
