How to actually avoid floating point errors when you need to use float?

Question

I am trying to affect the translation of a 3D model using some UI buttons to shift the position by 0.1 or -0.1.

My model position is a three-dimensional float vector, so simply adding 0.1f to one of the values causes obvious rounding errors. While I can use something like BigDecimal to retain precision, I still have to convert it from a float and back to a float at the end, and it always results in silly numbers that make my UI look like a mess.

I could just prettify the displayed values, but the rounding errors will only get worse with more editing, and they make my save files rather hard to read.

So how do I actually avoid these errors when I need to use a float?

Adding and subtracting .1f a few times would not produce huge errors. The most noticeable would be a single change by one, as when the desired result were four but something very slightly less were produced and truncated to three during conversion to integer. That would be fixed by rounding before conversion. If you are getting other errors, something else may be wrong. Please show example data and code illustrating the problems you are having. — Eric Postpischil, Commented Apr 21, 2013 at 21:02
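The truncation problem described in the comment above can be sketched as follows. This is a minimal illustration, not code from the thread: the class and method names are hypothetical, and `Math.nextDown(4.0f)` merely stands in for "a result very slightly less than four".

```java
class RoundBeforeCast {
    // Plain cast: discards the fractional part (truncates toward zero).
    static int truncate(float v) {
        return (int) v;
    }

    // Rounds to the nearest int before conversion.
    static int roundNearest(float v) {
        return Math.round(v);
    }

    public static void main(String[] args) {
        // Largest float strictly below 4.0f, standing in for a slightly-too-small result.
        float almostFour = Math.nextDown(4.0f);
        System.out.println("truncated: " + truncate(almostFour));
        System.out.println("rounded:   " + roundNearest(almostFour));
    }
}
```

The cast yields 3 while the rounded conversion yields 4, which is the "single change by one" the comment refers to.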

Zim-Zam O'Pootertoot · Accepted Answer · 2013-04-21 16:50:29Z

2

The Kahan summation and pairwise summation algorithms help to reduce floating point errors. Here's some Java code for the Kahan algorithm.
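The code linked in the original answer is not reproduced here. As a rough sketch of the idea, assuming the values to be summed are available as a `float[]`, compensated (Kahan) summation might look like the following; the class and method names are made up for illustration.

```java
import java.util.Arrays;

class KahanSum {
    // Straightforward accumulation, for comparison.
    static float naiveSum(float[] values) {
        float sum = 0.0f;
        for (float v : values) {
            sum += v;
        }
        return sum;
    }

    // Kahan (compensated) summation: carries the rounding error of each
    // addition forward in c and feeds it back into the next term.
    static float kahanSum(float[] values) {
        float sum = 0.0f;
        float c = 0.0f; // running compensation for lost low-order bits
        for (float v : values) {
            float y = v - c;     // apply the correction to the next term
            float t = sum + y;   // low-order bits of y may be lost here
            c = (t - sum) - y;   // recover what was lost (negated)
            sum = t;
        }
        return sum;
    }

    public static void main(String[] args) {
        float[] steps = new float[1000];
        Arrays.fill(steps, 0.1f);
        System.out.println("naive: " + naiveSum(steps));
        System.out.println("kahan: " + kahanSum(steps));
    }
}
```

Note that this only reduces the error accumulated across many additions; each 0.1f term is still not exactly one tenth.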

answered Apr 21, 2013 at 16:50

Zim-Zam O'Pootertoot


2

The question reports using float. Changing to double would provide more error reduction than the Kahan summation algorithm and more cheaply (on most hardware). However, neither method would solve the errors that occur from converting very slightly wrong results to integer by truncation.
– Eric Postpischil
Commented Apr 21, 2013 at 21:04


OldCurmudgeon · Accepted Answer · 2013-04-21 17:02:43Z

1

I would use a Rational class. There are many out there - this one looks like it should work.

The significant costs are rendering the Rational into a float and reducing the numerator and denominator by their gcd. The one I posted keeps the numerator and denominator in fully reduced state at all times, which should be quite efficient if you are always adding or subtracting 1/10.

This implementation holds the values normalised (i.e. with consistent sign) but unreduced.

You should choose your implementation to best fit your usage.
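The implementations linked in this answer are not reproduced here. A minimal sketch of the always-reduced variant, assuming `long` numerators and denominators are large enough and ignoring overflow, could look like this; the class and its methods are hypothetical, not the linked code.

```java
final class Rational {
    private final long num;
    private final long den; // invariant: den > 0 and gcd(|num|, den) == 1

    Rational(long num, long den) {
        if (den == 0) {
            throw new ArithmeticException("zero denominator");
        }
        if (den < 0) { // keep the sign on the numerator
            num = -num;
            den = -den;
        }
        long g = gcd(Math.abs(num), den);
        this.num = num / g;
        this.den = den / g;
    }

    private static long gcd(long a, long b) {
        while (b != 0) {
            long t = a % b;
            a = b;
            b = t;
        }
        return a;
    }

    Rational plus(Rational o) {
        return new Rational(num * o.den + o.num * den, den * o.den);
    }

    Rational minus(Rational o) {
        return new Rational(num * o.den - o.num * den, den * o.den);
    }

    // Convert only at the boundary, e.g. when handing the value to the renderer.
    float toFloat() {
        return (float) ((double) num / den);
    }

    @Override
    public String toString() {
        return den == 1 ? Long.toString(num) : num + "/" + den;
    }

    public static void main(String[] args) {
        Rational x = new Rational(3, 1);
        Rational step = new Rational(1, 10);
        for (int i = 0; i < 10; i++) {
            x = x.plus(step);
        }
        System.out.println(x + " -> " + x.toFloat());
    }
}
```

The exact value lives in the Rational; the float is derived from it each time rather than being edited in place, so errors do not accumulate.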

edited Apr 21, 2013 at 17:02

answered Apr 21, 2013 at 16:54

OldCurmudgeon


Peter Lawrey · Accepted Answer · 2013-04-21 21:05:40Z

0

A simple solution is to use fixed precision, i.e. an integer 10x or 100x what you want.

float f = 10;
f += 0.1f;

becomes

int i = 100;
i += 1;  // use as many times as you like
// use i / 10.0 as required.

I wouldn't use float in any case, as you get more rounding errors than with double for next to no benefit (unless you have millions of float values). double gives you roughly 8 more decimal digits of precision, and with sensible rounding you won't see those errors.
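Applied to the 3D position from the question, the fixed-precision idea might be sketched as below. All names here (`FixedStepPosition`, `nudge`, `label`) are hypothetical, and the sketch assumes a step size of one tenth of a unit: the integer step counts are what gets edited and saved, and the float is recomputed from them whenever the model needs it.

```java
import java.math.BigDecimal;

final class FixedStepPosition {
    static final int STEPS_PER_UNIT = 10; // one step = 0.1 units (assumed step size)

    private final int[] steps = new int[3]; // x, y, z in tenths of a unit

    FixedStepPosition(int xSteps, int ySteps, int zSteps) {
        steps[0] = xSteps;
        steps[1] = ySteps;
        steps[2] = zSteps;
    }

    // Called by the UI buttons with +1 or -1; integer arithmetic is exact.
    void nudge(int axis, int deltaSteps) {
        steps[axis] += deltaSteps;
    }

    // Derive the float for the model with a single division, so at most one
    // rounding error is introduced no matter how many edits came before.
    float toFloat(int axis) {
        return steps[axis] / (float) STEPS_PER_UNIT;
    }

    // Exact decimal text for the UI or a save file, e.g. "10.1".
    String label(int axis) {
        return BigDecimal.valueOf(steps[axis], 1).toPlainString();
    }

    public static void main(String[] args) {
        FixedStepPosition pos = new FixedStepPosition(100, 0, 0);
        pos.nudge(0, 1);
        System.out.println(pos.label(0) + " -> " + pos.toFloat(0));
    }
}
```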

answered Apr 21, 2013 at 21:05

Peter Lawrey


Thorsten S. · Accepted Answer · 2013-04-21 20:04:05Z

-1

If you stick with floats: The easiest way to avoid the error is using floats which are exact, but near the desired value which is

round(2^n * value) * 1/2^n.

n is the number of bits, value the number to use (in your case 0.1)

In your case with increasing precision:

n = 4 => 0.125
n = 8 (byte) => 0.1015625
n = 16 (short)=> 0.100006103516....

The long digit strings are artefacts of converting the binary value to decimal; the actual number has far fewer bits.

As the floats are exact, addition and subtraction will not introduce offset errors, but will always be predictable as long as the result needs no more bits than the float's significand holds.

If you fear that your display will be compromised by using this solution (because they are odd floats), use and store only integers (step increase -1/1). The final value which is internally set is

x = value * step.

As the step increases or decreases by an amount of 1, precision will be retained.
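A small sketch of the formula above, round(2^n * value) * 1/2^n, and of the integer step counter. The class and method names are hypothetical; the point is only that a step with a short binary representation accumulates without rounding, at the price of not being exactly 0.1.

```java
class DyadicStep {
    // Nearest multiple of 1/2^n to the requested value.
    static float dyadicStep(double value, int n) {
        double scale = Math.scalb(1.0, n); // 2^n
        return (float) (Math.round(value * scale) / scale);
    }

    // Position derived from an integer step counter: x = value * step.
    static float position(int stepCount, float step) {
        return stepCount * step;
    }

    public static void main(String[] args) {
        for (int n : new int[] {4, 8, 16}) {
            System.out.println("n = " + n + " => " + dyadicStep(0.1, n));
        }

        // Eight additions of 1/8 land exactly on 1.0f.
        float x = 0.0f;
        for (int i = 0; i < 8; i++) {
            x += 0.125f;
        }
        System.out.println(x == 1.0f);
    }
}
```

As the comments below this answer point out, the trade-off is a changed step size: with n = 16 there are no longer exactly ten steps between 3 and 4.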

edited Apr 21, 2013 at 20:04

answered Apr 21, 2013 at 19:58

Thorsten S.


2

What is your intent here? Although this eliminates errors during addition and subtraction, within certain bounds, it increases the total error because the step size used (.100006…) is different from the desired step size. That makes the calculated results different from the desired results. How is it that you expect to obtain the desired results?
– Eric Postpischil
Commented Apr 21, 2013 at 21:07
Because it does normally not matter if the target of the computation does not care if it is 0.1 or 0.100... In this specific case a 3D model is increased or decreased by a specific amount. I use this trick e.g. in integration intervals because adding is much faster and more precise if the start position and the increase interval is an exact float. I would in fact advise to use 0.125 because that means with 8 steps I have again an integer value and it is not unnatural to divide 1.0 in eight steps.
– Thorsten S.
Commented Apr 21, 2013 at 21:30
This answer fails to explain itself. The intent is apparently to change the step size, rather than fixing the errors that occur with the originally proposed step size, but it is not clear from the problem statement that a changed step size is acceptable. Clearly, this changes the behavior of the software: With a changed step size, there are no longer exactly ten steps from 3 to 4. Perhaps this would be acceptable to the asker, perhaps not. But the answer certainly ought to introduce the proposal and explain that it is changing the observed behavior and address the consequences of this.
– Eric Postpischil
Commented Apr 22, 2013 at 11:00

