6
$\begingroup$

Every evolving technology begins with simple pieces of code built on variables of particular data types. It is interesting to note that many computational systems, including system clocks, rely on "float" (and similar) data types to store floating-point numbers. There is already plenty of evidence of how human errors in judgement and prediction have produced floating-point errors that ended in chaotic disasters.

To illustrate a few of them in support of my question:

February 25, 1991: 28 American soldiers killed due to a floating-point error

The time in tenths of second as measured by the system's internal clock was multiplied by 1/10 to produce the time in seconds. This calculation was performed using a 24 bit fixed point register. In particular, the value 1/10, which has a non-terminating binary expansion, was chopped at 24 bits after the radix point. The small chopping error, when multiplied by the large number giving the time in tenths of a second, led to a significant error. (Source)
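
For illustration, here is a rough model of that arithmetic (a simplified sketch, not the actual Patriot code; the 100-hour uptime figure is only an example):

```python
# Simplified model of the chopping error described above.
from fractions import Fraction

exact_tenth = Fraction(1, 10)
chopped_tenth = Fraction(int(exact_tenth * 2**24), 2**24)  # 1/10 truncated after 24 binary digits

ticks = 100 * 60 * 60 * 10   # ~100 hours of uptime, counted in tenths of a second
drift = float((exact_tenth - chopped_tenth) * ticks)
print(drift)  # ~0.13 s under this simplified model; published analyses of the
              # incident put the accumulated clock drift at roughly a third of a second
```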

June 4, 1996: An unmanned rocket exploded during launch

The rocket was on its first voyage, after a decade of development costing $7 billion. It turned out that the cause of the failure was a software error in the inertial reference system. Specifically, a 64 bit floating point number relating to the horizontal velocity of the rocket with respect to the platform was converted to a 16 bit signed integer. The number was larger than 32,768, the largest integer storable in a 16 bit signed integer, and thus the conversion failed. (Source)
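
As a toy reconstruction of that kind of narrowing conversion (the value is made up, and in the real incident the failed conversion raised an unhandled exception that shut the unit down rather than silently wrapping):

```python
# Toy model of a careless 64-bit float -> 16-bit signed int narrowing; illustrative values only.
def to_int16_wrapping(x: float) -> int:
    """Keep only the low 16 bits (two's complement), the way a silent narrowing cast behaves."""
    v = int(x) & 0xFFFF
    return v - 0x10000 if v >= 0x8000 else v

def to_int16_checked(x: float) -> int:
    """The kind of range check that turns a wrong answer into a handled error."""
    if not -32768 <= x <= 32767:
        raise OverflowError(f"{x} does not fit in a signed 16-bit integer")
    return int(x)

horizontal_velocity = 40_000.0                  # hypothetical reading, larger than 32,767
print(to_int16_wrapping(horizontal_velocity))   # -25536: nonsense, silently accepted
print(to_int16_checked(horizontal_velocity))    # raises OverflowError instead
```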

There are lots of examples out there where tiny floating-point errors, in the 10th or even 100th decimal place, were multiplied through a chain of calculations into huge disasters.

In a future where space and time travel are considered a definite possibility, how should such unpredictable errors be handled? How would the programming and development for such long journeys be done, specifically with respect to floating-point values?

Bonus: how would the cause of any such failure be recognized and investigated if it occurred during space or time travel?

Edit: Don't take "unpredictable" literally. It was just hyperbole; I simply meant "hard to predict".

$\endgroup$
7
  • $\begingroup$ If this is just a programming issue, can't you declare either a float or an int or what have you, appropriately? $\endgroup$
    – user6760
    Commented Dec 25, 2016 at 17:08
  • $\begingroup$ But they are not unpredictable errors. Proper programs use float properly. If the precision required is beyond the capabilities of float then simply don't use the outcome. For the most part those are not examples of float precision problems. And the data is stored in memory - not the system clock. And I don't know any date / time that uses float for the underlying store. $\endgroup$
    – paparazzo
    Commented Dec 26, 2016 at 19:49
  • $\begingroup$ Of course the data is stored in memory and not in the clock. What I meant was that the clock also uses floating-point values where precision of even 1/10th of a second is required. Also, I don't know why you think the examples are irrelevant. Those two examples are exactly floating-point precision problems. $\endgroup$ Commented Dec 27, 2016 at 4:46
  • $\begingroup$ @KaranDesai They're not exactly floating precision problems. They sound more like "conversion carelessness". In the first case: Someone used a weak algorithm to calculate something not taking into account the required precision (wouldn't be surprised if it wasn't even documented). Second case: Someone allowed a 64-bit float to be converted to a 16-bit integer and saw no problem with it. Also, "conversion failed" -> not tested for the simplest edge cases. $\endgroup$
    – xDaizu
    Commented Dec 27, 2016 at 14:25
  • 1
    $\begingroup$ Just because time is to 1/10 of a second does not mean the backing data was float. A floating point precision problem does not mean float was the cause. Those are examples where float used properly would work just fine. $\endgroup$
    – paparazzo
    Commented Dec 28, 2016 at 16:24

6 Answers

1
$\begingroup$

This is a topic that could fill whole books; here is a quick overview.

Errors will always be there, not only because of floating point, but because humans make mistakes and circuits can malfunction due to temperature, voltage, radiation, or gravitation (especially if it is strong enough to measurably influence time)… However, floating point introduces a lot of quirky effects into your mathematics which the programmer has to account for, making it much harder to write a "proper" program using floating point.

Software:

The best way so far is to invest a huge amount of time, money and training in reviewing code, writing tests, and running simulations.

There is the philosophy of defensive programming, which means that the programmer constantly writes all assumptions into the code as assertions, checks the boundaries of values before using them in calculations, and subdivides the program into pieces as small as possible. Each piece does exactly one task and can by itself be trusted either to complete that task or to catch anything fishy before something goes wrong, and then either correct the error reliably (which is rarely possible) or propagate the exact nature of the problem to the next level up in the program's hierarchical structure. The drawback is a hugely increased cost and manpower requirement, and since this way of programming is extremely tedious, productivity will be lower than with the code-and-forget approach usually taken. A variation is paranoid programming, which takes this to such extreme measures that it becomes nearly impossible to write anything more complex than a simple calculator.
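
A minimal sketch of that style (the routine, limits and units are hypothetical, chosen only to show assertions, boundary checks and error propagation):

```python
# Hypothetical example of the defensive style: state assumptions as assertions,
# check boundaries before calculating, and propagate anything fishy upward.
def ticks_to_seconds(ticks: int) -> float:
    """Convert an uptime counter (tenths of a second) to seconds."""
    assert isinstance(ticks, int), "tick counter must be an integer"
    assert ticks >= 0, "tick counter can never be negative"
    if ticks > 36_000_000:   # 1000 hours: the range this routine was validated for
        raise ValueError(f"uptime {ticks} exceeds the validated range - refusing to guess")
    return ticks / 10        # one exact division, no accumulated additions of ~0.1

# The caller decides what to do with a propagated error; this routine never guesses.
```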

If you can formulate your problem mathematically, it is possible to formally prove properties of it. This can be partially automated; however, it still takes a lot of time, computing power and human effort.
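
As a small illustration of what "partially automated" can look like, here is a sketch using an SMT solver (this assumes the third-party z3-solver package; the property and bounds are invented for the example):

```python
# Hypothetical example: let an SMT solver prove that a guarded scaling step
# can never overflow a signed 16-bit integer. Requires the z3-solver package.
from z3 import Int, And, Implies, prove

reading = Int('reading')   # some raw sensor value
scaled = 8 * reading       # a fixed scaling applied before narrowing to 16 bits

guard = And(reading >= -4096, reading <= 4095)
fits_in_int16 = And(scaled >= -32768, scaled <= 32767)

prove(Implies(guard, fits_in_int16))   # prints "proved"
```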

Within a well-written defensive program, error recognition is quite straightforward, since errors are propagated up to the user if no error-handling strategy matches the error discovered.

Hardware:

Hardware errors are best discovered by redundancy. Have multiple copies of the same hardware do the same work at the same time and check whether they diverge. Modern memory also uses checksums of the stored data to catch corruption.
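
A toy sketch of the voting idea (the three "channels" here are just placeholder values standing in for identical hardware units; the tolerance is arbitrary):

```python
# Toy majority vote over three redundant channels.
def vote(a: float, b: float, c: float, tolerance: float = 1e-9) -> float:
    agreeing = [(x, y) for x, y in ((a, b), (a, c), (b, c)) if abs(x - y) <= tolerance]
    if not agreeing:
        raise RuntimeError("all three channels disagree - no trustworthy result")
    x, y = agreeing[0]
    return (x + y) / 2          # any agreeing pair outvotes the faulty channel

print(vote(0.1, 0.1, 99.9))     # 0.1: the corrupted third channel is outvoted
```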

Error discovery and handling

Discovering the error sources and fixing these errors depends on the available tooling. Is there debugging software available? (Note that the debugger itself may have bugs!) Do you even know which value was expected? Can you reproduce the error? If not, you are in for a bad time - some errors occur only a fixed number of times, or only manifest themselves long after the event that caused them. Debugging integrated circuits is terribly difficult, requires unusually complex equipment, and the circuits themselves can almost never be fixed. In every scenario you need time, well-trained personnel and good equipment.

A good way to go would be to use simple, modular and easily replaceable hardware in combination with high-quality, well-tested software, and to keep an educated, experienced, motivated and knowledgeable staff around. Have a contingency plan for as many scenarios as possible, plenty of backup hardware and duplicated data, and keep exhaustive printed manuals around for when everything else fails.

$\endgroup$
1
  • $\begingroup$ Indeed, good explanation. +1 for mention of paranoid and defensive programming. $\endgroup$ Commented Dec 29, 2016 at 4:54
21
$\begingroup$

Rounding errors are not unpredictable. This is a fundamental error in your question. They may go unpredicted, but that's only a lack of skill.

The errors you mentioned are not inherently related to floating-point numbers. The first was caused by improper use of fixed point; the second is just a bad conversion. Both kinds of error are common and should be caught by code review before flight, and in simulated flights.

For this kind of error, the idea is to have the same algorithm coded independently by two teams. If the results are not acceptably close, you know at least one implementation is buggy.
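
A minimal sketch of that idea (both routines and the tolerance are illustrative, and the three-argument math.hypot needs Python 3.8+; in practice the two implementations would come from genuinely separate teams):

```python
# Two independently written routines for the same quantity, cross-checked at run time.
import math

def speed_team_a(vx: float, vy: float, vz: float) -> float:
    return math.sqrt(vx * vx + vy * vy + vz * vz)

def speed_team_b(vx: float, vy: float, vz: float) -> float:
    return math.hypot(vx, vy, vz)   # different code path, same mathematical result

def cross_checked_speed(vx: float, vy: float, vz: float, rel_tol: float = 1e-9) -> float:
    a, b = speed_team_a(vx, vy, vz), speed_team_b(vx, vy, vz)
    if not math.isclose(a, b, rel_tol=rel_tol):
        raise RuntimeError("independent implementations diverged - do not trust either result")
    return a
```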

For ordinary rounding errors, the solution is simple: code to the limits of your hardware. If you can't step dV more precisely than 0.001 m/s, there is no need to calculate and store values with significantly greater precision. Instead, just measure again some time later and perform correction burns when you have to. In other words, know what you can't know, and design your procedures accordingly.
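
A small sketch of "code to the limits of your hardware", using the 0.001 m/s figure from above (everything else is made up):

```python
# Round a planned burn to the thruster's real resolution; the leftover error is
# handled by measuring again later and doing a correction burn, not by more digits.
THRUSTER_RESOLUTION = 0.001   # m/s, smallest delta-v step the hardware can execute

def plan_burn(requested_dv: float) -> float:
    steps = round(requested_dv / THRUSTER_RESOLUTION)
    return steps * THRUSTER_RESOLUTION

print(plan_burn(3.14159265))  # ~3.142 m/s; chasing further digits is pointless
```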

$\endgroup$
4
  • 1
    $\begingroup$ 'Rounding errors are not unpredictable. This is a fundamental error in your question. They may go unpredicted, but that's only a lack of skill.' This. I'll also add that sometimes it's not lack of skill as much as not over-engineering, which is good enough when carelessly converting strings to numbers on my personal website, but bad for military equipment. $\endgroup$
    – xDaizu
    Commented Dec 27, 2016 at 14:28
  • $\begingroup$ @xDaizu if it is good enough, there will be no explosions ;) $\endgroup$
    – Mołot
    Commented Dec 27, 2016 at 14:36
  • 4
    $\begingroup$ Short version: if your task is detailed enough to be affected by floating point round offs, hire programmers who know how to avoid floating point round off errors. $\endgroup$
    – Cort Ammon
    Commented Dec 27, 2016 at 16:52
  • $\begingroup$ And, if you need more precision, use a longer data type (a 128-bit floating-point, instead of 64-bit), or BigNumber libraries - in this case, the tradeoff is slower processing, since BigNumbers are slower than native number types. See en.wikipedia.org/wiki/Arbitrary-precision_arithmetic . $\endgroup$ Commented Dec 28, 2016 at 21:35
1
$\begingroup$

"While in future, when space and time travel is considered a definite possibility, How should such unpredictable errors be handled? How would programming and development of such long travels be done specifically with floating point values?"

  1. With a huge amount of computational power, I could imagine debuggers that randomly generate plenty of different weird scenarios and check how the software behaves (see the sketch after this list).

  2. More computational power also means you can afford terribly inefficient but quite robust very-high-level programming languages. (Java, for example, will not let you index outside the bounds of an array, and at least makes such an error visible.)

  3. Mass production and mature technology: the software for a new spaceship may be off-the-shelf software that has already run on hundreds of previous ships, and even if it was debugged the hard way, by then it at least works well.

  4. With big enough data sizes, some problems simply go away: no more Y2K problem, and 64-bit time would serve us well for many times the current age of the universe. Part of our problem today is having to optimize the code.
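
A toy sketch of the randomly-generated-scenarios idea from point 1 (the routine and the invariant are invented for illustration):

```python
# Throw lots of random, deliberately out-of-range inputs at a routine and check
# an invariant that must always hold, whatever the input.
import random

def to_int16_saturating(x: float) -> int:
    """Clamp into the signed 16-bit range instead of overflowing."""
    return max(-32768, min(32767, int(x)))

random.seed(0)
for _ in range(100_000):
    x = random.uniform(-1e12, 1e12)
    y = to_int16_saturating(x)
    assert -32768 <= y <= 32767, (x, y)   # the invariant must survive every weird scenario
```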

"Bonus: how would the cause of any such failure would be recognized or investigated if it occurred during space and time travel?"

Just back up the software before launch, and run some tests afterwards just in case.

$\endgroup$
1
$\begingroup$

As others point out, there are arbitrary-precision software packages that wouldn't suffer any numerical inaccuracy. That is what you'd use in a competently designed system.

But if you insist on using floating-point numbers, there is one significant effect that would occur: it would be possible to specify nearby locations more accurately than faraway ones. This normally doesn't matter because our Earth is relatively small, numerically speaking. But in the realm of space the distances become very large indeed.

As a loose explanation, floating point numbers have a limited number of precise digits due to the way they are represented in a machine. In one common format (the 32-bit IEEE single-precision floating-point number), you have about 7 significant digits of precision. Regardless of whether you're representing a large or small number, you only have 7 digits to do it in. For example, the numbers 1.234567 and 123456.7 both have the same numerical precision (seven digits, counting from the most significant).
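
A quick way to see the effect (this uses NumPy's 32-bit float type, since plain Python floats are 64-bit):

```python
# The "about 7 significant digits" of a 32-bit float, made visible.
import numpy as np

print(np.float32(123456789.0))                    # 123456792.0 - digits beyond ~7 are already gone
print(np.float32(1000000.0) + np.float32(0.001))  # 1000000.0 - the small addition vanishes entirely
```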

Suppose you are counting kilometers directly. Nearby locations can be specified with great accuracy:

"Move 0.001 KM away from the planet" would be equivalent to moving one meter, or

"Move 0.000001 KM toward the ship" would be equivalent to moving one millimeter

However, far locations can only be specified very coarsely. If you say:

"Move 1000000 KM along this vector" then the last precise digit is the 1KM place. You can't say "Move 1000000.001 KM", as your maximum precision at this point is now 1KM. Likewise, if you say "Move 1,000,000,000 KM in that direction" your maximum precision is now 1,000KM. You can specify the following two locations accurately, but nothing in between:

1,000,000,000 KM and 1,000,001,000 KM

Astronomic distances are very large. In order to "jump to" a star system you need to get within a few hundred thousand KM. The nearest major galaxy (Andromeda) is about 2.4*10^19 KM away; with 7 digits of precision you can only discriminate plus or minus 10^12 KM (a trillion KM). With the double-precision, 64-bit IEEE format you've got about 16 digits of precision, so you can specify locations to plus or minus 10^3 KM (a thousand KM) in Andromeda. And again, that is merely the nearest major galaxy.
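
A rough cross-check of those figures (math.ulp needs Python 3.9+; the distance is the 2.4*10^19 KM quoted above, and the spacings are order-of-magnitude illustrations, not exact matches for the rounded digit counts):

```python
# Spacing between adjacent representable values at roughly the Earth-Andromeda distance.
import math
import numpy as np

distance_km = 2.4e19
print(np.spacing(np.float32(distance_km)))   # ~2.2e12 km between neighbouring 32-bit values
print(math.ulp(distance_km))                 # 4096.0 km between neighbouring 64-bit values
```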

Consider what happens when precision breaks down. If you're plus or minus 100,000 KM then, to avoid jumping into a planet, you need your jump target at least 100,000 KM away from the planet surface, which might be as much as 200,000 KM if you round the other way. If it's critical to be close to the planet when you jump, you need your origin point to be closer to the destination. Hence you would have to jump somewhere near the solar system, and then you would jump to the planet.

You can come up with your own explanation of why that's good or bad for economics, military engagements, or what have you.

$\endgroup$
1
  • $\begingroup$ Good explanation $\endgroup$ Commented Dec 29, 2016 at 4:50
-1
$\begingroup$

I'd be very interested in learning about the accidents which occurred because 100-bit numbers were inadequate for the task. Please cite your source for this (dubious, IMHO) claim. You imply that engineers use "formulas" for calculations without having a clue about what their limits are. Well, I'm sure some people aren't motivated or educated enough to bother finding out whether the "black box" can be trusted. Hopefully, those aren't the engineers making the decisions on any project where human life is at risk. Your question assumes a level of incompetence which is not typical. I should also add that the calculus of determining how errors propagate through numerical calculations is a mature discipline.
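
For readers unfamiliar with that discipline, here is about the simplest possible instance of it (the numbers are arbitrary; the first-order rule for products is standard):

```python
# First-order (worst-case) error propagation for a product: relative errors add.
a, da = 1000.0, 0.5    # a measured value and its absolute uncertainty
b, db = 0.1, 1e-7

product = a * b
rel_error = da / a + db / b                  # relative uncertainties add, to first order
print(product, "+/-", product * rel_error)   # the product with its worst-case bound (~0.05 here)
```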

$\endgroup$
1
  • $\begingroup$ I think this does not actually answer my question. But since you insisted on a source, you can simply google "accidents due to numerical errors or floating point errors". One such source is: en.wikipedia.org/wiki/Floating_point#Incidents $\endgroup$ Commented Dec 29, 2016 at 4:51
-4
$\begingroup$

Program in LISP, Haskell, or other functional, demand-driven languages that support arbitrary-precision ("infinite decimal point") numbers. The C/assembly programming paradigm was useful for early machines because computing resources were so limited that the languages needed to match the hardware pretty directly. Now we need a better programming paradigm that supports hygienic programming (that's an actual technical term), precise computation, and discoverable parallelism.

$\endgroup$
8
  • $\begingroup$ Support for arbitrary precision floating point has absolutely nothing to do with programming paradigms. I am forced to downvote the answer, as even if it "goes the right way" it is ultimately wrong. $\endgroup$
    – Borsunho
    Commented Dec 25, 2016 at 16:34
  • 5
    $\begingroup$ en.wikipedia.org/wiki/… Never seen? Were you even looking? Took me less than a minute with Google. $\endgroup$
    – Mołot
    Commented Dec 25, 2016 at 18:00
  • 1
    $\begingroup$ @SRM almost any language has libraries for that, including oldies like C, and for most newer it's part of the standard... you can google this yourself. $\endgroup$
    – Borsunho
    Commented Dec 25, 2016 at 18:01
  • 4
    $\begingroup$ If you think today's (or any non-handwavium future) computing resources are sufficient to support horribly inefficient programming models, then you've only worked on trivial problems. And that's not even counting the loss of time (and potential errors) due to the fact that Lisp &c are absolutely incomprehensible to most programmers. $\endgroup$
    – jamesqf
    Commented Dec 25, 2016 at 19:09
  • 2
    $\begingroup$ This is a debate to move to chat if we want to continue it. :-) $\endgroup$
    – SRM
    Commented Dec 25, 2016 at 21:14
