Questions tagged [floating-point]
Mathematical questions concerning floating point numbers, a finite approximation of the real numbers used in computing.
144 questions with no upvoted or accepted answers
9 votes, 0 answers, 226 views
Theory of floating point math
We learn about groups, rings and fields in algebra - but floating point numbers (like double in many modern programming languages) do not form one of the above ...
7 votes, 0 answers, 213 views
Topological nature of IEEE floating-point numbers
If IEEE floating-point numbers had countably infinite precision, their domain would be:
$$
\{-\infty\}\cup\mathbb{R}^-\cup\{-0,+0\}\cup\mathbb{R}^+\cup\{+\infty\}\cup\{\text{NaN}\}
$$
Let's denote ...
6 votes, 0 answers, 143 views
Algebraic Structures involving 𝙽𝚊𝙽 (absorbing element).
IEEE 754 floating point numbers contain the concept of 𝙽𝚊𝙽 (not a number), which "dominates" arithmetical operations ($+,-,⋅,÷$ will return ...
5 votes, 1 answer, 252 views
Associativity in floating point arithmetic failing by two values
Assume all numbers and operations below are in floating-point arithmetic with finite precision, bounded exponent, and rounding to the nearest integer.
Are there $x,y$ positive such that $$\begin{...
4 votes, 0 answers, 143 views
Mean of two floating-point numbers: why is $a+\frac{b-a}{2}$ better than $\frac{a+b}{2}$?
$a$ and $b$ are the floating-point representations of two real numbers with no constraints (both may be negative, both positive, of opposite signs, and so on).
I read in the ...
4 votes, 0 answers, 251 views
Can trigonometric functions for double precision be implemented in terms of those for single precision?
In some programming environments, such as GLSL, single- and double-precision numbers are supported for arithmetic and square-root computation, but only single-precision trigonometric functions are ...
4 votes, 0 answers, 62 views
How quickly can one compare exp(m/n) to a given rational?
For positive integers $\hspace{.06 in}m_{\hspace{.02 in}0}\hspace{.02 in},n_0\hspace{.02 in},m_1,n_1\:$, $\;$ how difficult is it to decide whether $$\exp\left(\hspace{-0.03 in}\frac{m_{\hspace{.02 in}...
3 votes, 0 answers, 53 views
Solve $10^{10^z} = 10^{10^x}+10^{10^y}$ for $z$ with floating point accuracy
In the following equation
$$10^{10^z} = 10^{10^x}+10^{10^y}$$
I want to find an algorithm that computes $z$ in a floating point accurate manner given any values of $x$ and $y$ (e.g. $x=y=2000$). The ...
3 votes, 0 answers, 152 views
Justification for the definition of relative error, why is it not a metric?
The absolute error and relative error operators are commonly encountered when reading about floating-point arithmetic or approximation theory.
Absolute error is
${ae(a,...
3 votes, 0 answers, 132 views
Limit of Euler's number failing due to precision errors - A surprising case. Why does it happen?
It is a known fact that floating point precision errors are bound to happen when one forces a computer to deal with very large or very small numbers, especially when both things are done at the same ...
3 votes, 0 answers, 223 views
Relative error when chopping is used for a 32-bit floating-point number in IEEE 754 format
Find the binary representation of 85.125 using the IEEE 754 standard 32-bit floating-point number representation. Find the relative error if chopping is used.
I calculated the binary representation of 85....
3 votes, 0 answers, 293 views
Error bound for floating-point interval dot product
In the Handbook of Floating-Point Arithmetic (Birkhäuser, 2010, Chapter 6), Muller et al. presented the following absolute forward error bound for the floating-point recursive dot product:
$$
\left|...
3 votes, 1 answer, 356 views
Dyadic rational boundary points of the Mandelbrot set
The rationale behind this question is computer rendering of the Mandelbrot set using binary floating point (a subset of the dyadic rationals): interior and exterior points are relatively easy to ...
3 votes, 1 answer, 849 views
Quadratic Root Equation Error
Suppose a machine with the floating-point system $\beta = 10$, $p = 8$, $L = -50$, and $U = 50$ is used to calculate the roots of a quadratic equation $ax^2 + bx + c = 0$, where $a$, $b$, and $c$ are ...
3 votes, 0 answers, 363 views
Check for an ill-conditioned matrix
How can I efficiently check whether a tridiagonal system with 1's on the diagonal is ill-conditioned? The common way is to get the ratio of the largest and smallest singular values and see if it's greater ...