
Questions tagged [floating-point]

Mathematical questions concerning floating point numbers, a finite approximation of the real numbers used in computing.

144 questions with no upvoted or accepted answers
9 votes
0 answers
226 views

Theory of floating point math

We learn about groups, rings and fields in algebra - but floating point numbers (like double in many modern programming languages) do not form one of the above ...
J Fabian Meier's user avatar
7 votes
0 answers
213 views

Topological nature of IEEE floating-point numbers

If IEEE floating-point numbers had countably infinite precision, their domain would be: $$ \{-\infty\}\cup\mathbb{R}^-\cup\{-0,+0\}\cup\mathbb{R}^+\cup\{+\infty\}\cup\{\text{NaN}\} $$ Let's denote ...
Dannyu NDos's user avatar
  • 2,049
6 votes
0 answers
143 views

Algebraic Structures involving 𝙽𝚊𝙽 (absorbing element).

IEEE 754 floating point numbers contain the concept of 𝙽𝚊𝙽 (not a number), which "dominates" arithmetical operations ($+,-,⋅,÷$ will return ...
Hyperplane's user avatar
  • 11.8k
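
The "dominating" behaviour described in this excerpt can be observed directly. The following is a minimal sketch, assuming Python floats (IEEE 754 binary64 on common platforms); the variable names are illustrative and not taken from the question.

```python
import math

nan = float("nan")

# Each of the four basic operations with a NaN operand yields NaN again.
results = [nan + 1.0, nan - 1.0, nan * 0.0, nan / 2.0]
all_nan = all(math.isnan(r) for r in results)

# NaN also compares unequal to itself, so == is not reflexive on floats.
nan_equals_itself = (nan == nan)

print(all_nan, nan_equals_itself)
```

Note that `nan * 0.0` is NaN rather than zero, so NaN absorbs even against the element that normally absorbs under multiplication.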
5 votes
1 answer
252 views

Associativity in floating point arithmetic failing by two values

Assume all numbers and operations below are in floating-point arithmetic with finite precision, bounded exponent, and rounding to the nearest integer. Are there $x,y$ positive such that $$\begin{...
EEE's user avatar
  • 111
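
As general background to this question (not a solution to the truncated problem statement), here is a small sketch of floating-point addition failing to be associative. It assumes Python floats with the default round-to-nearest mode; the operands are chosen for illustration only.

```python
# The same three operands, grouped two different ways.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)

# The two groupings round differently, so the results are not equal.
differs = left != right
gap = abs(left - right)

print(left, right, differs)
```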
4 votes
0 answers
143 views

mean of two floating point numbers: why $a+\frac{b-a}{2}$ is better than $\frac{a+b}{2}$?

$a$ and $b$ are the floating point representation of two real numbers with no constraints (they can be both negative or both positive or one positive and the other negative and so on). I read in the ...
Alessandro Jacopson's user avatar
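
One commonly cited difference between the two formulas is their overflow behaviour; whether that is the reason the question has in mind is not clear from the excerpt. The sketch below assumes Python floats (binary64), and the function names `mean_naive` and `mean_offset` are hypothetical.

```python
import math

def mean_naive(a, b):
    """(a + b) / 2: the sum can overflow when a and b are both huge."""
    return (a + b) / 2

def mean_offset(a, b):
    """a + (b - a) / 2: the difference can overflow for opposite signs."""
    return a + (b - a) / 2

big = 1.0e308  # close to the largest finite binary64 value

same_sign_naive = mean_naive(big, big)      # big + big overflows to inf
same_sign_offset = mean_offset(big, big)    # b - a is 0.0, result stays finite

opp_sign_naive = mean_naive(-big, big)      # sum is exactly 0.0
opp_sign_offset = mean_offset(-big, big)    # b - a overflows to inf

print(same_sign_naive, same_sign_offset, opp_sign_naive, opp_sign_offset)
```

Neither formula is safe for all inputs: the offset form avoids overflow when $a$ and $b$ have the same sign, while the naive form avoids it when they have opposite signs.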
4 votes
0 answers
251 views

Can trigonometric functions for double precision be implemented in terms of those for single precision?

In some program environments like GLSL there is support for single and double precision numbers for arithmetic and square roots computation, but only single precision trigonometric functions are ...
Ruslan's user avatar
  • 6,875
4 votes
0 answers
62 views

How quickly can one compare exp(m/n) to a given rational?

For positive integers $\hspace{.06 in}m_{\hspace{.02 in}0}\hspace{.02 in},n_0\hspace{.02 in},m_1,n_1\:$, $\;$ how difficult is it to decide whether $$\exp\left(\hspace{-0.03 in}\frac{m_{\hspace{.02 in}...
3 votes
0 answers
53 views

Solve $10^{10^z} = 10^{10^x}+10^{10^y}$ for $z$ with floating point accuracy

In the following equation $$10^{10^z} = 10^{10^x}+10^{10^y}$$ I want to find an algorithm that computes $z$ in a floating point accurate manner given any values of $x$ and $y$ (e.g. $x=y=2000$). The ...
Gerben Beintema's user avatar
3 votes
0 answers
152 views

Justification for the definition of relative error, why is it not a metric?

The absolute error and relative error operators are very commonly encountered while reading about topics from the fields of floating-point arithmetics or approximation theory. Absolute error is ${ae(a,...
user2373145's user avatar
3 votes
0 answers
132 views

Limit of Euler's number failing due to precision errors - A surprising case. Why does it happen?

It is a known fact that floating point precision errors are bound to happen when one forces a computer to deal with very large or very small numbers, especially when both things are done at the same ...
Danilo Guimarães's user avatar
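
The excerpt does not show which case the asker found surprising, so the following is only a generic sketch of how $\left(1+\frac{1}{n}\right)^n$ can break down in binary64. It assumes Python floats; `euler_estimate` is a hypothetical helper name.

```python
import math

def euler_estimate(n):
    """Evaluate (1 + 1/n)**n directly in floating point."""
    return (1.0 + 1.0 / n) ** n

# For a moderate n the estimate is close to e.
moderate = euler_estimate(1e8)

# For n = 1e16, 1/n is below half the machine epsilon, so 1 + 1/n
# rounds to exactly 1.0 and the power collapses to 1.0.
collapsed = euler_estimate(1e16)

print(moderate, collapsed, math.e)
```

Once $\frac{1}{n}$ drops below half a unit in the last place of 1, the information about $n$ is lost before the exponentiation starts.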
3 votes
0 answers
223 views

Relative error when chopping is used of a 32 bit floating point number in IEEE 754 format

Find the binary representation of 85.125 using the IEEE 754 standard 32-bit floating-point number representation. Find the relative error if chopping is used. I calculated the binary representation of 85....
Taylor's user avatar
  • 57
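
The bit pattern asked about here can be inspected with Python's standard `struct` module. This is a sketch for checking a hand calculation, not the asker's own working; the variable names are illustrative.

```python
import struct

value = 85.125

# Pack as big-endian binary32, then reinterpret the 4 bytes as an integer.
bits = struct.unpack(">I", struct.pack(">f", value))[0]

sign = bits >> 31
exponent = (bits >> 23) & 0xFF   # biased exponent (bias 127)
fraction = bits & 0x7FFFFF       # 23 stored fraction bits

# Convert the stored binary32 pattern back to a Python float.
round_trip = struct.unpack(">f", struct.pack(">I", bits))[0]

print(f"{bits:032b}", sign, exponent - 127, round_trip)
```

Since $85.125 = 1010101.001_2 = 1.010101001_2 \times 2^6$ needs only nine fraction bits, it fits exactly in the 23-bit fraction field, so chopping and rounding give the same stored value and the round trip returns 85.125 unchanged.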
3 votes
0 answers
293 views

Error bound for floating-point interval dot product

In Handbook of Floating-Point Arithmetic (Birkhäuser, 2010, Chapter 6) Muller et al. presented the following absolute forward error bound for the floating-point recursive dot product: $$ \left|...
Konstantin Isupov's user avatar
3 votes
1 answer
356 views

Dyadic rational boundary points of the Mandelbrot set

The rationale behind this question is computer rendering of the Mandelbrot set using binary floating point (a subset of the dyadic rationals): interior and exterior points are relatively easy to ...
Claude's user avatar
  • 5,707
3 votes
1 answer
849 views

Quadratic Root Equation Error

Suppose a machine with the floating-point system $\beta = 10$, $p = 8$, $L = -50$, and $U = 50$ is used to calculate the roots of a quadratic equation $ax^2 + bx + c = 0$, where $a$, $b$, and $c$ are ...
user2759722's user avatar
3 votes
0 answers
363 views

Check for Ill Conditioned matrix

How can I efficiently check whether a tridiagonal system with 1's on the diagonal is ill-conditioned? The common way is to take the ratio of the largest and smallest singular values and see if it's greater ...
Anonym's user avatar
  • 109
