Questions tagged [floating-point]

Mathematical questions concerning floating-point numbers, a finite approximation of the real numbers used in computing.

4 votes
2 answers
107 views

pow and its relative error

Investigating the floating-point implementation of $\operatorname{pow}(x,b)=x^b$ with $x,b\in\Bbb R$ in some library implementations, I found that some pow ...
emacs drives me nuts
6 votes
0 answers
142 views

Algebraic Structures involving 𝙽𝚊𝙽 (absorbing element).

IEEE 754 floating point numbers contain the concept of 𝙽𝚊𝙽 (not a number), which "dominates" arithmetical operations ($+,-,\cdot,\div$ will return ...
Hyperplane • 11.8k
3 votes
0 answers
53 views

Solve $10^{10^z} = 10^{10^x}+10^{10^y}$ for $z$ with floating point accuracy

In the following equation $$10^{10^z} = 10^{10^x}+10^{10^y}$$ I want to find an algorithm that computes $z$ in a floating point accurate manner given any values of $x$ and $y$ (e.g. $x=y=2000$). The ...
Gerben Beintema
1 vote
2 answers
63 views

How to transform this expression to a numerically stable form?

I have this function $$f(x, t)=\frac{\left(1+x\right)^{1-t}-1}{1-t}$$ where $x \ge 0$ and $t \ge 0$. I want to use it in a neural network, and thus need it to be differentiable. While it has a ...
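A minimal sketch of one common rewrite, assuming the difficulty is cancellation and division by a tiny number when $t$ is close to 1 (the excerpt is cut off before it says so). It relies on the identity $(1+x)^{1-t}-1=\operatorname{expm1}\big((1-t)\,\operatorname{log1p}(x)\big)$ and on the limit $f(x,t)\to\log(1+x)$ as $t\to 1$. The name `f_stable` is hypothetical, and plain `math` is used only for illustration; inside a neural network one would presumably use the framework's own `expm1` and `log1p`.

```python
import math

def f_stable(x: float, t: float) -> float:
    """Evaluate ((1 + x)**(1 - t) - 1) / (1 - t) for x >= 0, t >= 0."""
    u = 1.0 - t
    log_term = math.log1p(x)          # log(1 + x), accurate for small x
    if u == 0.0:
        # Removable singularity at t = 1: the limit is log(1 + x).
        return log_term
    # expm1(z) = exp(z) - 1 without cancellation when z is small.
    return math.expm1(u * log_term) / u

if __name__ == "__main__":
    for t in (0.0, 0.5, 1.0 - 1e-12, 1.0, 2.0):
        print(t, f_stable(2.0, t))
```

Because `expm1(z)` behaves like `z` for small `z`, the quotient stays close to `log1p(x)` as `t` approaches 1 instead of losing digits.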
yuri kilochek
1 vote
0 answers
49 views

Proof that $\epsilon_{mach} \leq \frac{1}{2} b^{1-n}$

I have a question about the proof of the following statement: For each set of machine numbers $F(b, n, E_{min}, E_{max})$ with $E_{min} < E_{max}$ the following inequality holds: $\epsilon_{mach} \...
Felix Gervasi
1 vote
0 answers
56 views

Why does TI-84 show scientific notation for zeros sometimes but not others?

When graphing a function and then going through the process to calculate the zeros (left bound, right bound, guess), is there a reason that sometimes it shows y = 0, but there are other times when it ...
mmmmmm • 141
2 votes
1 answer
73 views

Numerically stable way to compute ugly double fraction

I am looking for a numerically stable version of this (ugly) equation $$ s^2=\frac{1}{\frac{1}{\beta_1}+\frac{1}{\beta_2}W} $$ where $$ \beta_1 = c_1-c_2m+(m-c_2)b\\ \beta_2 = \frac{1}{2}\left((a-m)^2-...
mto_19 • 272
1 vote
0 answers
82 views

Fundamental Axiom of Floating Point Arithmetic for Complex Number Multiplication

I am trying to prove that the fundamental axiom of floating point arithmetic also applies to complex number multiplication. First, let $fl$ be a function that maps a number to its closest floating point ...
Gu Bochao
1 vote
1 answer
137 views

How do calculators represent floating-point numbers (somewhat) perfectly?

If you ask a programming language to calculate 0.6 + 0.7, you’ll get something like 1.2999998, and that’s because of how floating point numbers are represented in computers. But if you ask a ...
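The effect described here can be reproduced with a short sketch. It contrasts binary doubles with Python's `decimal` module, on the assumption (not stated in the excerpt) that decimal arithmetic is a reasonable stand-in for what many calculators do internally.

```python
from decimal import Decimal

# Binary doubles: 0.6 and 0.7 are each rounded on input, so the sum
# is not the double nearest to 1.3.
binary_sum = 0.6 + 0.7
print(binary_sum == 1.3)

# Decimal arithmetic: 0.6, 0.7 and 1.3 are all exactly representable.
decimal_sum = Decimal("0.6") + Decimal("0.7")
print(decimal_sum == Decimal("1.3"))

# The exact value actually stored for the double literal 0.6.
print(Decimal(0.6))
```

Passing a string to `Decimal` keeps the decimal digits as written, while passing a float exposes the binary value that was stored.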
Samathan
0 votes
0 answers
29 views

Calculating coordinates of vertices, given dimensions in an architectural floorplan

So, one of my friends is trying to learn AutoCAD. They were given a floorplan. The floorplan had the dimensions. And they were asked to find the coordinates of all the vertices of the plan. So we ...
user3851878
3 votes
2 answers
86 views

Proof that $\frac 1{10}$ has no finite binary float representation

I am supposed to prove that $\frac 1{10}$ is not representable as a finite binary float. I tried proving this via induction, but that did not seem to work, and now I am out of ideas. Thank you
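As a numerical sanity check rather than a proof, the sketch below uses the fact that every finite binary float has the form $m\cdot 2^{e}$ with integers $m, e$, so in lowest terms its denominator is a power of two. The helper name `is_finite_binary` is hypothetical.

```python
from fractions import Fraction

def is_finite_binary(q: Fraction) -> bool:
    """True if q, in lowest terms, has a power-of-two denominator."""
    d = q.denominator            # Fraction keeps q reduced, with d >= 1
    return d & (d - 1) == 0      # bit trick: d is a power of two

tenth = Fraction(1, 10)          # denominator 10 = 2 * 5
print(is_finite_binary(tenth))
print(is_finite_binary(Fraction(3, 8)))

# The double written as 0.1 is a nearby dyadic rational, not 1/10 itself.
stored = Fraction(0.1)
print(stored == tenth, is_finite_binary(stored))
```

The factor 5 in the reduced denominator of $\frac 1{10}$ is what rules out a finite binary expansion.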
trapaholicsmixtapes
1 vote
2 answers
62 views

Absolute difference between largest IEEE754 number and its predecessor

In single-precision format, the largest possible positive number is $A = 0 ~~~ 11111110 ~~~ 111\ldots 111$. Its predecessor is $B = 0 ~~~ 11111110 ~~~ 111 \ldots 110$. But what is the absolute ...
lafinur • 3,408
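The two bit patterns can be decoded directly. The sketch below assumes the standard IEEE 754 binary32 layout (1 sign bit, 8 exponent bits with bias 127, 23 fraction bits); the helper `f32_from_bits` is a hypothetical name.

```python
import struct

def f32_from_bits(bits: int) -> float:
    """Reinterpret a 32-bit pattern as an IEEE 754 single-precision value."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]

A = f32_from_bits(0x7F7FFFFF)   # 0 11111110 111...111
B = f32_from_bits(0x7F7FFFFE)   # 0 11111110 111...110

# Both values are exactly representable as Python doubles, and so is
# their difference, so this subtraction is exact.
gap = A - B
print(A, B, gap)
print(gap == 2.0 ** 104)
```

The gap is one unit in the last place at exponent $254-127=127$, that is $2^{-23}\cdot 2^{127}=2^{104}$.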
0 votes
1 answer
56 views

Proof of `TWOSUM` implementation in "double-double" arithmetic

"double-double" / "compensated" arithmetic uses unevaluated sums of floating point numbers to obtain higher precision. One of the basic algorithms is ...
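The excerpt is cut off before it shows the algorithm, so the sketch below assumes the branch-free TwoSum usually attributed to Knuth, which may differ from the variant in the question. It returns the rounded sum `s` and an error term `e` such that `s + e` equals `a + b` exactly, barring overflow.

```python
from fractions import Fraction

def two_sum(a: float, b: float) -> tuple[float, float]:
    """Branch-free TwoSum: s = fl(a + b), e = (a + b) - s exactly."""
    s = a + b
    b_virtual = s - a                  # the part of b that made it into s
    a_virtual = s - b_virtual          # the part of a that made it into s
    e = (a - a_virtual) + (b - b_virtual)
    return s, e

if __name__ == "__main__":
    a, b = 0.1, 0.2
    s, e = two_sum(a, b)
    # Check exactness with rational arithmetic on the stored doubles.
    print(Fraction(s) + Fraction(e) == Fraction(a) + Fraction(b))
```

A double-double number is then just such a pair `(s, e)` kept unevaluated.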
Claude • 5,707
0 votes
0 answers
10 views

Specify the conditions Exponent and Mantissa sizes must meet, so that the minimal distance between representable numbers is no more than 1.

Using the following floating-point representation: s - one sign bit; m - mantissa - a real number in the range [1, 2), stored in M bits with the leading 1 and the radix point omitted; c - exponent - a natural number, ...
Artur Bieniek
0 votes
0 answers
50 views

What is the computational complexity of calculating determinants for matrices of finite precision floating-point numbers?

Following up from this older question, I understand that calculation of determinants for integer-valued matrices is possible with polynomial scaling. However, I have been unable to locate any ...
KarimAED