Questions tagged [floating-point]

Mathematical questions concerning floating point numbers, a finite approximation of the real numbers used in computing.

pow and its relative error

Investigating the floating-point implementation of the $\operatorname{pow}(x,b)=x^b$ with $x,b\in\Bbb R$ in some library implementations, I found that some pow ...
Algebraic Structures involving 𝙽𝚊𝙽 (absorbing element).

IEEE 754 floating point numbers contain the concept of 𝙽𝚊𝙽 (not a number), which "dominates" arithmetical operations ($+,-,⋅,÷$ will return ...
Solve $10^{10^z} = 10^{10^x}+10^{10^y}$ for $z$ with floating point accuracy

In the following equation $$10^{10^z} = 10^{10^x}+10^{10^y}$$ I want to find an algorithm that computes $z$ in a floating point accurate manner given any values of $x$ and $y$ (e.g. $x=y=2000$). The ...
How to transform this expression to a numerically stable form?

I have this function $$f(x, t)=\frac{\left(1+x\right)^{1-t}-1}{1-t}$$ Where $x \ge 0$ and $t \ge 0$. I want to use it in neural network, and thus need it to be differentiable. While it has a ...
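
A minimal sketch of one possible stable form, assuming the identity $(1+x)^{1-t} = \exp\big((1-t)\log(1+x)\big)$ and the limiting value $\log(1+x)$ at $t = 1$. The function name `f` mirrors the question; the use of `log1p`/`expm1` is a suggestion, not something taken from the truncated excerpt.

```python
import math

def f(x: float, t: float) -> float:
    """((1 + x)**(1 - t) - 1) / (1 - t) for x >= 0, using the limit log(1 + x) at t = 1."""
    s = 1.0 - t
    log_term = math.log1p(x)             # log(1 + x) without forming 1 + x explicitly
    if s == 0.0:
        return log_term                  # limiting value as t -> 1
    return math.expm1(s * log_term) / s  # expm1(u) avoids the cancellation in exp(u) - 1

print(f(0.5, 1.0))          # the t = 1 limit, log(1.5)
print(f(0.5, 1.0 + 1e-12))  # stays close to the limit instead of blowing up
```

Both `log1p` and `expm1` have smooth counterparts in the usual automatic-differentiation libraries, so the same rewrite should carry over to a differentiable setting, although the exact-zero branch may need separate handling there.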
Proof that $\epsilon_{mach} \leq \frac{1}{2} b^{1-n}$

I have a question about the proof of the following statement: For each set of machine numbers $F(b, n, E_{min}, E_{max})$ with $E_{min} < E_{max}$ the following inequality holds: $\epsilon_{mach} \...
Why does TI-84 show scientific notation for zeros sometimes but not others?

When graphing a function and then going through the process to calculate the zeroes (left bound, right bound, guess), is there a reason that sometimes it shows y = 0, but there are other times when it ...
Numerically stable way to compute ugly double fraction

I am looking for a numerically stable version of this (ugly) equation $$ s^2=\frac{1}{\frac{1}{\beta_1}+\frac{1}{\beta_2}W} $$ where $$ \beta_1 = c_1-c_2m+(m-c_2)b\\ \beta_2 = \frac{1}{2}\left((a-m)^2-...
Fundamental Axiom of Floating Point Arithmetic for Complex Numbers Multiplication

I am trying to prove the fundamental axiom of floating point arithmetic also applies to complex number multiplication. First, let $fl$ be a function that maps a number to its closest floating point ...
How do calculators represent floating points (somewhat) perfectly?

If you ask a programming language to calculate 0.6 + 0.7, you’ll get something like 1.2999998, and that’s because of how floating point numbers are represented in computers. But if you ask a ...
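
A small illustration of the effect, assuming IEEE 754 double precision (Python's `float`) on the binary side and Python's `decimal` module as a stand-in for base-10 arithmetic. Whether a given calculator actually works in decimal is an assumption here, not something stated in the excerpt.

```python
from decimal import Decimal

binary_sum = 0.6 + 0.7                          # binary floating point (double precision)
decimal_sum = Decimal("0.6") + Decimal("0.7")   # base-10 arithmetic

print(binary_sum == 1.3)               # False: 0.6 and 0.7 are not exactly representable in binary
print(decimal_sum == Decimal("1.3"))   # True: both operands are exact in base 10
print(Decimal(0.6))                    # the exact value actually stored for the literal 0.6
```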
Calculating coordinates of vertices, given dimensions in an architectural floorplan

So, one of my friends is trying to learn AutoCAD. They were given a floorplan. The floorplan had the dimensions. And they were asked to find the coordinates of all the vertices of the plan. So we ...
Proof that $\frac 1{10}$ has no finite binary float representation

I am supposed to prove that $\frac 1{10}$ is not representable as a finite binary float. I tried proving this via induction but that did not seem to work, now I am out of ideas. Thank you
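
A quick numerical sanity check, not a proof. A finite binary float has the form $m/2^k$, so $\frac1{10} = m/2^k$ would force $2^k = 10m$ and hence $5 \mid 2^k$, which is impossible. The sketch below inspects the exact rational value that a double actually stores for `0.1`; the variable names are illustrative.

```python
from fractions import Fraction

stored = Fraction(0.1)      # exact rational value of the double nearest to 1/10
den = stored.denominator

print(stored == Fraction(1, 10))   # False: the stored value is only an approximation
print(den & (den - 1) == 0)        # True: the denominator is a power of two
print(den % 5)                     # nonzero: no factor of 5, so the value cannot equal 1/10
```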
Absolute difference between largest IEEE754 number and its predecessor

In single precision format, the largest possible positive number is $A = 0 ~~~ 11111110 ~~~ 111\ldots 111$. Its predecessor is $B = 0 ~~~ 11111110 ~~~ 111 \ldots 110$. But what is the absolute ...
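
One way to check the gap numerically is to decode the two bit patterns directly, assuming the 32-bit IEEE 754 layout (1 sign bit, 8 exponent bits, 23 fraction bits). The helper `f32_from_bits` is a hypothetical name introduced for this sketch.

```python
import struct

def f32_from_bits(bits: int) -> float:
    """Interpret a 32-bit integer as an IEEE 754 single-precision value."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]

A = f32_from_bits(0x7F7FFFFF)   # 0 11111110 111...111
B = f32_from_bits(0x7F7FFFFE)   # 0 11111110 111...110
gap = A - B                     # exact, since Python floats are doubles

print(gap == 2.0 ** 104)        # True: one unit in the last place at exponent 127 is 2^(127-23)
```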
Proof of `TWOSUM` implementation in "double-double" arithmetic

"double-double" / "compensated" arithmetic uses unevaluated sums of floating point numbers to obtain higher precision. One of the basic algorithms is ...
Specify the conditions Exponent and Mantissa sizes must meet, so that the minimal distance between representable numbers is no more than 1.

Using the following floating-point representation: s - one sign bit m - mantissa - real number in range [1, 2), in which 1 and the comma are skipped, size of M bits c - Exponent - natural number, ...
What is the computational complexity of calculating determinants for matrices of finite precision floating-point numbers?

Following up from this older question, I understand that calculation of determinants for integer-valued matrices is possible with polynomial scaling. However, I have been unable to locate any ...
How to compute the successor to a given floating point number

Let $F$ be the set of all floating point numbers $n2^e$ such that $ -2^{53} < n < 2^{53}$ and $-1074 \leq e \leq 970$. Let $F^* = F - \{\max(F)\}$. I assume $F$ not to be dense, and therefore there ...
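
For the set $F$ as defined here, which appears to coincide with the finite IEEE 754 doubles, a minimal sketch can lean on the standard library rather than deriving the successor by hand. It assumes Python 3.9 or later, where `math.nextafter` is available; the wrapper name `successor` is illustrative.

```python
import math
import sys

def successor(x: float) -> float:
    """Smallest double strictly greater than x (infinity for the largest finite double)."""
    return math.nextafter(x, math.inf)

print(successor(1.0) - 1.0 == sys.float_info.epsilon)   # True: the gap above 1 is 2^-52
print(successor(0.0) == math.ldexp(1.0, -1074))         # True: smallest positive subnormal
print(successor(sys.float_info.max))                    # inf, which is why max(F) is excluded
```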
Is there still a fast invsqrt magic number for float128?

Show that $x+1$ is not backward stable

Suppose we use $\oplus$ to compute $x+1$, given $x \in \mathbb{C}$. $\widetilde{f(x)} = \mathop{\text{fl}}(x) \oplus 1$. This algorithm is stable but not backward stable. The reason is that for $x \...
Another way to compute the epsilon machine

Why does the following program compute the machine precision? I mean, it can be proved that the variable $u$ will give us the machine epsilon. But I don't know the reason for this. Let $a = \frac{4}{3}$ $b = a -...
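
For comparison, the more common halving loop is sketched below. This is not the program from the question, whose listing is cut off in the excerpt. It assumes round-to-nearest double precision and takes "machine epsilon" to mean the gap between 1 and the next larger float (some texts use half of that value).

```python
import sys

def machine_epsilon() -> float:
    """Halve eps until adding half of it to 1.0 no longer changes the result."""
    eps = 1.0
    while 1.0 + eps / 2.0 > 1.0:
        eps /= 2.0
    return eps

print(machine_epsilon() == sys.float_info.epsilon)   # True on IEEE 754 doubles
```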
Justification for the definition of relative error, why is it not a metric?

The absolute error and relative error operators are very commonly encountered while reading about topics from the fields of floating-point arithmetics or approximation theory. Absolute error is ${ae(a,...
Tricks in the floating point operations for better numerical results

I'm attempting to comprehend a passage from the book "Computational Modeling and Visualization of Physical Systems with Python" which I may be too mentally fatigued to grasp. Here's the issue: ...
Is there a stable algorithm for every well-conditioned problem?

Reading these notes on condition numbers and stability, the summary states: If the problem is well-conditioned then there is a stable way to solve it. If the problem is ill-conditioned then there is ...
Floating Point Precision Algorithm

In my database, data is stored with a precision of 10 digits, as Decimal(30,10). The user can enter x or 1/x. I need to save it as 1/x. If the user enters ...
Secant method optimization - initial guesses with floating point precision?

Say I want to find the root of $f(x) = e^{-x} - 5$, and assume I start with initial guesses $x_0 = -3$ and $x_1 = 3$. I define my update function as $x_i = x_{i-1} - f(x_{i-1}) * \frac{x_{i-1} - x_{i-...
Does using smaller floating-point numbers decrease rounding errors?

I started learning about floating point by reading "What Every Computer Scientist Should know About Floating-Point Arithmetic" by David Goldberg. On page 4 he presents a proof for the ...
How to calculate converted value for each number in a set using a conversion rate, having its sum equal exactly a rounded fixed converted total?

Say I have three numeric values: a total, converted total, and a conversion rate. These are fixed, given numbers, and the two totals always have the precision of two decimal places. ...
Finding an expression for $\sqrt{x^2 + z^2}$ that is more precise in floating point arithmetic?

Assuming that both $x$ and $z$ have no representation errors, and that $\vert z^2 \vert \ll \vert x^2 \vert$. There must exist an expression for $\sqrt{x^2 + z^2}$ that is the same in exact arithmetic ...
How does a computer calculate matrix scalar multiplication order of operations (flops)

I am trying to understand the number of flops in the Householder QR factorization. In one line of the algorithm, it says \begin{gather*} v = v / \lVert v \rVert_2 \end{gather*} I was wondering what ...
On the axioms of floating-point arithmetic

As I understand there are two "axioms" that should be satisfied in floating-point arithmetic: $$\forall x\in \mathbb R,\ \exists |\varepsilon|\leq\varepsilon_{\text{machine}},\ \mbox{fl} (x) ...
Representation of rounding error in floating point arithmetic. [duplicate]

It is well known that in a Floating point number system: $$ \mathbb{F}:=\{\pm \beta^{e}(\frac{d_1}{\beta}+\dots +\frac{d_t}{\beta^t}): d_i \in \{0,\dots,\beta-1\},d_1\neq 0, e_{\min}\leq e \leq e_{\...
Expression of sum in floating point system

This is a question of an exam on Numerical Analysis I had: Consider the floating point system of base $2$, maximum number of decimals $53$, maximum exponent $1025$ and minimum exponent $-1022$. That ...
Evaluating $a(b + c)$ more accurately with FMA

I'm using machine-precision floating-point arithmetic, and every so often it happens that I need to evaluate an expression of the form $a(b + c)$. I found that the accuracy can be improved using FMA (...
Numerically stable evaluation of factored univariate real polynomial

Suppose we have a real univariate factored polynomial, meaning we have its factors: an arbitrary number of polynomials of degree less than or equal to two. To simplify things, if necessary, let's ...
Bias in Single Precision Floating numbers

I had a doubt regarding Single Precision Floating point numbers. It is about the bias number which can be derived from exponent part of this representation of numbers. On searching up on google, most ...
How to compute this "smooth max operator"?

I was looking for an alternative way to activate each neuron of a neural network non-linearly. Eventually, I came up with the following binary operation: $$ x \lor y = \log (\exp x + \exp y) $$ With $-\...
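
A minimal sketch of the usual way to evaluate $\log(\exp x + \exp y)$ without overflow, based on the identity $x \lor y = \max(x,y) + \log\big(1 + \exp(-|x-y|)\big)$. The function name `smooth_max` and the handling of infinite inputs are assumptions made for this illustration.

```python
import math

def smooth_max(x: float, y: float) -> float:
    """log(exp(x) + exp(y)), evaluated so that exp never receives a positive argument."""
    m = max(x, y)
    if math.isinf(m):
        return m    # both inputs are -inf, or at least one is +inf
    return m + math.log1p(math.exp(-abs(x - y)))

print(smooth_max(1000.0, 1000.0))   # about 1000 + log(2); the naive formula overflows in exp
print(smooth_max(-math.inf, 3.0))   # 3.0
```

Under this identity, $-\infty$ acts as a neutral element, which the second call illustrates.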
How to Multiply 2 arrays with unique non-integers to produce an array with unique results?

Is there an algorithm/formula to multiply two arrays (1D & 2D) of unique numbers such that the resultant array contains unique results? Would one have to create the 2 initial arrays in a certain ...
What is the set of all numbers that can be represented with a floating-point format?

Computers use single- (or, for more precise calculations, double-) precision floating-point formats to represent a subset of real numbers. While a decent chunk of real numbers can be stored with these ...
Method for finding the largest positive difference between two pairs of IEEE 754 double precision floating point numbers and fixed-point numbers

I have two pairs of IEEE 754 double precision (64-bit) floating-point numbers and unsigned fixed-point numbers, and I'm trying to find which pair has the greatest difference. The fixed-point numbers ...
Is converting between roots and coefficients of a polynomial numerically stable?

Assume we're on a computer using 32-bit floats (or something similar), and I'm converting back and forth between the $n$ coefficients of a polynomial and the corresponding $n$ roots of the polynomial. ...
storing decimal number into computer with finite mantissa

I am learning about numerical methods and the following link caught my attention: So from what I understand 0.1 is not exactly ...
Proof of loss of orthogonality in Gram-Schmidt

I am stuck at understanding about how to derive the following proofs related to error bounds which are given in the following slides. Can anyone please explain to me how these are derived?
fl(A) where A is a square matrix

We defined $fl(x)$ to be the function $fl:\mathbb{R} \rightarrow \mathbb R_b (t, s)$ (i.e., takes reals and outputs the float). What does $fl(A)$ mean when $A \in \mathbb R ^{n \times n} $? I assume ...
trouble understanding floating point representation

I had a quiz last week on floating point representation. After he graded the quiz, he walked us through each step so that we could see what we did wrong. I took notes so that I could study his ...
Determine The Base of The Venusian Numeration System [closed]

This question is from Thomas Koshy's book called "Discrete Mathematics With Applications": Any idea how to do this question? I can tell that the base of the system is at least 3 (since we ...
Fast computation of $x^{1/p}$, where $x\in\mathbb{R}^+$ and $p=2^{n}$, where $n\in\mathbb{N}$ with bit shifts?

There is plenty of literature regarding the legendary Fast inverse square root routine from Quake, but can we do something similar to compute $x^{1/p}$ as given in the title? Given that $p$ is a power ...
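
A rough sketch of the underlying idea, assuming positive, normal single-precision inputs: the bit pattern of a float behaves approximately like a scaled and shifted $\log_2 x$, so dividing the logarithm by $p = 2^n$ becomes a right shift by $n$ after removing the exponent bias. The name `approx_root_pow2` is hypothetical, and no tuned "magic constant" is applied, so the result is only a crude initial guess for non-powers of two.

```python
import struct

BIAS = 127 << 23   # bit pattern of 1.0f, i.e. the exponent bias in place

def approx_root_pow2(x: float, n: int) -> float:
    """Crude estimate of x**(1 / 2**n) for positive normal float32 x, using a bit shift."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    bits = ((bits - BIAS) >> n) + BIAS   # divide the approximate log2 by 2**n
    return struct.unpack(">f", struct.pack(">I", bits))[0]

print(approx_root_pow2(16.0, 2))    # 2.0, exact because 16 is a power of two
print(approx_root_pow2(10.0, 1))    # rough estimate of sqrt(10)
```

A few Newton iterations on $y^p = x$ would be the natural refinement step.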
Algorithm for drawing generalised circles

A generalised circle is either a circle in the plane or a line. The general equation of one is: $$A(x^2 + y^2) + Bx + Cy + D=0,$$ where $4AD - B^2 - C^2 \leq 0$. This can be checked by completing the ...
Adding inverses of nilpotents as an extension of the "extended real numbers"

This is an idea that I had while playing with an automatic differentiation system built on dual numbers. This system, like most computer algebra systems built on floating point arithmetic, has the ...
Are there any ways to increase the precision in MATLAB without built in functions?

I am a beginner learning about MATLAB scientific computation, floating point numbers, and numerical error. When I am using a very small $x$ value for some equations, such as $y(x) = (\exp(x)-1-x)/x^2$,...
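
A sketch of one standard remedy, written in Python rather than MATLAB purely for illustration. For small $|x|$ the numerator $\exp(x)-1-x$ suffers catastrophic cancellation, so the Taylor series $y(x) = \sum_{k\ge 0} \frac{x^k}{(k+2)!}$ is summed instead. The switch-over threshold of 0.5 and the 20-term cutoff are assumptions chosen for this example.

```python
import math

def y_naive(x: float) -> float:
    """Direct formula; inaccurate for tiny x because exp(x) - 1 - x cancels."""
    return (math.exp(x) - 1.0 - x) / x**2

def y_series(x: float, terms: int = 20) -> float:
    """Taylor series 1/2 + x/6 + x^2/24 + ... using only basic arithmetic."""
    total, term = 0.0, 0.5          # term holds x^k / (k + 2)!
    for k in range(terms):
        total += term
        term *= x / (k + 3)
    return total

def y(x: float) -> float:
    return y_series(x) if abs(x) < 0.5 else y_naive(x)

print(y_naive(1e-10), y(1e-10))     # the naive value is unreliable; the series stays near 0.5
```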
Explanation for MATLAB floating point number calculation?

I am a beginner studying scientific computation, more specifically floating point numbers and precision in matlab. When testing the outputs of 2 of the following equations, I am not sure how matlab ...
Round-Off Unit Formula [duplicate]

My textbook states the following: If $x\in \mathbb R$ such that $x_{\text{min}}\leq |x| \leq x_{\text{max}}$, then $$fl(x) = x(1+\delta) \text{ with } |\delta | \leq u$$ where $$u = \frac12 \beta^{1-...
