
Questions tagged [floating-point]

Mathematical questions concerning floating point numbers, a finite approximation of the real numbers used in computing.

4 votes
2 answers
110 views

pow and its relative error

Investigating the floating-point implementation of $\operatorname{pow}(x,b)=x^b$ with $x,b\in\Bbb R$ in some library implementations, I found that some pow ...
emacs drives me nuts's user avatar
6 votes
0 answers
143 views

Algebraic Structures involving NaN (absorbing element)

IEEE 754 floating point numbers contain the concept of NaN (not a number), which "dominates" arithmetical operations ($+,-,\cdot,\div$ will return ...
Hyperplane's user avatar
  • 11.8k
3 votes
0 answers
53 views

Solve $10^{10^z} = 10^{10^x}+10^{10^y}$ for $z$ with floating point accuracy

In the following equation $$10^{10^z} = 10^{10^x}+10^{10^y}$$ I want to find an algorithm that computes $z$ in a floating point accurate manner given any values of $x$ and $y$ (e.g. $x=y=2000$). The ...
Gerben Beintema's user avatar
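A sketch of one standard approach (not from any posted answer; `solve_z` is a hypothetical name): take $\log_{10}$ twice, so $10^z = 10^x + \log_{10}(1 + 10^{10^y - 10^x})$ with $x \ge y$, and use `log1p` so the tiny correction survives rounding. The overflow guard at $x > 300$ is an assumption tied to binary64 range.

```python
import math

def solve_z(x, y):
    # Hypothetical sketch: z with 10**10**z == 10**10**x + 10**10**y.
    # One log10: 10**z = 10**x + log10(1 + 10**(10**y - 10**x)), assuming x >= y.
    if y > x:
        x, y = y, x
    if x == y:
        corr = math.log10(2.0)            # 10**(10**y - 10**x) == 1 exactly
    elif x > 300:
        corr = 0.0                        # 10**(10**y - 10**x) underflows to 0
    else:
        d = 10.0**y - 10.0**x             # <= 0, safe below the overflow guard
        corr = math.log10(1.0 + 10.0**d)  # in (0, log10(2)]
    # Second log10: z = x + log10(1 + corr * 10**-x), via log1p for accuracy
    return x + math.log1p(corr * 10.0**(-x)) / math.log(10)
```

For $x = y = 2000$ the correction $\log_{10}(2)\cdot 10^{-2000}$ underflows, so the correctly rounded answer is simply `2000.0`.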
1 vote
2 answers
64 views

How to transform this expression to a numerically stable form?

I have this function $$f(x, t)=\frac{\left(1+x\right)^{1-t}-1}{1-t}$$ Where $x \ge 0$ and $t \ge 0$. I want to use it in neural network, and thus need it to be differentiable. While it has a ...
yuri kilochek's user avatar
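A common rewrite for this shape of expression (a sketch, not the asker's code): since $(1+x)^{1-t} - 1 = \operatorname{expm1}((1-t)\log(1+x))$, the cancellation disappears and the $t \to 1$ limit $\log(1+x)$ falls out naturally.

```python
import math

def f(x, t):
    # ((1+x)**(1-t) - 1) / (1-t), rewritten via expm1/log1p:
    # (1+x)**(1-t) - 1 == expm1((1-t) * log1p(x))
    s = 1.0 - t
    if s == 0.0:
        return math.log1p(x)   # limit as t -> 1
    return math.expm1(s * math.log1p(x)) / s
```

Both `expm1` and `log1p` are accurate near zero, so the quotient stays well-behaved as $t$ approaches 1 instead of degenerating to $0/0$.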
1 vote
0 answers
49 views

Proof that $\epsilon_{mach} \leq \frac{1}{2} b^{1-n}$

I have a question about the proof of the following statement: For each set of machine numbers $F(b, n, E_{min}, E_{max})$ with $E_{min} < E_{max}$ the following inequality holds: $\epsilon_{mach} \...
Felix Gervasi's user avatar
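A sketch of the standard argument (assuming round-to-nearest and a normalized mantissa with one base-$b$ digit before the point):

```latex
\text{For } b^{e} \le |x| < b^{e+1} \text{, machine numbers are spaced } b^{e+1-n}
\text{ apart, so rounding to nearest gives}
\left|\mathrm{fl}(x) - x\right| \le \tfrac12\, b^{e+1-n}.
\text{Dividing by } |x| \ge b^{e}:
\frac{\left|\mathrm{fl}(x) - x\right|}{|x|}
  \le \frac{\tfrac12\, b^{e+1-n}}{b^{e}}
  = \tfrac12\, b^{1-n},
\text{hence } \epsilon_{\mathrm{mach}} \le \tfrac12\, b^{1-n}.
```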
1 vote
0 answers
58 views

Why does TI-84 show scientific notation for zeros sometimes but not others?

When graphing a function and then going through the process to calculate the zeroes (left bound, right bound, guess), is there a reason that sometimes it shows y = 0, but there are other times when it ...
mmmmmm's user avatar
  • 141
2 votes
1 answer
73 views

Numerically stable way to compute ugly double fraction

I am looking for a numerically stable version of this (ugly) equation $$ s^2=\frac{1}{\frac{1}{\beta_1}+\frac{1}{\beta_2}W} $$ where $$ \beta_1 = c_1-c_2m+(m-c_2)b\\ \beta_2 = \frac{1}{2}\left((a-m)^2-...
mto_19's user avatar
  • 272
1 vote
0 answers
83 views

Fundamental Axiom of Floating Point Arithmetic for Complex Numbers Multiplication

I am trying to prove the fundamental axiom of floating point arithmetic also applies to complex number multiplication. First, let $fl$ be a function that maps a number to its closest floating point ...
Gu Bochao's user avatar
1 vote
1 answer
140 views

How do calculators represent floating points (somewhat) perfectly?

If you ask a programming language to calculate 0.6 + 0.7, you’ll get something like 1.2999998, and that’s because of how floating point numbers are represented in computers. But if you ask a ...
Samathan's user avatar
0 votes
0 answers
29 views

Calculating coordinates of vertices, given dimensions in an architectural floorplan

So, one of my friends is trying to learn AutoCAD. They were given a floorplan. The floorplan had the dimensions, and they were asked to find the coordinates of all the vertices of the plan. So we ...
user3851878's user avatar
3 votes
2 answers
87 views

Proof that $\frac 1{10}$ has no finite binary float representation

I am supposed to prove that $\frac 1{10}$ is not representable as a finite binary float. I tried proving this via induction but that did not seem to work, now I am out of ideas. Thank you
trapaholicsmixtapes's user avatar
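The usual argument: a finite binary float is an exact dyadic rational $n/2^k$, while $\frac1{10}$ in lowest terms has the prime factor 5 in its denominator, so no $n/2^k$ can equal it. The consequence is easy to inspect exactly (a sketch, not the requested proof itself):

```python
from fractions import Fraction

# The double closest to 0.1 is an exact dyadic rational n / 2**k ...
exact = Fraction(0.1)
print(exact)                       # 3602879701896397/36028797018963968
# ... whose denominator is a power of two, so it cannot equal 1/10,
# whose lowest-terms denominator contains the prime factor 5.
print(exact == Fraction(1, 10))    # False
print(exact.denominator == 2**55)  # True
```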
1 vote
2 answers
64 views

Absolute difference between largest IEEE 754 number and its predecessor

In single precision format, the largest possible positive number is $A = 0 ~~~ 11111110 ~~~ 111\ldots 111$. Its predecessor is $B = 0 ~~~ 11111110 ~~~ 111 \ldots 110$. But what is the absolute ...
lafinur's user avatar
  • 3,468
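For single precision the difference is one ulp in the top binade: $A = (2 - 2^{-23})\cdot 2^{127}$, $B = (2 - 2\cdot 2^{-23})\cdot 2^{127}$, so $A - B = 2^{127-23} = 2^{104}$. A sketch that checks this by building the bit patterns directly (layout: sign | 8-bit exponent | 23-bit fraction):

```python
import struct

def f32_from_bits(bits):
    # Reinterpret a 32-bit pattern as an IEEE 754 single-precision value
    return struct.unpack('<f', struct.pack('<I', bits))[0]

A = f32_from_bits(0b0_11111110_11111111111111111111111)  # largest finite float32
B = f32_from_bits(0b0_11111110_11111111111111111111110)  # its predecessor
print(A - B == 2.0**104)   # True: one ulp in the top binade is 2**(127-23)
```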
0 votes
1 answer
59 views

Proof of `TWOSUM` implementation in "double-double" arithmetic

"double-double" / "compensated" arithmetic uses unevaluated sums of floating point numbers to obtain higher precision. One of the basic algorithms is ...
Claude's user avatar
  • 5,707
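For reference, Knuth's branch-free TwoSum (the algorithm the question refers to): its claim is that `s + err` reproduces `a + b` with no rounding error at all, which exact rational arithmetic can verify.

```python
from fractions import Fraction

def two_sum(a, b):
    # Knuth's TwoSum: s = fl(a+b), err = (a+b) - s exactly (barring overflow)
    s = a + b
    bb = s - a                       # the part of b that made it into s
    err = (a - (s - bb)) + (b - bb)  # recover what both operands lost
    return s, err

s, err = two_sum(0.1, 0.2)
# The pair (s, err) represents 0.1 + 0.2 with no rounding error:
print(Fraction(s) + Fraction(err) == Fraction(0.1) + Fraction(0.2))  # True
```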
0 votes
0 answers
10 views

Specify the conditions Exponent and Mantissa sizes must meet, so that the minimal distance between representable numbers is no more than 1.

Using the following floating-point representation: s - one sign bit m - mantissa - real number in range [1, 2), in which 1 and the comma are skipped, size of M bits c - Exponent - natural number, ...
Artur Bieniek's user avatar
0 votes
0 answers
53 views

What is the computational complexity of calculating determinants for matrices of finite precision floating-point numbers?

Following up from this older question, I understand that calculation of determinants for integer-valued matrices is possible with polynomial scaling. However, I have been unable to locate any ...
KarimAED's user avatar
0 votes
1 answer
55 views

How to compute the successor to a given floating point number

Let $F$ be the set of all floating point numbers $n2^e$ such that $-2^{53} < n < 2^{53}$ and $-1074 \leq e \leq 970$. Let $F^* = F - \{\max(F)\}$. I assume $F$ not to be dense, and therefore there ...
NRagot's user avatar
  • 57
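For positive finite doubles the successor is reached by incrementing the 64-bit pattern, since the IEEE 754 ordering of positive floats matches the integer ordering of their bit patterns; Python 3.9+ also exposes this as `math.nextafter`. A sketch of both:

```python
import math
import struct

def succ(x):
    # Next double above a positive finite x: increment the 64-bit pattern
    bits = struct.unpack('<Q', struct.pack('<d', x))[0]
    return struct.unpack('<d', struct.pack('<Q', bits + 1))[0]

print(succ(1.0) == 1.0 + 2**-52)                    # True: ulp(1.0) is 2**-52
print(succ(1.0) == math.nextafter(1.0, math.inf))   # True
```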
4 votes
1 answer
151 views

Is there still a fast invsqrt magic number for float128?

...
steve02081504's user avatar
1 vote
0 answers
159 views

Show that $x+1$ is not backward stable

Suppose we use $\oplus$ to compute $x+1$, given $x \in \mathbb{C}$. $\widetilde{f(x)} = \mathop{\text{fl}}(x) \oplus 1$. This algorithm is stable but not backward stable. The reason is that for $x \...
clay's user avatar
  • 2,783
1 vote
2 answers
173 views

Another way to compute the machine epsilon

Why does the following program compute the machine precision? I mean, it can be proved that the variable $u$ will give us the machine epsilon, but I don't know why. Let $a = \frac{4}{3}$, $b = a -...
xenuti's user avatar
  • 153
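The trick works because $\mathrm{fl}(4/3)$ is off by exactly $\frac13\cdot 2^{-52}$ (the repeating binary expansion is truncated), the subtraction $a - 1$ is exact by Sterbenz's lemma, and tripling turns the error into $1 - 2^{-52}$. A sketch in Python doubles:

```python
import sys

a = 4 / 3      # fl(4/3) = 4/3 - (1/3) * 2**-52 (binary 1.010101... truncated)
b = a - 1      # exact (Sterbenz): b = 1/3 - (1/3) * 2**-52
c = b + b + b  # exact: 3*b = 1 - 2**-52 is representable
u = abs(c - 1) # exact: 2**-52, the machine epsilon for binary64
print(u == 2**-52 == sys.float_info.epsilon)  # True
```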
3 votes
0 answers
152 views

Justification for the definition of relative error, why is it not a metric?

The absolute error and relative error operators are very commonly encountered while reading about topics from the fields of floating-point arithmetic or approximation theory. Absolute error is ${ae(a,...
user2373145's user avatar
0 votes
2 answers
99 views

Tricks in the floating point operations for better numerical results

I'm attempting to comprehend a passage from the book "Computational Modeling and Visualization of Physical Systems with Python" which I may be mentally fatigued to grasp. Here's the issue: ...
Fitzroy's user avatar
  • 15
2 votes
1 answer
182 views

Is there a stable algorithm for every well-conditioned problem?

Reading these notes on condition numbers and stability, the summary states: If the problem is well-conditioned then there is a stable way to solve it. If the problem is ill-conditioned then there is ...
Thanks for flying Vim's user avatar
0 votes
0 answers
27 views

Floating Point Precision Algorithm

In my database, data is stored with a precision of 10 digits, Decimal(30,10). User can enter x or 1/x. I need to save it as 1/x. If user enters ...
Imran Qadir Baksh - Baloch's user avatar
0 votes
0 answers
60 views

Secant method optimization - initial guesses with floating point precision?

Say I want to find the root of $f(x) = e^{-x} - 5$, and assume I start with initial guesses $x_0 = -3$ and $x_1 = 3$. I define my update function as $x_i = x_{i-1} - f(x_{i-1}) * \frac{x_{i-1} - x_{i-...
rb612's user avatar
  • 3,588
1 vote
1 answer
172 views

Does using smaller floating-point numbers decrease rounding errors?

I started learning about floating point by reading "What Every Computer Scientist Should know About Floating-Point Arithmetic" by David Goldberg. On page 4 he presents a proof for the ...
Thanks for flying Vim's user avatar
1 vote
0 answers
22 views

How to calculate converted value for each number in a set using a conversion rate, having its sum equal exactly a rounded fixed converted total?

Say I have three numeric values: a total, converted total, and a conversion rate. These are fixed, given numbers, and the two totals always have the precision of two decimal places. ...
Dr. Barry's user avatar
0 votes
0 answers
56 views

Finding an expression for $\sqrt{x^2 + z^2}$ that is more precise in floating point arithmetic?

Assuming that both $x$ and $z$ have no representation errors, and that $\vert z^2 \vert \ll \vert x^2 \vert$. There must exist an expression for $\sqrt{x^2 + z^2}$ that is the same in exact arithmetic ...
ADFjemamski's user avatar
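The standard rewrite is $|x|\sqrt{1 + (z/x)^2}$: algebraically identical, but $x^2$ is never formed, so nothing overflows or underflows when $x^2$ would; `math.hypot` implements this idea with extra care. A sketch:

```python
import math

def stable_norm(x, z):
    # |x| * sqrt(1 + (z/x)**2): algebraically sqrt(x**2 + z**2), but the
    # squares are never formed, so intermediate overflow/underflow is avoided
    x, z = abs(x), abs(z)
    if x < z:
        x, z = z, x       # divide by the larger magnitude
    if x == 0.0:
        return 0.0
    r = z / x
    return x * math.sqrt(1.0 + r * r)

big = 1e200
print(stable_norm(big, big) == big * math.sqrt(2))  # naive x*x + z*z overflows
print(math.hypot(3.0, 4.0) == 5.0)                  # library version of the same idea
```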
0 votes
0 answers
38 views

How does a computer calculate matrix scalar multiplication order of operations (flops)

I am trying to understand the number of flops in the Householder QR factorization. In one line of the algorithm, it says \begin{gather*} v = v / \lVert v \rVert_2 \end{gather*} I was wondering what ...
pongdini's user avatar
  • 121
1 vote
1 answer
168 views

On the axioms of floating-point arithmetic

As I understand there are two "axioms" that should be satisfied in floating-point arithmetic: $$\forall x\in \mathbb R,\ \exists |\varepsilon|\leq\varepsilon_{\text{machine}},\ \mbox{fl} (x) ...
JuliΓ‘n's user avatar
  • 1,347
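Both axioms can be spot-checked exactly: the unit roundoff for binary64 is $u = 2^{-53}$, and both $\mathrm{fl}(x)$ and $x \circledast y$ must land within relative distance $u$ of the exact value. A sketch using exact rationals (`rel_err` is a hypothetical helper):

```python
from fractions import Fraction

u = Fraction(1, 2**53)   # unit roundoff (half the machine epsilon) for binary64

def rel_err(approx, exact):
    # exact relative error of a computed double against a rational reference
    return abs(Fraction(approx) - exact) / abs(exact)

# Axiom 1: fl(x) = x(1 + eps), |eps| <= u  (fl = decimal-to-double conversion)
print(rel_err(0.1, Fraction(1, 10)) <= u)                  # True

# Axiom 2: x (+) y = (x + y)(1 + eps), |eps| <= u
print(rel_err(0.1 + 0.2, Fraction(0.1) + Fraction(0.2)) <= u)  # True
```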
0 votes
1 answer
60 views

Representation of rounding error in floating point arithmetic. [duplicate]

It is well known that in a Floating point number system: $$ \mathbb{F}:=\{\pm \beta^{e}(\frac{d_1}{\beta}+\dots +\frac{d_t}{\beta^t}): d_i \in \{0,\dots,\beta-1\},d_1\neq 0, e_{\min}\leq e \leq e_{\...
Henry T.'s user avatar
  • 1,356
1 vote
0 answers
34 views

Expression of sum in floating point system

This is a question of an exam on Numerical Analysis I had: Consider the floating point system of base $2$, maximum number of decimals $53$, maximum exponent $1025$ and minimum exponent $-1022$. That ...
Little Jonny's user avatar
0 votes
2 answers
164 views

Evaluating $a(b + c)$ more accurately with FMA

I'm using machine-precision floating-point arithmetic, and every so often it happens that I need to evaluate an expression of the form $a(b + c)$. I found that the accuracy can be improved using FMA (...
user2373145's user avatar
2 votes
0 answers
82 views

Numerically stable evaluation of factored univariate real polynomial

Suppose we have a real univariate factored polynomial, meaning we have its factors: an arbitrary number of polynomials of degree less than or equal to two. To simplify things, if necessary, let's ...
user2373145's user avatar
0 votes
0 answers
104 views

Bias in Single Precision Floating numbers

I had a doubt regarding Single Precision Floating point numbers. It is about the bias number which can be derived from exponent part of this representation of numbers. On searching up on google, most ...
crimsonKnight's user avatar
3 votes
1 answer
109 views

How to compute this "smooth max operator"?

I was seeking an alternative way to activate each neuron of a neural network non-linearly. Eventually, I came up with the following binary operation: $$ x \lor y = \log (\exp x + \exp y) $$ With $-\...
Dannyu NDos's user avatar
  • 2,049
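This is the two-argument log-sum-exp, and the standard overflow-free evaluation factors out the max: $\log(e^x + e^y) = m + \operatorname{log1p}(e^{-|x-y|})$ with $m = \max(x, y)$. A sketch (the `-inf` guard handles the operation's identity element):

```python
import math

def smooth_max(x, y):
    # log(exp(x) + exp(y)) without overflow: factor out the larger argument
    m, n = max(x, y), min(x, y)
    if m == -math.inf:            # both -inf: the identity element
        return -math.inf
    return m + math.log1p(math.exp(n - m))   # exp argument is <= 0

# naive log(exp(1000) + exp(1000)) overflows; this gives 1000 + log(2)
print(abs(smooth_max(1000.0, 1000.0) - (1000.0 + math.log(2))) < 1e-12)  # True
```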
0 votes
0 answers
25 views

How to multiply 2 arrays with unique non-integers to produce an array with unique results?

Is there an algorithm/formula to multiply two arrays (1D & 2D) of unique numbers such that the resultant array contains unique results? Would one have to create the 2 initial arrays in a certain ...
D.Price's user avatar
1 vote
1 answer
137 views

What is the set of all numbers that can be represented with a floating-point format?

Computers use single- (or, for more precise calculations, double-) precision floating-point formats to represent a subset of real numbers. While a decent chunk of real numbers can be stored with these ...
SMMH's user avatar
  • 313
1 vote
0 answers
54 views

Method for finding the largest positive difference between two pairs of IEEE 754 double precision floating point numbers and fixed-point numbers

I have two pairs of IEEE 754 double precision (64-bit) floating-point numbers and unsigned fixed-point numbers, and I'm trying to find which pair has the greatest difference. The fixed-point numbers ...
Polynomial's user avatar
0 votes
0 answers
44 views

Is converting between roots and coefficients of a polynomial numerically stable?

Assume we're on a computer using 32-bit floats (or something similar), and I'm converting back and forth between the $n$ coefficients of a polynomial and the corresponding $n$ roots of the polynomial. ...
chausies's user avatar
  • 2,230
0 votes
1 answer
53 views

storing decimal number into computer with finite mantissa

I am learning about numerical methods and the following link caught my attention: https://www.iro.umontreal.ca/~mignotte/IFT2425/Disasters.html So from what I understand 0.1 is not exactly ...
neo's user avatar
  • 109
0 votes
0 answers
66 views

Proof of loss of orthogonality in Gram-Schmidt

I am stuck on understanding how to derive the following proofs related to error bounds, which are given in the following slides. Can anyone please explain how these are derived?
ThickThighs's user avatar
-1 votes
1 answer
65 views

fl(A) where A is a square matrix

We defined $fl(x)$ to be the function $fl:\mathbb{R} \rightarrow \mathbb R_b (t, s)$ (i.e., takes reals and outputs the float). What does $fl(A)$ mean when $A \in \mathbb R ^{n \times n} $? I assume ...
hayev10063's user avatar
-3 votes
1 answer
152 views

trouble understanding floating point representation

I had a quiz last week on floating point representation. After he graded the quiz, he walked us through each step so that we could see what we did wrong. I took notes so that I could study his ...
shoebox's user avatar
-1 votes
1 answer
66 views

Determine The Base of The Venusian Numeration System [closed]

this question is from Thomas Koshy's book called "Discrete Mathematics With Applications": Any idea how to do this question? I can tell that the base of the system is at least 3 (since we ...
VirgOpta's user avatar
0 votes
1 answer
76 views

Fast computation of $x^{1/p}$, where $x\in\mathbb{R}^+$ and $p=2^{n}$, where $n\in\mathbb{N}$ with bit shifts?

There is plenty of literature regarding the legendary Fast inverse square root routine from Quake, but can we do something similar to compute $x^{1/p}$ as given in the title? Given that $p$ is a power ...
Bobby's user avatar
  • 41
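The derivation behind the Quake constant does generalize: the float32 bit pattern $i$ is roughly $2^{23}(\log_2 x + 127)$, so $x^{1/2^n}$ corresponds to $j \approx i/2^n + (1 - 2^{-n})B$ with $B = 127\cdot 2^{23}$. A sketch with no tuned correction term, so the accuracy is rough (a Newton step would refine it, as in Quake's routine):

```python
import struct

def approx_root(x, n):
    # Rough x**(1 / 2**n) for positive finite x via the float32 bit pattern:
    # bits(x) ~ 2**23 * (log2(x) + 127), so divide the log by 2**n in bit form
    i = struct.unpack('<I', struct.pack('<f', x))[0]
    B = 127 << 23                  # exponent bias, pre-shifted into place
    j = (i >> n) + (B - (B >> n))  # j ~ i / 2**n + (1 - 2**-n) * B
    return struct.unpack('<f', struct.pack('<I', j))[0]

print(approx_root(4.0, 1))    # 2.0 (exact here: powers of 4 have no mantissa bits)
print(approx_root(16.0, 2))   # 2.0 (fourth root)
```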
0 votes
2 answers
76 views

Algorithm for drawing generalised circles

A generalised circle is either a circle in the plane or a line. The general equation of one is: $$A(x^2 + y^2) + Bx + Cy + D=0,$$ where $4AD - B^2 - C^2 \leq 0$. This can be checked by completing the ...
wlad's user avatar
  • 8,215
2 votes
0 answers
61 views

Adding inverses of nilpotents as an extension of the "extended real numbers"

This is an idea that I had while playing with an automatic differentiation system built on dual numbers. This system, like most computer algebra systems built on floating point arithmetic, has the ...
Mike Battaglia's user avatar
0 votes
0 answers
613 views

Are there any ways to increase the precision in MATLAB without built in functions?

I am a beginner learning about MATLAB scientific computation, floating point numbers, and numerical error. When I am using a very small $x$ value for some equations, such as $y(x) = (\exp(x)-1-x)/x^2$,...
cronk's user avatar
  • 31
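For this particular $y(x)$ the usual fix is `expm1` plus a Taylor fallback: `expm1` removes one cancellation, and for tiny $x$ the series $\frac12 + \frac{x}{6} + \frac{x^2}{24} + \cdots$ avoids the remaining one. A sketch in Python (MATLAB's `expm1` behaves the same way; the $10^{-4}$ cutoff is an assumption, not tuned):

```python
import math

def y(x):
    # (exp(x) - 1 - x) / x**2 without catastrophic cancellation
    if abs(x) < 1e-4:
        # truncated Taylor series: 1/2 + x/6 + x**2/24 + x**3/120
        return 0.5 + x * (1/6 + x * (1/24 + x/120))
    return (math.expm1(x) - x) / (x * x)

print(abs(y(1e-12) - 0.5) < 1e-12)   # naive (exp(x)-1-x)/x**2 is garbage here
```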
3 votes
2 answers
156 views

Explanation for MATLAB floating point number calculation?

I am a beginner studying scientific computation, more specifically floating point numbers and precision in matlab. When testing the outputs of 2 of the following equations, I am not sure how matlab ...
cronk's user avatar
  • 31
0 votes
1 answer
68 views

Round-Off Unit Formula [duplicate]

My textbook states the following: If $x\in \mathbb R$ such that $x_{\text{min}}\leq |x| \leq x_{\text{max}}$, then $$fl(x) = x(1+\delta) \text{ with } |\delta | \leq u$$ where $$u = \frac12 \beta^{1-...
Lex_i's user avatar
  • 2,072
