
Investigating floating-point implementations of $\operatorname{pow}(x,b)=x^b$ with $x,b\in\Bbb R$ in some libraries, I found that some pow implementations are inexact.

The implementation is basically $$ \operatorname{pow}(x,b)= \exp(b\ln x) \tag 1 $$ Now let's assume $x>0$, and that the relative error introduced by $\exp$ and $\ln$ is the same $\Delta$ with $|\Delta| \ll 1$.

Then the result of the computation is: $$\begin{align} (1+\Delta_\text{pow}) \operatorname{pow}(x,b) &= (1+\Delta)\exp (b(1+\Delta)\ln x) \\ &=(1+\Delta)\exp(b\ln x)\underbrace{\exp(\Delta\cdot b\ln x)}_{\textstyle\approx 1+\Delta\cdot b\ln x} \\ &\approx (1+\Delta)\operatorname{pow}(x,b)(1+\Delta\cdot b\ln x) \\ &\approx (1+\Delta + \Delta\cdot b\ln x)\operatorname{pow}(x,b) \tag 2 \end{align}$$ This means that the relative error of pow is not $\Delta$ but, in the worst case, $$ |\Delta_\text{pow}| \approx |\Delta|\,(1+|b\ln x|). \tag 3 $$ So it is clear why the implementation according to $(1)$ becomes inexact, but I don't see how pow could be implemented with a smaller relative error. (The targeted precision is relative because this is in the context of a floating-point implementation.)

I am not asking for a specific implementation, of course, but for a different representation of pow that has better error behaviour. In the above implementation, the relative error of pow is unbounded even when the relative errors of exp and ln are bounded and well behaved.


For example, in IEEE double precision the relative error of exp and ln is on the order of $$ \Delta \approx 10^{-16}, $$ so the approximation $e^t\approx 1+t$ works well for a wide range of $t=\Delta\cdot b\ln x$. For instance, when $|b\ln x| = 1000 \approx 2^{10}$, the pow implementation loses about 10 bits of precision.
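To make this concrete, here is a minimal C sketch (the name `pow_naive` is mine, just for illustration) that implements $(1)$ directly and measures its relative error against a `long double` reference. It assumes the platform's `expl`/`logl` are accurate enough to serve as a reference, which holds for 80-bit x87 `long double` but not on platforms where `long double` is the same as `double`; the measured loss should be of the order predicted by $(3)$:

```c
#include <math.h>
#include <stdio.h>

/* pow according to (1): exp(b * ln x), for x > 0. */
static double pow_naive(double x, double b)
{
    return exp(b * log(x));
}

int main(void)
{
    double x = 1.0000001;  /* ln(x) ~ 1e-7 */
    double b = 1.0e10;     /* so |b * ln x| ~ 1000 */
    double y = pow_naive(x, b);
    /* Reference computed in long double; only meaningful if
     * long double is wider than double on this platform. */
    long double r = expl((long double)b * logl((long double)x));
    double rel = (double)fabsl(((long double)y - r) / r);
    printf("|b ln x|       = %.3g\n", fabs(b * log(x)));
    printf("relative error = %.3g (~%.1f bits lost vs. 1e-16)\n",
           rel, log2(rel / 1e-16));
    return 0;
}
```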

  • What do you mean by "inexact"? You mean the error is too large? There have been reports of broken implementations of trigonometric functions in major software packages. – shuhalo, Jun 8 at 19:57
  • Your error will be large if $|b \ln x|$ is large, but then the approximation $\exp( \Delta b \ln(x) ) \approx 1 + \Delta b \ln(x)$ will be a bad underestimate. – shuhalo, Jun 8 at 20:00
  • For a working IEEE double implementation (which is the case for exp and ln), $\Delta$ is on the order of $10^{-16}$, so for most practical purposes it is still the case that $|\Delta\, b\ln x| \ll 1$. For example, with $|b\ln x| = 1000$, the implementation of pow loses 10 LSBs of relative precision. – Commented Jun 8 at 20:25
  • @njuffa It's clear that bumping the precision works; however, I am on a very limited embedded system, and it's not actually feasible to re-implement exp, log and mul with higher precision. Moreover, there is no upper bound on $|b\ln x|$, though bounding it would be a partial remedy. I just thought I had overlooked something obvious. – Commented Jun 11 at 10:28
  • @njuffa The implementation is in asm: github.com/avrdudes/avr-libc/tree/main/libm/fplib; I'll have a look at whether it is feasible. Also, exp and log in that implementation are not as precise as they could be, but it's hard to do better within the same code-size / speed constraints. – Commented Jun 11 at 11:00

2 Answers


The issue is fundamental, and if we are limited to working precision, then there is nothing that can be done to avoid it.

You are computing a function $f$ of two variables, i.e., $$y = f(b,x) = x^b.$$ Let $\hat{y}$ denote the computed value of $y$. In general, we cannot hope to compute $f$ exactly, and $y \not= \hat{y}$. If we are exceedingly skilled, then our implementation of $f$ is componentwise relatively backward stable: there exist $\bar{b}$ and $\bar{x}$ such that $$\hat{y} = f(\bar{b}, \bar{x})$$ and $$| b - \bar{b} | \leq Cu|b|, \quad |x - \bar{x}| \leq C u |x|.$$ Here $u$ is the unit roundoff and $C > 0$ is a constant that is independent of $u$; the size of $C$ reflects our skill as programmers. This is the very best that we can hope to achieve on a computer, because the basic arithmetic operations satisfy conditions of this type.

Does it matter? Yes. We care about the size of the relative forward error, $$R = \frac{y - \hat{y}}{y}.$$ The size of this relative error is controlled by the product of the componentwise relative backward error and the componentwise relative condition number $\kappa_f^{\text{cr}}(b,x)$; read this detailed answer to a related question as well as this answer to a simpler problem which nevertheless encapsulates the present difficulties. In your case, we have $$ \frac{\partial f}{\partial b} = x^b \log(x)$$ and $$ \frac{\partial f}{\partial x} = b x^{b-1},$$ so $$ \kappa_f^{\text{cr}}(b,x) = \left|\frac{b}{f}\frac{\partial f}{\partial b}\right| + \left|\frac{x}{f}\frac{\partial f}{\partial x}\right| = |b\log(x)| + |b| = |b|(1+|\log(x)|).$$ We conclude that the smallest relative forward error we can hope to achieve satisfies $$|R| \leq C u |b|(1 + |\log(x)|).$$ Note that this bound is even more pessimistic than your original analysis, because here I cannot assume that the relative errors for $\log$ and $\exp$ are uniformly bounded over their natural domain. If a library achieves that, then it is emulating a smaller unit roundoff.
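For illustration only (a sketch of mine, not part of the analysis above), the attainable error floor from this bound can be evaluated directly, taking $C=1$ and $u=\texttt{DBL\_EPSILON}/2$ for IEEE double:

```c
#include <float.h>
#include <math.h>

/* Sketch: evaluate the bound |R| <= C*u*|b|*(1 + |log x|) on the
 * relative forward error achievable at working precision.
 * C is taken as 1 here; u = DBL_EPSILON/2 is the unit roundoff. */
double pow_error_floor(double x, double b)
{
    const double u = DBL_EPSILON / 2.0;  /* ~1.11e-16 for IEEE double */
    return u * fabs(b) * (1.0 + fabs(log(x)));
}
```

For $b = 10$ and $x = e^{100}$ this already gives about $10^{-13}$, i.e., roughly 10 bits worse than the unit roundoff.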

In conclusion, your current implementation is about as accurate as we can hope for unless we start emulating a smaller unit roundoff.


You may be able to improve on the error performance by replacing

$$ \operatorname{pow}(x,b)= \exp(b\ln x) \tag 1 $$

with

$$ \operatorname{pow}(x,b)= x^{\operatorname{Int}(b)}\exp(\operatorname{Frac}(b)\ln x) \tag{1'} $$

where $\operatorname{Int}(b)$ and $\operatorname{Frac}(b)$ are the integer and fractional parts of $b$, and $x^{\operatorname{Int}(b)}$ is computed by repeatedly multiplying $x$ by itself.

On the assumption that the error induced by the multiplications is much smaller than the error inherent in the floating-point implementation of $\operatorname{pow}(x,b)$, this reduces the error to approximately $$ |\Delta_\text{pow}| \approx |\Delta|\,(1+|\operatorname{Frac}(b)\ln x|), \tag{3'} $$ and since $|\operatorname{Frac}(b)| < 1$, the amplification factor on the exp/ln path is now at most $1 + |\ln x|$.
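Here is a minimal C sketch of this idea (my own illustration, not the answerer's code). It uses binary exponentiation for the integer part rather than naive repeated multiplication, which keeps the number of roundings to about $\log_2|\operatorname{Int}(b)|$, and it assumes $x>0$ and that $\operatorname{Int}(b)$ fits in a `long long`:

```c
#include <math.h>

/* Integer power by binary exponentiation (squaring). */
static double ipow(double x, long long n)
{
    double r = 1.0;
    int neg = (n < 0);
    unsigned long long m = neg ? -(unsigned long long)n
                               : (unsigned long long)n;
    while (m) {
        if (m & 1)
            r *= x;
        x *= x;
        m >>= 1;
    }
    return neg ? 1.0 / r : r;
}

/* pow via x^Int(b) * exp(Frac(b) * ln x), valid for x > 0.
 * The exp/ln path now sees only |Frac(b) * ln x| < |ln x|,
 * so its error amplification is bounded by 1 + |ln x|. */
double pow_split(double x, double b)
{
    double ipart;
    double fpart = modf(b, &ipart); /* b = ipart + fpart, |fpart| < 1 */
    return ipow(x, (long long)ipart) * exp(fpart * log(x));
}
```

The integer-power factor contributes roughly one rounding per squaring, i.e., on the order of $\log_2|\operatorname{Int}(b)|$ ulps, which is consistent with the assumption above that the multiplication error is comparatively small.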
