$\begingroup$

The Error function

$\mathrm{erf}(x)=\frac{2}{\sqrt{\pi}}\int_0^x e^{-t^2}\,dt$

shows up in many contexts, but can't be represented using elementary functions.

I compared it with another function $f$ which also starts off linearly, satisfies $f(0)=0$, and converges quickly to the constant value $1$, namely

$\tanh{(x)} = \frac {e^x - e^{-x}} {e^x + e^{-x}}$.

Astonishingly (to me), I found that the two never differ by more than $|\Delta f|=0.0812$ and converge to each other exponentially fast!

I consider $\tanh{(x)}$ to be the somewhat prettier function, and so I wanted to find a short-expression approximation to $\text{erf}$ in terms of "nice" functions. I "naturally" tried

$f(x)=A\cdot\tanh(k\cdot x^a-d)$

Setting $A=1$ or $d=0$ on its own degrades the approximation, and the exponent $a$ is a bit difficult to deal with. However, I found that for $k=\sqrt{\pi}\log{(2)}$ the situation gets "better". I obtained that value of $k$ from the requirement that the "norm" given by

$\int_0^\infty\left[\text{erf}(x)-f(x)\right]dx,$

i.e. the difference of the functions' areas, should vanish. With this value, the maximal pointwise difference even falls below $|\Delta f| = 0.03$. And however you choose the integration bounds for an interval, the area difference is no more than $0.017$.

[Plot: $\text{erf}(x)$ and $\tanh(\sqrt{\pi}\log(2)\,x)$ overlaid]

Numerically speaking and relative to a unit scale, the functions $\text{erf}$ and $\tanh{(\sqrt{\pi}\log{(2)}x)}$ are essentially the same.
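This near-coincidence is easy to verify numerically; here is a minimal Python sketch (the grid and scan range are mine), using `math.erf` from the standard library:

```python
import math

k = math.sqrt(math.pi) * math.log(2)  # the "nice" constant sqrt(pi) * ln(2)

# Both functions are odd, so scanning x >= 0 suffices; the difference
# decays exponentially, so [0, 10] captures the maximum.
max_diff = max(abs(math.erf(x) - math.tanh(k * x))
               for x in (i / 1000 for i in range(10001)))

print(max_diff)  # stays below the quoted bound of 0.03
```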


My question is whether I can find, or whether there are known, substitutes for this non-elementary function in terms of elementary ones, in the sense above: the approximation should be compact and memorable while the values are even better, from a numerical point of view.

The purpose being, for example, that if I see somewhere that for a computation I have to integrate $\operatorname{erf}$, I can think to myself "oh, yeah, that's maybe complicated, but within error bounds of $10^{-3}$, using e.g. $\tanh(k\cdot x)$ is an incredibly accurate approximation."

$\endgroup$

13 Answers

14
$\begingroup$

It depends on how much accuracy you need and over what interval. It seems that you are happy with a few percent. There is an approximation in Abramowitz & Stegun that gives $\text{erf}$ in terms of a rational polynomial times a Gaussian over $[0,\infty)$ out to $\sim 10^{-5}$ accuracy.

In case you care, in the next column, there is a series for erf of a complex number that is accurate to $10^{-16}$ relative error! I have used this in my work and got incredible accuracy with just one term in the sum.

$\endgroup$
5
  • $\begingroup$ Do you happen to know what the integrals of those approximations are (from negative to positive infinity)? I'm asking for the cases where we need to avoid letting the total area go over 1. $\endgroup$
    – user541686
    Commented Jan 24, 2014 at 5:06
  • 6
    $\begingroup$ Hello. Following your link to Abramowitz & Stegun, one can read that they borrowed those approximations from Hastings, Approximations for Digital Computers, but neither Hastings nor A&S provides any explanation of how to obtain those approximations. Do you happen to know how to do that, or where this has been done? Thank you. $\endgroup$
    – Antoine
    Commented Jul 10, 2015 at 18:05
  • 1
    $\begingroup$ Hi, I'm a bit confused as, following the link, I am led to the homepage of a Prof. MacDonald, with no mention of a paper by Abromowitz & Stegun. Has the link changed, and if so could you provide an updated one? $\endgroup$
    – YiFan Tey
    Commented Nov 3, 2021 at 5:19
  • $\begingroup$ @YiFan There is a link to the book in the linked-to webpage $\endgroup$
    – Ron Gordon
    Commented Nov 15, 2021 at 20:35
  • $\begingroup$ @RonGordon thanks. No idea how I missed it the first time! $\endgroup$
    – YiFan Tey
    Commented Nov 16, 2021 at 0:04
10
$\begingroup$

A logistic distribution $F$ -- which can be expressed as a rescaled hyperbolic tangent -- can closely approximate the normal distribution function $\Phi$. Likewise, its inverse function -- the "logit" function $F^{-1}$ -- can be rescaled to approximate the inverse normal CDF -- the "probit" function $\Phi^{-1}$.

In comparison, the logistic distribution has fatter tails (which may be desirable). Whereas the normal distribution's CDF and inverse CDF ("probit") cannot be expressed using elementary functions, closed-form expressions for the logistic distribution's CDF and its inverse are easily derived and behave like elementary algebraic functions.

The logistic distribution arises from the differential equation $\frac{d}{dx}f(x) = f(x)(1-f(x))$. Intuitively, this function is typically used to model a growth process in which the rate behaves like a bell curve.
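As a quick sanity check, the standard logistic function $F(x)=1/(1+e^{-x})$ indeed satisfies this differential equation (a Python sketch; the probe points and step size are mine):

```python
import math

def F(x):
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-x))

# Compare a central-difference derivative with F(x) * (1 - F(x)).
h = 1e-6
for x in (-2.0, 0.0, 1.5):
    numeric = (F(x + h) - F(x - h)) / (2 * h)
    analytic = F(x) * (1.0 - F(x))
    assert abs(numeric - analytic) < 1e-8

print("ODE check passed")
```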

In comparison, the normal distribution arises from the differential equation $ \frac{d \,f(x)}{dx}=f(x)\frac{(\mu-x)}{\sigma^2}$. The normal distribution is commonly used to model diffusion processes. E.g., a Wiener process is a stochastic process with independent, normally distributed increments of mean $\mu$ and variance $\sigma^2$; in the limit this is a Brownian motion.

Interestingly, the logistic distribution arises in a physical process analogous to Brownian motion: it is the "limit distribution of a finite-velocity damped random motion described by a telegraph process in which the random times between consecutive velocity changes have independent exponential distributions with linearly increasing parameters."

Note that the CDF of the logistic distribution $F$ can be expressed using hyperbolic tangent function:

$F(x;\mu ,s)={\frac {1}{1+e^{{-{\frac {x-\mu }{s}}}}}}={\frac 12}+{\frac 12}\;\operatorname {Tanh}\!\left({\frac {x-\mu }{2s}}\right)$

Given that this distribution's variance is ${\tfrac {s^{2}\pi ^{2}}{3}}$, the logistic distribution can be scaled to approximate the standard normal by matching variances, i.e. choosing $s=\frac{\sqrt{3}}{\pi}$. The resulting approximation has the same first and second moments as the normal distribution, but is fatter-tailed (i.e., leptokurtic).

Also, $\Phi$ is related to the error function (and its complement) by: $\Phi (x)={\frac {1}{2}}+{\frac {1}{2}}\operatorname {erf} \left(x/{\sqrt {2}}\right)={\frac {1}{2}}\operatorname {erfc} \left(-x/{\sqrt {2}}\right)$

The chief advantage to approximating normal with the logistic distribution is that the CDF and Inverse CDF can be easily expressed using elementary functions. Several fields of applied science utilize this approximation.

The main disadvantage, however, is the approximation error. The maximum absolute error between the scaled logistic function and the normal CDF is $0.0226628$, attained at $X = \pm 0.682761$. Furthermore, the error between the inverse logistic function (logit) and the probit function stays below $0.0802364$ for $p$ in the central region (the maximum is attained at $p = 0.841941$ and, by symmetry, at $p = 0.158059$), but becomes asymptotically large outside it. It is important to note that these functions behave very differently in "the tails".


Thus, for a standard normal distribution with $\mu =0$ and $\sigma =1$: $$\operatorname{erf}(\frac{x}{\sqrt{2}}) \approx \operatorname{Tanh}\left(\frac{x \, \pi}{2 \sqrt{3}} \right) \equiv \frac{e^{\frac{\pi\,x}{\sqrt{3}}}-1}{e^{\frac{\pi\,x}{\sqrt{3}}}+1} $$

$$\operatorname{erf}(x) \approx \operatorname{Tanh}\left(\frac{x \, \pi}{ \sqrt{6}} \right) \equiv \frac{e^{\pi\,x\sqrt{2/3}}-1}{e^{\pi\,x\sqrt{2/3}}+1} $$

$$\Phi \left( x \right) \approx \frac{1}{2} + \frac{1}{2} \operatorname{Tanh} \left( \frac{\pi \, x}{2 \sqrt{3}} \right) $$

And, just as easily: $$\Phi^{-1}\left(p\right) \approx -\frac{2\sqrt{3}\operatorname{ArcTanh}\left( 1-2p \right)}{\pi}$$
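These formulas are easy to spot-check with Python's standard library (`statistics.NormalDist` provides the exact probit; the probe points are mine):

```python
import math
from statistics import NormalDist

def probit_approx(p):
    # logit rescaled by 2*sqrt(3)/pi, per the formula above
    return -2.0 * math.sqrt(3) * math.atanh(1.0 - 2.0 * p) / math.pi

probit = NormalDist().inv_cdf  # exact inverse normal CDF
errs = [abs(probit(p) - probit_approx(p)) for p in (0.2, 0.3, 0.5, 0.7, 0.8)]

print(max(errs))  # stays under the 0.0802 bound quoted in this answer
```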


Mathematica to analyze errors:

Define:

normsdistApprox[X_] = (1/2 + 1/2 Tanh[(\[Pi] X)/(2 Sqrt[3])])  (* logistic approximation to the standard normal CDF *)
normsinvApprox[p_] = X /. Solve[p == normsdistApprox[X], X][[1]]  (* its inverse: the logit approximation to the probit *)
normalPDFApprox = D[normsdistApprox[X], X]  (* implied density approximation *)

Plot:

Plot[{CDF[NormalDistribution[], X], normsdistApprox[X]}, {X, -5, 5}, 
 PlotLabel -> "Logistic Approximation to Normal CDF", 
 PlotLegends -> "Expressions", ImageSize -> 600]

Plot[{Abs[CDF[NormalDistribution[], X] - normsdistApprox[X] ]}, {X, 0,
   5}, PlotLabel -> 
  "Error of the Logistic Approximation to the Normal CDF", 
 ImageSize -> 600]

Plot[{InverseCDF[NormalDistribution[0, 1], p], normsinvApprox[p]}, {p,
   0, 1}, PlotLabel -> 
  "Logistic Approximation to the Inverse Normal CDF (Probit \
Function)", PlotLegends -> "Expressions", ImageSize -> 600]

Plot[{InverseCDF[NormalDistribution[0, 1], p] - 
   normsinvApprox[p]}, {p, 0, 1}, 
 PlotLabel -> 
  "Error of the Logit Approximation to the Inverse Normal CDF (Probit \
Function)", PlotLegends -> "Expressions", ImageSize -> 600]

[Plots: the logistic approximation to the normal CDF, its error, the logit approximation to the probit function, and its error]

Lastly, give max errors:

FindMaximum[Abs[CDF[NormalDistribution[], X] - normsdistApprox[X]], X]

FindMaximum[{Abs[
   InverseCDF[NormalDistribution[0, 1], p] - normsinvApprox[p]], 
  p >= 1*10^-2, p <= 1 - 1 *10^-2}, p]

FindMaximum[{Abs[
   InverseCDF[NormalDistribution[0, 1], p] - normsinvApprox[p]], 
  p >= 1*10^-16, p <= 1*10^-2}, p]

Out:

{0.0226628, {X -> 0.682761}}

{0.0802364, {p -> 0.841941}}

{12.032, {p -> 1.*10^-16}}
$\endgroup$
9
$\begingroup$

I suspect the reason the $\tanh x$ solution "works" so well is that it happens to be the second-order Padé approximant in $e^x$. Unfortunately, higher-order Padé approximants don't seem to work as well. One more thing you could do is approximate $\text{erf}(x)$ only on $(-3,3)$, and take it to be $\pm 1$ everywhere else.

$\endgroup$
7
$\begingroup$

I pointed out this close correspondence in Section 2.4 of L. Ingber, "Statistical mechanics of neocortical interactions. I. Basic formulation," Physica D 5, 83-107 (1982). [ URL http://www.ingber.com/smni82_basic.pdf ]

$\endgroup$
6
$\begingroup$

In addition to the answers above, there are two things to note which may be of importance. Both relate to the fact that, depending on your application, the approximation may not be as good as it looks. This might lead you to choose other approximations, like the ones already mentioned.

First, the tail behaviour of the $\mathrm{erf}$ and $\tanh$ functions is very different. Asymptotically, the tails of $\mathrm{erf}$ decay like $e^{-x^2}$, whereas those of $\tanh$ decay only exponentially, like $e^{-2x}$. Roughly speaking, a one-in-100-years event under the normal distribution becomes a one-in-10-years event under the $\tanh$ approximation; this might matter.
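The size of this effect is easy to see numerically (a Python sketch using the question's constant $k=\sqrt{\pi}\log 2$; the probe points are mine):

```python
import math

k = math.sqrt(math.pi) * math.log(2)

ratios = []
for x in (2.0, 3.0, 4.0):
    tail_erf = 1.0 - math.erf(x)        # Gaussian-type tail, ~ exp(-x^2)
    tail_tanh = 1.0 - math.tanh(k * x)  # exponential tail, ~ 2*exp(-2*k*x)
    ratios.append(tail_tanh / tail_erf)

print(ratios)  # the tanh tail is heavier by a rapidly growing factor
```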

Something else to note is that values of the $\mathrm{erf}$ function usually represent probabilities. But differences of probabilities are not meaningful quantities. Hence, depending on your application, another similarity measure may be more appropriate, for example the Kullback-Leibler divergence $-\left[p\log{q}+(1-p)\log(1-q)\right]$, where $p=\mathrm{erf}(\dots)$ and $q=\tanh(\dots)$.

As a side note: you can get similarly good approximations with fewer operations by stitching together two exponentials, e.g. $f(x)=\mathrm{sgn}(x)\left(1-e^{-\alpha|x|}\right)$.

$\endgroup$
1
  • $\begingroup$ Yes. Divergence is especially important with small $x$. $\endgroup$
    – prime
    Commented Aug 28, 2017 at 21:10
2
$\begingroup$

Transforming a non-elementary function with elementary functions into a form that is well approximated with a rational polynomial seems to be a good approach.

Notice that $$ 1-\operatorname{erf}\left(x\right)^{2} \approx \frac{1}{e^{x^{2}}} $$ Now there are two approaches you can take to turn this into an approximately rational polynomial. Taking the log of both sides yields $$ \ln\left(1-\operatorname{erf}\left(x\right)^{2}\right) \approx -x^{2} $$ Substituting $x\rightarrow \sqrt{\ln\left(x\right)}$ yields $$ 1-\operatorname{erf}\left(\sqrt{\ln\left(x\right)}\right)^{2} \approx \frac{1}{x} $$ Both of these transformations of erf are invertible. Finding the Padé approximant for either of these will yield an astoundingly good approximation for very few terms. In the case of the first one, $$ \ln\left(1-\operatorname{erf}\left(x\right)^{2}\right) \approx x^{2}\frac{-1.27324-0.074647x^{2}}{1+0.0886745x^{2}} $$ and so for $x \ge 0$ $$ \operatorname{erf}\left(x\right)\approx \sqrt{1-e^{x^{2}\frac{-1.27324-0.074647x^{2}}{1+0.0886745x^{2}}}} $$ whose accuracy is about 4 decimal places at worst, and which converges at $0$ and infinity.
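A numerical check of the final formula (a Python sketch; the coefficients are the ones given above, and the test grid is mine):

```python
import math

def erf_pade(x):
    """erf(x) via the Pade-based fit above, valid for x >= 0."""
    x2 = x * x
    r = x2 * (-1.27324 - 0.074647 * x2) / (1.0 + 0.0886745 * x2)
    return math.sqrt(1.0 - math.exp(r))

max_err = max(abs(erf_pade(i / 100) - math.erf(i / 100)) for i in range(501))

print(max_err)  # on the order of 1e-4, i.e. about 4 decimal places
```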

Unfortunately this is not invertible.

$\endgroup$
1
$\begingroup$

I found my own approximation, very compact and, most importantly, readily invertible: $\operatorname{erf}(x)\approx \frac2\pi\arctan[2x(1+x^4)]$ (error below 2%).
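A quick Python spot-check of this formula (the grid is mine):

```python
import math

def erf_arctan(x):
    # (2/pi) * arctan(2x(1 + x^4))
    return (2.0 / math.pi) * math.atan(2.0 * x * (1.0 + x ** 4))

max_err = max(abs(erf_arctan(i / 100) - math.erf(i / 100)) for i in range(1001))

print(max_err)  # roughly 0.02, consistent with the quoted ~2% error
```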

$\endgroup$
1
$\begingroup$

I like the tanh approximation given, but I noticed a possible correction you could use. If you look at the plot of the error of the approximation against erf, it's almost a damped sine wave of the form $-e^{-\lambda x} \sin({\alpha \pi x})$. My tweaked version of your approximation is $\operatorname{erf}(x)\approx \tanh(kx)-Ce^{-\lambda x} \sin({\alpha \pi x})$, with the parameters $k=\sqrt\pi \ln{2}$, $C=0.01$, $\lambda = 0.25$, $\alpha = \frac{0.975}{0.1+\sqrt[3] {|x|}}$. (The addition in the denominator is there, obviously, to prevent division by zero; it can be made larger or smaller to adjust the fit, and the root can likewise be changed to adjust the fit. However, it should be the root of the absolute value of $x$, to avoid imaginary numbers where they're not needed and to maintain the sign of $\pi x$.)

Unfortunately, my fit is still far from perfect; I simply eyeballed the additional correction, and I hope someone will adjust the parameters for a better fit. Probably at least an additional $C' e^{-\lambda' x} \sin({\beta \pi x})$ term will be necessary to smooth the fit further.

I hope my input was helpful, although I'm pretty late to the party!

$\endgroup$
1
  • $\begingroup$ Thanks for the answer. Btw. both tanh and erf should also be TeX functions. In any case, it's been over 5 years since I was interested in this, so lala :) $\endgroup$
    – Nikolaj-K
    Commented May 2, 2019 at 22:03
1
$\begingroup$

Too long for a comment.

The term $k_0=\sqrt \pi \log(2)$ is really elegant.

If we consider the norm $$\Phi(k)=\int_0^\infty \Big[\text{erf}(x)-\tanh (k x)\Big]^2\,dx,$$ which corresponds to a least-squares regression based on an infinite number of data points, we get a slightly different optimum value which, rationalized, is $k_1=\frac{605}{503}$ (which is more than trivial). $$\Phi(k_0)=5.28 \times 10^{-4} \qquad \qquad \qquad \Phi(k_1)=4.44 \times 10^{-4}$$ In terms of maximum absolute error, $k_0$ leads to $0.027$ while $k_1$ leads to $0.019$.
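These norms can be reproduced with a simple quadrature; here is a Python sketch (trapezoidal rule on $[0,10]$, which suffices since the integrand decays exponentially; grid size mine):

```python
import math

def Phi(k, upper=10.0, n=20000):
    """Trapezoidal approximation of the squared-difference norm."""
    h = upper / n
    f = lambda x: (math.erf(x) - math.tanh(k * x)) ** 2
    return h * (0.5 * f(0.0) + sum(f(i * h) for i in range(1, n)) + 0.5 * f(upper))

k0 = math.sqrt(math.pi) * math.log(2)
k1 = 605 / 503

phi0, phi1 = Phi(k0), Phi(k1)
print(phi0, phi1)  # about 5.3e-4 and 4.4e-4; k1 gives the smaller norm
```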

$\endgroup$
1
$\begingroup$

Nice! I found a very efficient one, with a maximum difference of $0.01747$. It contains a grand total of 1 division, 1 addition, 1 power, and 2 absolute values:

$$ aerf(x)=\frac x {|x|+0.187^{|x|}} $$

For reference: $$ d(x)=\operatorname{aerf}(x)-\operatorname{erf}(x) $$

And you can reverse it, though (courtesy of njuffa) that requires more than elementary functions: $$ \text{But that's hard.} $$

The three largest error peaks (mirrored for negative $x$) are:

  1. $x = 0.5982$: $0.01747$ above
  2. $x = 1.5582$: $0.01741$ below
  3. $x = 0.11439$: $0.00681$ below

It is $0.01251$ below at $x = 2$, $0.000045$ below at $5$, and $5.22 \times 10^{-9}$ below at $10$.

As $x$ approaches infinity or $0$, $d(x)$ approaches $0$.

$$ \frac{0}{0+0.187^0}=\frac{0}{1}=0 \qquad\qquad \lim_{x\rightarrow \infty}\frac{x}{|x|+0.187^{|x|}}=1 $$
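A Python sketch reproducing the error figures above (the grid is mine):

```python
import math

def aerf(x):
    # x / (|x| + 0.187^|x|): 1 division, 1 addition, 1 power, 2 absolutes
    return x / (abs(x) + 0.187 ** abs(x))

max_err = max(abs(aerf(i / 1000) - math.erf(i / 1000)) for i in range(5001))

print(max_err)  # about 0.01747, attained near x = 0.5982
```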

Re njuffa's comment below: a slightly more accurate constant is $\frac{289}{1545}$, which is $5.501\times10^{-5}$ more than $0.187$. It is indeed more accurate, but only by about $0.00004$ at the greatest difference, so it's not really that much.

In case you don't know, $\operatorname{erf}'(x)$ is a scaled normal distribution (a rescaled $e^{-\pi x^2}$): $$\operatorname{erf}'(\sqrt{\pi}x)\,\frac{\sqrt{\pi}}2=e^{-\pi x^2}$$

If you want accuracy approximating the normal distribution, you would use $$\operatorname{anorm}(x) = \frac{d}{dx}\,\frac{x}{|x|+0.368^{|x|}}$$

$\endgroup$
2
  • 1
    $\begingroup$ Slightly more accurate: $(\frac{289}{1545})^{|x|}$. FWIW, I don't think this is reversible using only elementary math functions. $\endgroup$
    – njuffa
    Commented Nov 3, 2021 at 9:29
  • $\begingroup$ i didn't really want to go for A B S O L U T E P R E C I S I O N so i stuck with 3 decimal places $\endgroup$ Commented Nov 4, 2021 at 18:41
0
$\begingroup$

There is this series expansion using Gamma Regularized function $Q(a,z)$:

$$1-Q\left(n+\frac12,z^2\right)=\operatorname{erf}(z)-z e^{-z^2}\,\frac{(-1)^n}{\Gamma\left(n+\frac12\right)}\sum_{k=0}^{n-1}\left(\frac12-n\right)_{n-k-1}(-1)^k z^{2k},\qquad n\in\Bbb N,\ z\ge0,$$

i.e. the error of the approximation is $\operatorname{erf}(z)-\text{Approximation}$.

You can invert to solve within the range of the approximation error:

$$\text{Error of Approximation}(z)=x\implies z=\sqrt{Q^{-1}\left(n+\frac12,1-x\right)}$$

An application of this inverse regularized Gamma $Q^{-1}(a,z)$ formula is to find the value of $z$ at which the error of the approximation of $\text{erf}(z)$ equals some given $0\le x\le1$. In other words, you can find which value of $z$ is needed to be a given amount off from the approximation of the error function. Please correct me and give me feedback!

$\endgroup$
0
$\begingroup$

Latecomer to this question, but I know several solutions that may be of interest, including one that I am in the process of publishing. You seek an approximation to $\operatorname{erf}(x)$ that is "compact/memorable" in the context of integrating a function presumably involving $\operatorname{erf}(x)$. I have been looking at just such a problem, and the solution I came up with involves only exponentials of quadratic functions, of the form $$ {\rm erf}_M(x)=1-\sum_{i=1}^{M} c_i {\rm e}^{-a_i x^2+2b_i x},~x\in R $$ where $a_i$, $b_i$ and $c_i$ for $i=1,\dots,M$ are parameters used to fit the function $\operatorname{erf}(x)$. I call this an exponential-quadratic approximator (EQA) for the error function. Note that the approximation can also be used for negative arguments, since $\operatorname{erf}(x)$ is an odd function, which makes it applicable to the entire real line.

In the paper, I give an example for $M=4$ that yields a maximum absolute error of around $1.65 \times 10^{-4}$ on the entire real line. The 12 parameters are obtained by gradient descent optimisation. The values that give the cited error are given in Table 1:

[Table 1: the fitted parameter values] A preprint of the paper is available here: A 4-Term Exponential-Quadratic Approximation for Gaussian Q or Error Functions.

[Figure: comparison of approximations to the error function]

The paper also reviews a number of other approximations from other researchers, although they are typically for the Gaussian Q-function $Q(x)=\frac{1}{2}(1-{\rm erf}(x/\sqrt{2}))$ and accurate for large $x$ but not near $x=0$, whereas the aforementioned EQA is exact at $x=0$. In the above figure, the 4-term EQA (dashed cyan) lies atop the error function (in black) on the graph. The other approximators, some of which are also based on exponentials of quadratic arguments, are detailed in the paper.

The approximation is particularly useful for analytical approximation (in terms of elementary functions) of integrals of the form: $$ \int_0^{\infty}{\rm erf}^n(ax + b)\,x^r\,{\rm exp}(-f^2 x^2 + 2gx)\,dx $$ where $r\geq 0$ and $n>0$ are both integers and $a>0$. Such integrals arise in the development of quasi-analytical models of least-squares generative adversarial networks (GANs) as described in Convergence and Optimality Analysis of Low-Dimensional GANs using Error Function Integrals.

$\endgroup$
-1
$\begingroup$

And another one, not as nice but invertible (via $\ln$ it reduces to a depressed quartic equation solvable by Ferrari's formulas) and more accurate: $\operatorname{erf}(x)\approx \operatorname{sgn}(x) \tanh[1.152|x| + 0.064|x|^4]$ (error below 0.5%).
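A quick Python check of the claimed bound (the grid is mine):

```python
import math

def erf_tanh4(x):
    # sgn(x) * tanh(1.152|x| + 0.064|x|^4)
    s = math.copysign(1.0, x)
    return s * math.tanh(1.152 * abs(x) + 0.064 * abs(x) ** 4)

max_err = max(abs(erf_tanh4(i / 100) - math.erf(i / 100)) for i in range(501))

print(max_err)  # stays below the quoted 0.5% bound
```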

$\endgroup$
