
I am trying to apply Assumed Density Filtering (ADF), following the paper Lightweight Probabilistic Deep Networks, to my own model, and I need to implement variational approximation layers for the sigmoid and SiLU functions.

I looked for the equations for the sigmoid layer. In the paper Variational Learning in Nonlinear Gaussian Belief Networks, the authors state that there is a closed-form solution for the expected value of a sigmoid layer:

$$ M(\mu,\sigma) = \Phi\left(\frac{\mu}{\sqrt{1+\sigma^2}}\right) $$

However, according to this question, only an approximate solution exists. Did I miss some assumption in the paper, or have I misunderstood one of them?

For SiLU, I have been unable to find any resources so far. I would appreciate it if anyone could provide some guidance or point me to relevant resources.

  • Reading the Frey–Hinton paper, on p. 19 under "Sigmoidal units" they say "the cumulative Gaussian squashing function" and write $f(x)=\phi(x)$ (in their notation $\phi(x)$ is the normal density function). I have no idea what they use for $f(x)$, but it's definitely not the sigmoid.
    – Ted Black
    Commented Mar 19 at 11:15

1 Answer


This is a long comment.

In the Frey & Hinton paper they use the approximation $\sigma(x)\approx \Phi(x)$. Then

$$ M(\mu,\sigma) \approx \frac{1}{\sigma} \int_{-\infty}^\infty \Phi(x)\, \phi\left(\frac{x-\mu}{\sigma}\right) dx. $$

Substituting $x = \sigma z + \mu$, this can be rewritten as

$$ M(\mu,\sigma) \approx \int_{-\infty}^\infty \Phi(\sigma z + \mu)\, \phi(z)\, dz, $$

which evaluates to

$$ M(\mu,\sigma) \approx \Phi\left(\frac{\mu}{\sqrt{1+\sigma^2}}\right). $$

For the SiLU activation function, the corresponding approximation is

$$ M(\mu,\sigma) \approx \frac{1}{\sigma} \int_{-\infty}^\infty x\,\Phi(x)\, \phi\left(\frac{x-\mu}{\sigma}\right) dx, $$

which can be rewritten as

$$ M(\mu,\sigma) \approx \int_{-\infty}^\infty (\sigma z + \mu)\,\Phi(\sigma z + \mu)\, \phi(z)\, dz. $$

First,

$$ \int_{-\infty}^\infty z\, \Phi(\sigma z + \mu)\, \phi(z)\, dz $$

evaluates to

$$ \frac{\sigma}{\sqrt{1+\sigma^2}}\, \phi\left(\frac{\mu}{\sqrt{1+\sigma^2}}\right). $$

Next,

$$ \int_{-\infty}^\infty \Phi(\sigma z + \mu)\, \phi(z)\, dz = \Phi\left(\frac{\mu}{\sqrt{1+\sigma^2}}\right). $$

So for the SiLU activation function,

$$ M(\mu,\sigma) \approx \frac{\sigma^2}{\sqrt{1+\sigma^2}}\, \phi\left(\frac{\mu}{\sqrt{1+\sigma^2}}\right) + \mu\, \Phi\left(\frac{\mu}{\sqrt{1+\sigma^2}}\right). $$
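As a quick sanity check, here is a minimal Python sketch (the function names `adf_sigmoid_mean` and `adf_silu_mean` are mine) comparing the two closed-form expressions above against Monte Carlo estimates, assuming the probit approximation $\sigma(x)\approx\Phi(x)$ throughout:

```python
import numpy as np
from scipy.stats import norm

def adf_sigmoid_mean(mu, sigma):
    # E[Phi(X)] for X ~ N(mu, sigma^2); exact for the probit
    # approximation sigmoid(x) ~ Phi(x).
    return norm.cdf(mu / np.sqrt(1.0 + sigma**2))

def adf_silu_mean(mu, sigma):
    # E[X * Phi(X)] for X ~ N(mu, sigma^2), i.e. the SiLU mean
    # under the same probit approximation.
    t = mu / np.sqrt(1.0 + sigma**2)
    return sigma**2 / np.sqrt(1.0 + sigma**2) * norm.pdf(t) + mu * norm.cdf(t)

# Monte Carlo check of both formulas.
rng = np.random.default_rng(0)
mu, sigma = 0.5, 1.3
x = rng.normal(mu, sigma, size=1_000_000)

print(adf_sigmoid_mean(mu, sigma), norm.cdf(x).mean())    # closed form vs. MC
print(adf_silu_mean(mu, sigma), (x * norm.cdf(x)).mean())
```

At this sample size the two numbers printed on each line should agree to roughly two or three decimal places.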

  • Thanks for your derivation and explanation. Also, thanks for pointing out that the function is actually not the sigmoid but the CDF of the normal. Those who wish to know more about it can refer to the paper Variational inference for continuous sigmoidal Bayesian networks. Also, since this is not actually the sigmoid, I did more research and came across the paper Towards fully Bayesian Neural Networks. It has approximate solutions for both the expected value and the variance of the sigmoid.
    – Mr Amoeba
    Commented Mar 20 at 6:34
  • My apologies, the paper also uses the CDF of the normal for the approximation.
    – Mr Amoeba
    Commented Mar 20 at 9:18
  • @Mr Amoeba thanks for attaching a copy of the paper by Huber.
    – Ted Black
    Commented Mar 20 at 10:13
