I am trying to apply Assumed Density Filtering (ADF) as described in the paper Lightweight Probabilistic Deep Networks to my own model, and I need to implement variational approximation layers for the Sigmoid and SiLU activation functions.
I looked for the equations for the Sigmoid layer. In the paper Variational Learning in Nonlinear Gaussian Belief Networks, the authors state that there is a closed-form solution for the expected value of the Sigmoid layer, given by:
$$ M(\mu,\sigma) = \Phi\left(\frac{\mu}{\sqrt{1+\sigma^2}}\right) $$
However, according to this question, only an approximate solution exists for the logistic sigmoid. Did I miss an assumption in the paper, or have I misunderstood one of the two sources?
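For what it's worth, here is a small numerical check I put together (my own sketch, not code from either paper). It suggests the discrepancy may come down to which "sigmoid" is meant: the closed form $\Phi(\mu/\sqrt{1+\sigma^2})$ matches a Monte Carlo estimate of $\mathbb{E}[\Phi(x)]$ (the probit/Gaussian CDF) for $x \sim \mathcal{N}(\mu, \sigma^2)$, but not $\mathbb{E}[\mathrm{sigmoid}(x)]$ for the logistic sigmoid:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
mu, sigma = 0.5, 1.2
x = rng.normal(mu, sigma, size=1_000_000)

# Closed form from the Frey & Hinton-style result: Phi(mu / sqrt(1 + sigma^2))
closed_form = norm.cdf(mu / np.sqrt(1 + sigma**2))

# Monte Carlo estimate of E[Phi(x)] -- agrees with closed_form
mc_probit = norm.cdf(x).mean()

# Monte Carlo estimate of E[sigmoid(x)] for the logistic sigmoid -- does not
mc_sigmoid = (1.0 / (1.0 + np.exp(-x))).mean()

print(closed_form, mc_probit, mc_sigmoid)
```

If that is right, the closed form is exact only for the probit-style unit, and the logistic sigmoid needs the usual $\sigma(x) \approx \Phi(\lambda x)$ rescaling, which is why only an approximation exists.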
For SiLU, I have been unable to find any resources so far. I would appreciate it if anyone could provide some guidance or point me to relevant references.
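In the absence of a closed form for SiLU, the fallback I am currently considering (again my own sketch, not from any paper) is a plain Monte Carlo estimate of the output mean and variance under a Gaussian input; at minimum it gives a ground truth against which any analytic approximation can be tested. The function name `silu_moments_mc` is mine:

```python
import numpy as np

def silu(x):
    # SiLU(x) = x * sigmoid(x)
    return x / (1.0 + np.exp(-x))

def silu_moments_mc(mu, var, n=1_000_000, seed=0):
    """Monte Carlo estimate of the mean and variance of SiLU(x)
    for x ~ N(mu, var). Brute force, but useful as a reference."""
    rng = np.random.default_rng(seed)
    y = silu(rng.normal(mu, np.sqrt(var), size=n))
    return y.mean(), y.var()

m, v = silu_moments_mc(0.5, 1.44)
print(m, v)
```

For an actual ADF layer this is of course too slow; one could also try Gauss-Hermite quadrature, or reuse the sigmoid approximation via $\mathbb{E}[x\,\sigma(x)]$, but I have not found a published treatment of either.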