I'm studying kernel density estimation (KDE) and am having trouble understanding Scott's rule and Silverman's rule for bandwidth selection.
I read that the optimal bandwidth is the value that minimizes the Mean Integrated Squared Error (MISE).
$\mathrm{MISE} = \int E\left[\left(\hat f(x) - f(x)\right)^2\right] dx$
But the MISE formula can't be used directly, since it involves the unknown true density $f(x)$.
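To make the dependence on $f$ concrete, here is the asymptotic approximation (AMISE) as I understand it from textbook treatments, for a bandwidth $h$, a sample of size $n$ and a kernel $K$. The symbols $R(\cdot)$ and $\mu_2(K)$ are notation I am introducing here, and the formula relies on the usual smoothness assumptions on $f$:

$\mathrm{AMISE}(h) = \frac{R(K)}{nh} + \frac{1}{4} h^4 \mu_2(K)^2 R(f''), \quad R(g) = \int g(x)^2 \, dx, \quad \mu_2(K) = \int u^2 K(u) \, du$

$h_{\mathrm{AMISE}} = \left[ \frac{R(K)}{\mu_2(K)^2 \, R(f'') \, n} \right]^{1/5}$

The term $R(f'')$ is the part that cannot be evaluated without knowing $f$.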
Therefore, Scott's and Silverman's rules of thumb assume a Gaussian distribution for the unknown density $f(x)$ in order to obtain a usable optimal bandwidth.
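For concreteness, here is a minimal sketch of the two rules in the form I have usually seen them quoted for one-dimensional data. The constants 1.06 and 0.9, the function names, and the simulated sample are my own assumptions for illustration; sources differ on which name goes with which formula, and whether these constants presuppose a particular kernel is part of what I'm asking.

```python
import numpy as np


def normal_reference_bandwidth(x):
    """h = 1.06 * sigma * n^(-1/5), often attributed to Scott (assumed form)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    sigma = x.std(ddof=1)  # sample standard deviation
    return 1.06 * sigma * n ** (-1 / 5)


def silverman_bandwidth(x):
    """h = 0.9 * min(sigma, IQR / 1.34) * n^(-1/5) (assumed form)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    sigma = x.std(ddof=1)
    q75, q25 = np.percentile(x, [75, 25])
    # Use the smaller of the two scale estimates.
    spread = min(sigma, (q75 - q25) / 1.34)
    return 0.9 * spread * n ** (-1 / 5)


# Hypothetical data: 500 draws from a normal distribution, for illustration only.
rng = np.random.default_rng(0)
sample = rng.normal(loc=0.0, scale=2.0, size=500)

h_scott = normal_reference_bandwidth(sample)
h_silverman = silverman_bandwidth(sample)
print("normal reference (Scott):", h_scott)
print("Silverman:", h_silverman)
```

Both bandwidths depend on the data only through a scale estimate and the sample size, which is what makes me wonder how "distribution-free" the resulting estimate really is.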
My doubt/question is:
A non-parametric method like KDE is distribution-free: it does not rely on the assumption that the data are drawn from a given parametric family of probability distributions. But since Scott's and Silverman's rules assume a Gaussian distribution for the unknown density $f(x)$, this seems like a contradiction to me.
Do Scott's and Silverman's rules also assume that the kernel function used to estimate the density is Gaussian?
Many thanks :)