2
$\begingroup$

Let $X\sim\text{Normal}(0,1)$ and let $f_X$ be its probability density function. I conducted some numerical experiments in the software Mathematica to estimate $f_X$ via a kernel method. Let $\hat{f}_X^M$ be the kernel density estimate using a sample of length $M$. Let $$\epsilon=E\left[\|f_X-\hat{f}_X^M\|_\infty\right]$$ be the error ($E$ is the expectation). I noticed that the error decreases with $M$ until a certain length $M_0$ from which the error stabilizes. For example, in Mathematica, the built-in function SmoothKernelDistribution employs the Gaussian kernel with Silverman's rule to determine the bandwidth by default. In the following figure in log-log scale, I show the error $\epsilon$ for different values of $M$ growing geometrically, where the expectation that defines $\epsilon$ is estimated using 20 realizations of $\|f_X-\hat{f}_X^M\|_\infty$. I also plot the estimated $90\%$ confidence interval for $\|f_X-\hat{f}_X^M\|_\infty$ (dashed lines).

enter image description here

Observe that the error decreases linearly in log-log scale (that is, at rate $O(M^{-r})$), up to a certain length $M$ where it starts to stabilize. Also, the confidence intervals become more narrow in the end. Is this issue due to accumulated numerical errors? Is it due to Silverman's rule?

$\endgroup$
0

1 Answer 1

2
$\begingroup$

I think the problem is the built-in function of Mathematica, SmoothKernelDistribution (its interpretation of Silverman's selection). If you implement the estimator $$ \hat{f}_X^M(x)=\frac{1}{Mh}\sum_{i=1}^M K\left(\frac{x-x_i}{h}\right) $$ yourself, where $x_1,\ldots,x_M$ is the data, the kernel $K$ is the density function of a $\text{Normal}(0,1)$ distribution, and the bandwidth $h$ is $1.06\,\hat{\sigma}_X\,M^{-1/5}$, then the error tends to zero as $M$ grows with no problem:

enter image description here

The bandwidth selected by SmoothKernelDistribution seems to be too large when $M$ gets bigger, which implies a biased estimate (error stabilization) with small variance (more narrow confidence interval).

Any other response will be welcomed.

$\endgroup$
3
  • $\begingroup$ This answer is plausible, +1. Consider asking the good people at the SE Mathematica site about this, too, to identify the bug more precisely. $\endgroup$
    – whuber
    Commented Jan 12, 2020 at 20:29
  • 1
    $\begingroup$ It would be helpful to show your Mathematica code as you seem to be saying Mathematica does it wrong. Also, a response is given at mathematica.stackexchange.com/questions/212764/…. $\endgroup$
    – JimB
    Commented Jan 13, 2020 at 0:41
  • $\begingroup$ Well, Mathematica does seem to do it wrong for sample sizes greater than 100,000. See answer at mathematica.stackexchange.com/questions/212764/…. $\endgroup$
    – JimB
    Commented Jan 13, 2020 at 17:08

Not the answer you're looking for? Browse other questions tagged or ask your own question.