My understanding of histogram density estimation: for $k$ predefined equal-width bins $(b_0, b_1], (b_1, b_2], \dots, (b_{k-1}, b_k]$ and $n$ observations $x_1, \dots, x_n \in (b_0, b_k]$, we estimate the density as $f(x) = \frac{1}{b_1-b_0} \sum_{i=1}^k P_i 1_{x\in (b_{i-1},b_i]}$, where $P_i$ is the proportion of observations falling in the $i^{th}$ bin.

This seems to me like parametric density estimation, with a fixed number $k$ of parameters $P_1, \dots, P_k$ (or just the first $k-1$, since they sum to 1), which does not grow with $n$. However, here and on other websites, I see histogram density estimation referred to as nonparametric. I've seen several definitions of "nonparametric"; which of those would include this method?
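
To make the estimator in the question concrete, here is a minimal sketch in Python with NumPy. The function name `histogram_density`, the toy data, and the bin edges are illustrative assumptions, not part of the question. It dividess each bin proportion by that bin's own width, which reduces to the $\frac{1}{b_1-b_0}$ factor when the bins are equal-width.

```python
import numpy as np

def histogram_density(x, edges):
    """Histogram density estimate on half-open bins (b_{i-1}, b_i].

    Returns the bin heights and a callable f_hat(t).
    Assumes every observation lies in (edges[0], edges[-1]].
    """
    x = np.asarray(x, dtype=float)
    edges = np.asarray(edges, dtype=float)
    k = len(edges) - 1
    widths = np.diff(edges)

    # side="left" maps t in (b_{i-1}, b_i] to position i, so subtract 1
    # to get the zero-based bin index.
    idx = np.searchsorted(edges, x, side="left") - 1
    counts = np.bincount(idx, minlength=k)
    proportions = counts / len(x)      # the P_i in the formula
    heights = proportions / widths     # P_i / (b_i - b_{i-1})

    def f_hat(t):
        t = np.atleast_1d(np.asarray(t, dtype=float))
        j = np.searchsorted(edges, t, side="left") - 1
        inside = (j >= 0) & (j < k)
        out = np.zeros_like(t)
        out[inside] = heights[j[inside]]
        return out

    return heights, f_hat

# Hypothetical toy data: k = 4 equal-width bins on (0, 1].
edges = np.linspace(0.0, 1.0, 5)
x = np.array([0.1, 0.2, 0.3, 0.6, 0.9])
heights, f_hat = histogram_density(x, edges)
total_mass = float(np.sum(heights * np.diff(edges)))
```

With the edges held fixed, the only quantities estimated from the data are the $k$ proportions, which is exactly the observation that makes the method look parametric.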
  • Contemplate what happens as the amount of data grows arbitrarily large. If you keep those fixed bin cutpoints (they don't have to be equal width, by the way), then your estimator starts behaving poorly. People therefore usually think in terms of adjusting the number and values of the cutpoints according to how much data are available. That cannot be described by a finite number of parameters.
    – whuber
    Commented Sep 15, 2023 at 16:43
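
The point about cutpoints adapting to the amount of data can be illustrated with a short sketch. It uses NumPy's `histogram_bin_edges` with the Freedman-Diaconis rule (`bins="fd"`) as one example of a data-driven rule. The choice of rule, the normal test data, and the sample sizes are assumptions made for illustration and are not taken from the comment.

```python
import numpy as np

rng = np.random.default_rng(0)
sample_sizes = [100, 1_000, 10_000, 100_000]

bin_counts = []
for n in sample_sizes:
    x = rng.normal(size=n)
    # Data-driven cutpoints: the Freedman-Diaconis bin width shrinks
    # roughly like n**(-1/3), so the number of bins grows with n.
    edges = np.histogram_bin_edges(x, bins="fd")
    bin_counts.append(len(edges) - 1)

for n, k in zip(sample_sizes, bin_counts):
    print(f"n = {n:>7d}  ->  {k} bins")
```

Under such a rule the number of bin proportions to estimate is not fixed in advance but increases with the sample size, which is the sense in which the procedure has no finite-dimensional parameter.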
