How to understand the definition of empirical distribution function

Question

I am reading All of Nonparametric Statistics by Larry Wasserman. On page 12, he defines the empirical distribution function as follows:

The empirical distribution function $\hat{F_n}$ is the CDF that puts mass $\frac{1}{n}$ at each data point $X_i$. Formally,

$$\hat{F_n}(x)=\frac{1}{n}\sum^{n}_{i=1}I(X_i\le x)$$

where

$$I(X_i\le x)=\begin{cases} 1 & \text{if } X_i \le x\\ 0 & \text{if } X_i>x \end{cases}$$

My questions are:

Why is $\frac{1}{n}$ called mass?
The CDF puts mass $\frac{1}{n}$ to each data point $X_i$, then, by my understanding, it should be $\frac{1}{n}X_1+\frac{1}{n}X_2+...+\frac{1}{n}X_n$.

Why is it $\hat{F_n}(x)=\frac{1}{n}\sum^{n}_{i=1}I(X_i\le x)$? I think this formula puts mass $\frac{1}{n}$ on each indicator function $I(X_i \le x)$ but not $X_i$.

What is the meaning of "puts" something "at each data point"?

Glen_b · Accepted Answer · 2018-03-27 22:59:57Z

Why is $\frac{1}{n}$ called mass?

The term "mass" refers to an amount of probability at a single discrete point, as distinct from "density" in relation to continuous distributions.

The CDF puts mass $\frac{1}{n}$ to each data point $X_i$, then, by my understanding, it should be $\frac{1}{n}X_1+\frac{1}{n}X_2+...+\frac{1}{n}X_n$.

That is a statement rather than a question, but the understanding it expresses is mistaken in a couple of ways at once, so I can discuss that.

First, the expression $\frac{1}{n}X_1+\frac{1}{n}X_2+...+\frac{1}{n}X_n$ is actually the sample mean (as a random variable): it literally means to average the values. I presume that you meant instead to write an expression for the empirical probability function. Second, keep in mind that we're meant to be dealing with a distribution function, not the probability function, so you need to find the proportion of the empirical probability that's at or to the left of each possible value of $x$. That is how a distribution function represents probability $1/n$ at each point:
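For a small illustration, take a made-up sample of $n=3$ distinct observations, $2$, $5$ and $7$ (these values, and the symbol $\hat{p}_n$ for the empirical probability function, are my own choices for this example and are not taken from Wasserman). The empirical probability function and the empirical distribution function are then

$$\hat{p}_n(x)=\begin{cases} \frac{1}{3} & \text{if } x\in\{2,5,7\}\\ 0 & \text{otherwise} \end{cases} \qquad \hat{F_n}(x)=\frac{1}{3}\sum^{3}_{i=1}I(X_i\le x)=\begin{cases} 0 & \text{if } x<2\\ \frac{1}{3} & \text{if } 2\le x<5\\ \frac{2}{3} & \text{if } 5\le x<7\\ 1 & \text{if } x\ge 7 \end{cases}$$

By contrast, $\frac{1}{3}(2+5+7)=\frac{14}{3}$ is a single number, the sample mean, and not a function of $x$ at all.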

The empirical pmf and the ecdf are two different representations of the same underlying object. The empirical pmf shows a mass of $1/n$ at each observed value, while the ecdf is a step function whose height increases by $1/n$ at each observed value (this corresponds to $1/n$ times the sum of indicator functions you mentioned).
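The same relationship can be checked numerically. The following is a minimal Python sketch, not anything from Wasserman's book: the function names `ecdf` and `epmf` and the sample values are made up for illustration, and exact fractions are used only so that the $1/n$ steps are visible.

```python
from collections import Counter
from fractions import Fraction


def ecdf(sample, x):
    """Empirical CDF at x: (1/n) * sum of the indicators I(X_i <= x)."""
    n = len(sample)
    return Fraction(sum(1 for xi in sample if xi <= x), n)


def epmf(sample):
    """Empirical pmf: mass k/n on a value observed k times (1/n if no ties)."""
    n = len(sample)
    return {value: Fraction(count, n) for value, count in Counter(sample).items()}


sample = [2, 5, 7]  # hypothetical data, chosen only for illustration
masses = epmf(sample)

for x in [1, 2, 4, 5, 7, 10]:
    # Adding up the masses at or to the left of x reproduces the ecdf at x.
    cumulative_mass = sum(m for value, m in masses.items() if value <= x)
    assert cumulative_mass == ecdf(sample, x)
    print(x, ecdf(sample, x))
```

With this sample the printed ecdf values step up by $1/3$ each time $x$ passes an observation, which is the sense in which the distribution function "has" mass $1/n$ at each data point.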

What is the meaning of "puts" something "at each data point"?

I'm not sure exactly what causes the difficulty here; the words essentially take their ordinary meanings. Both representations above show a proportion of $1/n$ at each observed value $x_i$, and if you treat the epmf and the ecdf as a pmf and a cdf respectively, those proportions are probabilities. Possibly it is the treatment of $\hat{F}$ as an active entity (one that can "put" things somewhere) that is confusing you. Would it be easier to understand if it said "has" rather than "puts"? If that doesn't help, you'll have to make it clearer what you need explained there.
