0
$\begingroup$

Suppose that $U$ is a uniformly distributed (continuous) random variable on $[0,1]$. Let's say that I am interested in finding 3 discrete points $u_1,u_2,u_3$ which approximate $U$ in some sense. My intuition tells me that to obtain these points, I would first partition the interval into 3 equal parts and select $u_1,u_2,u_3$ as the midpoint of these parts. In other words, $u_1 = 1/6, u_2 = 1/2, u_3 = 5/6$ constitute an approximation to $U$. (Each point would then have a probability of 1/3).

Upon discussing with someone, I was asked why $u_1 = 0, u_2 = 0.5, u_3 = 1$ (with equal probabilities) cannot constitute such an approximation. My initial response is that it contains the endpoints however I cannot rigorously quantify why these set of points is not "uniformly distributed".

Any thoughts to clarify this issue?

$\endgroup$
3

1 Answer 1

1
$\begingroup$

It looks like you want to design a quantizer. A good textbook on this is "Vector Quantization and Signal Compression" by Gersho and Grey.

One way to design such a mapping is to define a function (called a quantizer) $g(x)$ (which gives the approximated value) and then minimize some average of a loss function $L(x,g(x))$ (i.e. minimize $E[L(X,g(X))]$). Usually these loss functions are of the form $L(x-g(x))$. A common paradigm is to minimize $E[(X-g(x))^2]$ (square loss/mean square error). Choosing different different loss functions will give you different choices of quantizers (and different loss functions are good for different applications).

If we consider our quantizer to map intervals to the same value, we would have $[0,a) \to u_1$, $[a,b) \to u_2$ and $[b,1] \to u_3$. Then, if we wanted to use the minimum mean square error, we could write down that integral as $\int_0^a (x-u_1)^2 dx + \int_a^b (x-u_2)^2 dx + \int_b^1 (x-u_3)^2 dx$ and minimize it subject to $u_1 \in [0,a)$, $u_2 \in [a,b)$, $u_3 \in [b,1]$. We can set $a,b$ as design parameters or optimize over them. It turns out that the optimal quantizer will be the centroid of the region, i.e. $u_1 = E[X | X \in [0,a)], u_2 = E[X| X \in [a,b)],u_3=E[X|X\in [b,1]]$ in this case.

$\endgroup$

You must log in to answer this question.

Not the answer you're looking for? Browse other questions tagged .