4
$\begingroup$

I have a table consisting of a number of whole percentages $x_i$ between $0\%$ and $100\%$. However, they don't add up to $100\%$ (rather they add up to $101\%$). But they 'should'.

Assuming that any given percentage $x_i\%$ is rounded from some precise (unrounded) $y_i\%$ which is really uniformly distributed according to $y_i\sim \text{UNIF}(\max{(x_i-0.5\%,0\%)},\min(x_i+0.5\%,100\%))$, what is the best way to go about computing some estimates for unrounded $y_i$s, i.e. $\text{E}(y_i)$? I prefer expectation (over MLE).

NB: The table does contain some $x_i=0\%$ entries.

NB 2: The $\text{E}(y_i)$s that I am looking for are reals, not integers.

NB 3: Simply scaling doesn't work. For example take $10\%$, $80\%$ and $11\%$. Total $101\%$. Just scaling those down would obviously yield $100\%$. But $80\%\cdot \frac{100\%}{101\%}=79.2079\ldots\%$ will now round to $79\%$ instead of $80\%$. So, this cannot be right.

NB 4: Distributing the error equally over all entries has a drawback too. If the error were $-1\%$, some such $\text{E}(y_i)$s (e.g. those belonging to $x_i=0$) could become less than zero. That indicates that this procedure cannot be right either.
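
To make NB 3 and NB 4 concrete, here is a minimal Python sketch of both naive corrections. The function names (`round_half_up`, `scale_to_100`, `shift_equally`) and the second table `[0, 50, 51]` are made up for illustration; only `[10, 80, 11]` comes from the text above.

```python
import math

def round_half_up(v):
    """Round to the nearest whole percent, with halves going up."""
    return math.floor(v + 0.5)

def scale_to_100(xs):
    """Proportional rescaling so that the total is exactly 100."""
    total = sum(xs)
    return [x * 100 / total for x in xs]

def shift_equally(xs):
    """Spread the discrepancy (100 - total) equally over all entries."""
    delta = (100 - sum(xs)) / len(xs)
    return [x + delta for x in xs]

# NB 3: after scaling, the 80% entry no longer rounds to 80.
scaled = scale_to_100([10, 80, 11])
rounded_back = [round_half_up(y) for y in scaled]
print(rounded_back)  # [10, 79, 11]

# NB 4: with a 0% entry (illustrative table), equal shifting goes negative.
shifted = shift_equally([0, 50, 51])
print(min(shifted) < 0)  # True
```

Both corrections produce a total of $100\%$, but each one breaks a constraint that the true $y_i$ must satisfy.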

$\endgroup$
14
  • $\begingroup$ The $0$% entries would end up with the greatest amount of relative change (i.e., $\infty$), and are thus the ones that are most affected by having rounding applied to them. Is there any system that allows this as a reasonable outcome? $\endgroup$
    – abiessu
    Commented May 13, 2014 at 19:46
  • $\begingroup$ "Should" according to whom? Are you just fudging your data to assuage the stupid people in your audience who don't understand rounding? $\endgroup$ Commented May 13, 2014 at 19:51
  • 2
    $\begingroup$ If you ask three persons for some opinion, you might get $33\,\%$ for each of the answers "yes", "no", "maybe". And now you want to forcefully break the tie by claiming that one of the answers really got $34\,\%$? $\endgroup$ Commented May 13, 2014 at 19:53
  • 1
    $\begingroup$ @HagenvonEitzen No, the $y$s are precise. In that example they would all be $100/3\%$. $\endgroup$
    – Řídící
    Commented May 13, 2014 at 19:55
  • 1
    $\begingroup$ @EmmadKareem Because if the error would be $-1\%$, some $\text{E}(y_i)$s (e.g. those belonging to $x_i=0$) could become less than zero. That indicates that that procedure cannot be right. $\endgroup$
    – Řídící
    Commented May 13, 2014 at 21:39

3 Answers

1
$\begingroup$

You can't compute the unrounded $y$s because you have thrown away that information by rounding. What you can do is alter the $x$s so they add to $100\%$. You will then violate the fact that the $x$s are the rounded values closest to the $y$s. If they currently add to $101\%$, you just need to decrease one of them by $1\%$. If you have the $y$s available, you can choose the one that is closest above $zz.50\%$, which seems the logical one to flip, since that introduces the least error. If you don't have the $y$s available, I would decrease the largest $x$, just because it introduces the least fractional error. But you might as well pick one at random.
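
As a rough illustration of the "decrease the largest $x$" suggestion, here is a small Python sketch. The function name `adjust_largest` is hypothetical, and applying the whole discrepancy (not just $1\%$) to the largest entry is an assumption that generalizes the case described above.

```python
def adjust_largest(xs):
    """Put the whole discrepancy (100 - total) on the largest entry.

    Ties are broken by taking the first largest entry (an arbitrary choice).
    """
    adjusted = list(xs)
    i = max(range(len(adjusted)), key=lambda k: adjusted[k])
    adjusted[i] += 100 - sum(adjusted)
    return adjusted

fixed = adjust_largest([10, 80, 11])
print(fixed)  # [10, 79, 11]
```

The result is still a table of whole percentages, so it does not attempt to recover the unrounded values.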

$\endgroup$
11
  • $\begingroup$ Rounding the $y$s should lead, at least, to the given $x$s. $\endgroup$
    – Řídící
    Commented May 13, 2014 at 19:51
  • 1
    $\begingroup$ You can't have both that and have the $x$s add to $100\%$ all the time. What would you do with $33.6, 33.6, 33.8$? These round to $34,34,34$, which add to $102$. That is why many tables have a footnote that the percentages may not add to $100$ due to rounding. $\endgroup$ Commented May 13, 2014 at 19:54
  • 2
    $\begingroup$ I only want to reconstruct the ('expected') $y$s. Obviously, every $y$ needs to round to its $x$. Also, obviously, because the $y$s are exact, they add up to $100\%$. Why am I failing to bring this point across? $\endgroup$
    – Řídící
    Commented May 13, 2014 at 20:07
  • 1
    $\begingroup$ Yes, but I think that in some cases that $\text{E}(y_i)$ won't round to $x_i$ anymore. That's why I came up with the explicit prior distribution. $\endgroup$
    – Řídící
    Commented May 13, 2014 at 20:59
  • 1
    $\begingroup$ For example take $10\%$, $80\%$ and $11\%$. Total $101\%$. Just scaling those down would obviously yield $100\%$. But $80\%\cdot 100/101=79.2079\ldots$ will now round to $79\%$ instead of $80\%$. So, this cannot be right. $\endgroup$
    – Řídící
    Commented May 13, 2014 at 21:11
1
$\begingroup$

If you have the unrounded $y_i$ available, you may use algorithms used in elections (to compute party seats from vote counts), such as d'Hondt or Hare-Niemeyer. Essentially, most methods boil down to finding real numbers $a,b$ such that an assignment of $x_i=\lfloor ay_i+b\rfloor $ yields $\sum x_i=100$. Standard rounding sets $b=\frac12$ and $a=\frac{100}{\sum y_i}$ and often fails to meet the goal; other methods stay with $b=\frac12$ but adjust $a$ until the goal is met; or stay with $a=\frac{100}{\sum y_i}$ and adjust $b$; or play with both. Even the best methods will fail in an actual tie. And the choice between methods will influence the outcome in some direction (in election systems: favouring either small or big parties; analogously in your application), so it should be made with care and documented.
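
For illustration, here is a minimal Python sketch of the Hare-Niemeyer (largest remainder) method, assuming the unrounded values are available. The function name `largest_remainder` and the sample values `[9.6, 79.7, 10.7]` are made up for this example; ties between equal remainders are broken by list order, which is an arbitrary choice.

```python
import math

def largest_remainder(ys, total=100):
    """Apportion `total` whole units proportionally to ys (Hare-Niemeyer)."""
    s = sum(ys)
    quotas = [y * total / s for y in ys]
    result = [math.floor(q) for q in quotas]
    leftover = total - sum(result)
    # Hand the remaining units to the entries with the largest fractional parts.
    order = sorted(range(len(ys)),
                   key=lambda i: quotas[i] - result[i],
                   reverse=True)
    for i in order[:leftover]:
        result[i] += 1
    return result

ys = [9.6, 79.7, 10.7]
plain = [math.floor(y + 0.5) for y in ys]   # standard rounding
apportioned = largest_remainder(ys)
print(plain, sum(plain))                    # [10, 80, 11] 101
print(apportioned, sum(apportioned))        # [9, 80, 11] 100
```

Here the entry $9.6$ is displayed as $9$, even though it would ordinarily round to $10$.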

Note that after doing any such "correction", the reader of your table may feel satisfied that everything adds up to $100\,\%$, but he can never be sure that a value shown as $50\,\%$ is really between $49.5\,\%$ and $50.5\,\%$ - it may well lie outside that interval and have been "forced" to the displayed number.

$\endgroup$
1
  • $\begingroup$ I don't have the unrounded $y$s available. I want to estimate them. And this is not for election purposes. I only want a reasonable (actually: the most reasonable) unrounded table that rounds to the original one and adds up to $100\%$. Is my question really that unclear? $\endgroup$
    – Řídící
    Commented May 13, 2014 at 20:15
1
$\begingroup$

I think I get what you're trying to do. Given $(x_1,\ldots,x_n)$, you have bounds $l_i=\max(x_i-\frac12,0)$ and $u_i=\min(x_i+\frac12,100)$, and you're considering the probability space $[l_1,u_1]\times\cdots\times[l_n,u_n]$ under the additional condition that $y_1+\cdots+y_n=100$. Assuming said space is nonempty, you assign uniform probability density to it and you want to find the expectation of $(y_1,\ldots,y_n)$.

This probability space is the intersection of a hypercube and a hyperplane, which forms an $(n-1)$-dimensional polytope, and the expectation is its centroid. I don't know how you would actually compute this in general. But in your problem you almost always have symmetry, and the right solution is just to assign the same absolute change to each entry. Got $101$ instead of $100$? Subtract $1/n$ from all $x_i$. This only fails when one or more of the values are at $0$ or $100$ already.
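
One way to approximate this centroid numerically, without an exact formula, is plain rejection sampling. The Python sketch below is not part of the argument above. It draws $y_1,\ldots,y_{n-1}$ uniformly from their intervals, sets $y_n = 100 - \sum_{i<n} y_i$, and keeps the sample only if $y_n$ falls in $[l_n,u_n]$. Because the hyperplane is parametrized linearly by the first $n-1$ coordinates, the accepted samples are uniform on the polytope. The names `bounds` and `estimate_centroid`, the sample count and the seed are all illustrative assumptions.

```python
import random

def bounds(xs):
    """Intervals [l_i, u_i] that round to x_i, clipped to [0, 100]."""
    lower = [max(x - 0.5, 0.0) for x in xs]
    upper = [min(x + 0.5, 100.0) for x in xs]
    return lower, upper

def estimate_centroid(xs, samples=200_000, seed=0):
    """Monte Carlo estimate of E(y) on {l <= y <= u, sum(y) = 100}."""
    rng = random.Random(seed)
    lower, upper = bounds(xs)
    n = len(xs)
    sums = [0.0] * n
    accepted = 0
    for _ in range(samples):
        # Sample the first n-1 coordinates; the last one is then forced.
        head = [rng.uniform(lower[i], upper[i]) for i in range(n - 1)]
        last = 100.0 - sum(head)
        if lower[-1] <= last <= upper[-1]:
            accepted += 1
            for i in range(n - 1):
                sums[i] += head[i]
            sums[-1] += last
    if accepted == 0:
        raise ValueError("no feasible sample found (empty or degenerate region)")
    return [s / accepted for s in sums]

print(estimate_centroid([10, 80, 11]))
print(estimate_centroid([0, 20, 30, 51]))
```

For `[10, 80, 11]` the estimates should land close to $x_i - \frac13$, in line with the symmetry argument. For the made-up table `[0, 20, 30, 51]` the $0\%$ entry gets a small positive estimate while the other entries absorb more than $\frac14$ each, which is the asymmetry at the boundary. The acceptance rate drops quickly as $n$ grows, so this approach is only practical for small tables.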

$\endgroup$
4
  • $\begingroup$ Thank you. The first paragraph, I think, captures my problem. However, the problem is that the symmetry (as mentioned in your second paragraph) applies only to numbers from $1$ through $99$. $0$ and $100$ are different. And in the tables that I have to work with, there are $0$s (as mentioned in the question). :( $\endgroup$
    – Řídící
    Commented May 13, 2014 at 21:34
  • $\begingroup$ Right, so then what you really want is the centroid of the intersection of a hypercube and a hyperplane. As I said, I don't know how to compute that, but maybe if you edit your question to include this characterization of the problem, you might attract the attention of someone who does. $\endgroup$
    – user856
    Commented May 14, 2014 at 2:09
  • $\begingroup$ I could do that, but I must admit that I'm not very familiar with centroids, hypercubes and hyperplanes. That way it might become obvious that I don't know much at all. :) I thought that the question as such must have been thought of by many people. For example those making graphs and pie charts and such. $\endgroup$
    – Řídící
    Commented May 14, 2014 at 9:44
  • $\begingroup$ @Keepthesemind: If dividing the discrepancy among the entries would cause some to go below $0$ or above $1$, then we don't touch those entries and divide among the other entries instead. My reason is that we want to preserve the relative order of values so it's best to move them in the same general direction. $\endgroup$
    – user21820
    Commented Aug 5, 2016 at 9:33
