Derivation of Mode of grouped data

Question

A formula to calculate the mode for grouped data is given in my textbook:

Mode = $l + \dfrac{(f_1 - f_0)h}{2f_1 - f_0 - f_2} $

Where, $l = $ lower limit of the modal class,

$h = $ size of the class interval,

$f_1 = $ frequency of the modal class,

$f_0 = $ frequency of the class preceding the modal class,

$f_2 =$ frequency of the class succeeding the modal class.

Can you please explain the derivation of this formula? It is not given in my textbook. Thanks.

This fig might be useful to visualize it: images.app.goo.gl/ctRLnzbPSDuwDPyF8 — Shub, Commented Aug 23, 2020 at 6:41

David K · Accepted Answer · 2014-08-21 18:41:57Z

The following is not a rigorous derivation (a derivation would require a lot of assumptions about what makes one estimator better than another), but is an attempt to "make sense" of the formula so that you can more easily remember and use it.

Consider a bar graph with a bar for each of the classes of data. Then $f_1$ is the height of the bar of the modal class, $f_0$ is the height of the bar on the left of it, and $f_2$ is the height of the bar on the right of it.

The quantity $f_1 - f_0$ measures how far the modal class's bar "sticks up" above the bar on its left. The quantity $f_1 - f_2$ measures how far the modal class's bar "sticks up" above the bar on its right.

Now, observe that $$ \frac{f_1 - f_0}{2f_1 - f_0 - f_2} + \frac{f_1 - f_2}{2f_1 - f_0 - f_2} = \frac{f_1 - f_0}{(f_1 - f_0) + (f_1 - f_2)} + \frac{f_1 - f_2}{(f_1 - f_0) + (f_1 - f_2)} = 1 $$ So if we want to divide an interval of width $h$ into two pieces, where the ratio of sizes of those two pieces is $(f_1 - f_0) : (f_1 - f_2)$, the first piece will have width $\frac{f_1 - f_0}{2f_1 - f_0 - f_2} h$.

This is what the formula for estimating the mode does. It splits the width of the modal bar into two pieces whose ratio of widths is $(f_1 - f_0) : (f_1 - f_2)$, and it says the mode is at the line separating those two pieces, that is, at a distance $\frac{f_1 - f_0}{2f_1 - f_0 - f_2} h$ from the left edge of that bar, $l$.
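As a minimal sketch of the splitting rule described above, the estimate can be written as a small Python function. The function name `grouped_mode` and the sample classes and frequencies are made up for illustration; they are not part of the question.

```python
def grouped_mode(l, h, f0, f1, f2):
    """Estimate the mode of grouped data from the modal class and its neighbours."""
    left_gap = f1 - f0   # how far the modal bar sticks up above the bar on its left
    right_gap = f1 - f2  # how far the modal bar sticks up above the bar on its right
    if left_gap < 0 or right_gap < 0:
        raise ValueError("f1 must be at least as large as f0 and f2")
    if left_gap + right_gap == 0:
        raise ValueError("all three bars have the same height; the ratio is undefined")
    # Split the width h in the ratio left_gap : right_gap and
    # place the mode at the dividing line, measured from the left edge l.
    return l + h * left_gap / (left_gap + right_gap)


# Hypothetical classes 10-20, 20-30, 30-40 with frequencies 7, 12, 9.
print(grouped_mode(l=20, h=10, f0=7, f1=12, f2=9))  # 26.25
```

Here the gaps are 5 and 3, so the modal class of width 10 is split in the ratio 5 : 3 and the estimate lies 6.25 to the right of its left edge.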

If $f_1 - f_0 = f_1 - f_2,$ that is, the modal bar is equally far above the bars on both its left and right, then this formula estimates the mode right in the middle of the modal class: $$ l + \frac{f_1 - f_0}{2f_1 - f_0 - f_2} h = l + \frac12 h. $$ But if the height of the bar on the left is closer to the modal bar's height, then the estimated mode is to the left of the centerline of the modal class. In the extreme case where the bar on the left is exactly the height of the modal bar, and both are taller than the bar on the right, that is, when $f_1 - f_0 = 0$ but $f_1 - f_2 > 0$, the formula estimates the mode at $l$ exactly, that is, at the left edge of the modal bar. In the other extreme case, where the bar on the left is shorter but the bar on the right is the same height as the modal bar ($f_1 - f_0 > 0$ but $f_1 - f_2 = 0$), the formula estimates the mode at $l + h$, that is, at the right edge of the modal bar.
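The special cases above can be checked numerically. This is only an illustrative sketch: the modal class 20 to 30 and all frequencies are hypothetical values chosen to hit each case.

```python
def grouped_mode(l, h, f0, f1, f2):
    # Formula from the question: l + (f1 - f0) * h / (2*f1 - f0 - f2)
    return l + (f1 - f0) * h / (2 * f1 - f0 - f2)


l, h = 20, 10  # hypothetical modal class 20-30

symmetric = grouped_mode(l, h, f0=8, f1=12, f2=8)     # equal gaps on both sides
left_closer = grouped_mode(l, h, f0=11, f1=12, f2=8)  # left bar nearly as tall
left_tie = grouped_mode(l, h, f0=12, f1=12, f2=8)     # left bar exactly as tall
right_tie = grouped_mode(l, h, f0=8, f1=12, f2=12)    # right bar exactly as tall

print(symmetric)    # 25.0, the midpoint l + h/2
print(left_closer)  # 22.0, left of the midpoint
print(left_tie)     # 20.0, the left edge l
print(right_tie)    # 30.0, the right edge l + h
```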

Why do we say that the separating line is the mode? And how do we know that this line is the best approximation of the mode? — Vicrobot, Commented Aug 27, 2018 at 5:09
@Akalanka Did you read the first paragraph? Nothing is proved here. I suppose there are some implicit assumptions about the values given in the question, such as $f_0\leq f_1\geq f_2,$ and the answer uses typical definitions of class intervals and frequencies that are not stated here. But why is it better to split the interval in the ratio $(f_1 - f_0) : (f_1 - f_2)$ than to just take the midpoint? Why not $(f_1 - f_0)^2 : (f_1 - f_2)^2$ or some other ratio? I don't know. — David K, Commented Jul 28, 2020 at 10:30
@DavidK Can you please tell me where I can find the derivation of this formula? I searched everywhere without success. I simply want to know which assumptions are made before it is proved. For example, take the simple ungrouped data $2,2,2,3,3,4,4,5,6,7$, whose mode is $2$. If I group it as $0-2,3-5,6-8$, the modal group becomes $3-5$, and the formula Mode = $l + \dfrac{(f_1 - f_0)h}{2f_1 - f_0 - f_2}$ gives a mode that is totally different from the actual one. This is a simple example, and it is what I am wondering about. — DARK, Commented Jul 28, 2020 at 13:21
Your example shows that when you put data into bins that are too large, too few, and poorly chosen, you can distort the shape of a distribution and make it impossible to find the true mode. There is no formula, derivation, or set of assumptions that can save you after you do this. You should think instead, why am I binning the data and what are good bins? Suppose instead of all integers you had the data $2,2.0001,2.0002,3.001,3.02,4.01,4.01,5.05,6.002,7$. Now the only value that is repeated exactly is $4.01.$ So the data are almost the same but the mode is twice as much as before. Or is it? — David K, Commented Jul 29, 2020 at 12:51
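The example from the comments above can be reproduced with a short Python sketch. One assumption is made that the comments do not state: the integer classes 0-2, 3-5, 6-8 are treated as continuous intervals with boundaries -0.5, 2.5, 5.5 and 8.5, so that $l = 2.5$ and $h = 3$. Other conventions for $l$ and $h$ would give a slightly different number.

```python
from collections import Counter

data = [2, 2, 2, 3, 3, 4, 4, 5, 6, 7]

# Mode of the ungrouped data: the most frequent value.
raw_mode = Counter(data).most_common(1)[0][0]

# Assumed continuous class boundaries for the classes 0-2, 3-5, 6-8.
boundaries = [-0.5, 2.5, 5.5, 8.5]
counts = [sum(lo <= x < hi for x in data)
          for lo, hi in zip(boundaries, boundaries[1:])]

i = counts.index(max(counts))                     # index of the modal class
f1 = counts[i]
f0 = counts[i - 1] if i > 0 else 0                # class before the modal class
f2 = counts[i + 1] if i + 1 < len(counts) else 0  # class after the modal class
l = boundaries[i]
h = boundaries[i + 1] - boundaries[i]

estimate = l + (f1 - f0) * h / (2 * f1 - f0 - f2)

print(counts)             # [3, 5, 2]
print(raw_mode)           # 2
print(f"{estimate:.2f}")  # 3.70
```

With these bins the grouped estimate is about 3.7 while the ungrouped mode is 2, which is the mismatch described in the comments.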

Sooraj S · Accepted Answer · 2020-05-25 00:49:50Z

We partition the continuous frequency distribution into intervals. The maximum value lies within the modal class. It is assumed that the rates of change of the frequency on the two sides of the mode (the maximum frequency) are equal in magnitude.

$$
\begin{aligned}
&\text{slope: } m_{AB} = -m_{BC}\\
&\tan(90^\circ - b) = -\tan(90^\circ + b) \implies \tan a = \tan b\\
&\frac{x}{f_1-f_0} = \frac{h-x}{f_1-f_2} \implies x(f_1-f_2) = h(f_1-f_0) - x(f_1-f_0)\\
&x(2f_1-f_0-f_2) = h(f_1-f_0) \implies x = \frac{f_1-f_0}{2f_1-f_0-f_2}\,h\\
&\text{Mode} = l + x = l + \frac{f_1-f_0}{2f_1-f_0-f_2}\,h
\end{aligned}
$$
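As a quick sanity check of the algebra, the following sketch uses exact rational arithmetic to confirm that the offset $x$ given by the formula satisfies the proportion $\frac{x}{f_1-f_0}=\frac{h-x}{f_1-f_2}$. The helper name `offset_from_formula` and the sample frequencies are hypothetical, and the check covers one set of values only; it is not a proof.

```python
from fractions import Fraction


def offset_from_formula(h, f0, f1, f2):
    # x = (f1 - f0) / (2*f1 - f0 - f2) * h, kept exact with Fraction
    return Fraction(f1 - f0, 2 * f1 - f0 - f2) * h


# Hypothetical class width and frequencies.
h, f0, f1, f2 = 10, 7, 12, 9

x = offset_from_formula(h, f0, f1, f2)
lhs = x / (f1 - f0)        # x / (f1 - f0)
rhs = (h - x) / (f1 - f2)  # (h - x) / (f1 - f2)

print(x)           # 25/4
print(lhs == rhs)  # True
```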

edited May 25, 2020 at 0:49

answered May 24, 2020 at 22:40
