$\begingroup$

I recently discovered Benford's Law, and I find it fascinating. I'm wondering what some of the real-life uses of Benford's law are. Specific examples would be great.

$\endgroup$

10 Answers

$\begingroup$

Some of the data generated by Math Stack Exchange itself (and, presumably, by similar sites) ought to approximately follow Benford's law. These would include the distribution of first digits of

  1. Frequency of tag use,
  2. Number of votes for questions,
  3. User reputation,
  4. Number of views for questions.

This is because Benford's law applies to exponentially growing quantities, and the site-wide totals of all of these quantities ought to be growing exponentially. It's only approximate because of artifacts in the way that some of these quantities are determined, and because the data need to span several orders of magnitude before Benford's law really kicks in.
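A quick way to see the exponential-growth argument at work is to simulate it. The sketch below is purely illustrative: the 3% growth rate, the starting value of 100 and the 2000 steps are arbitrary assumptions, not site data. It compares the first-digit frequencies of a steadily growing quantity with the Benford probabilities log10(1 + 1/d).

```python
import math
from collections import Counter


def first_digit(x: float) -> int:
    """Leading nonzero digit of a nonzero number."""
    # 17 digits after the point in scientific notation avoids the rounding
    # pitfalls of log10-based extraction near powers of 10.
    return int(f"{abs(x):.17e}"[0])


def benford(d: int) -> float:
    """Benford probability that the first digit is d (1..9)."""
    return math.log10(1 + 1 / d)


# Hypothetical quantity: starts at 100 and grows 3% per step for 2000 steps,
# which spans about 25 orders of magnitude.
values = [100 * 1.03 ** n for n in range(2000)]
counts = Counter(first_digit(v) for v in values)

for d in range(1, 10):
    observed = counts[d] / len(values)
    print(f"{d}: observed {observed:.3f}  Benford {benford(d):.3f}")
```

Because each step multiplies by the same factor, log10 of the value advances by a constant amount, so over many decades its fractional part spreads out evenly; that evenness is what produces the Benford frequencies.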

Anyway, I decided to test this for the first three. The first digits weren't actually that hard to compile because the site allows you to sort the first three from highest to lowest. (Unfortunately, it does not allow you to sort the fourth that way, and I don't feel like wading through 2200+ questions to collect the data.) Here are the results I got on the first three.

Tag Use

(Data collected October 25, 2010.)

[Chart: first-digit frequencies of tag use compared with Benford's law]

Not a bad fit, especially when you consider that there are only three orders of magnitude represented in this measure. There are a disproportionately large number of tags that were created but only used once, which explains the larger frequency of 1 as a first digit.

Votes Per Question

(Data collected October 26, 2010.)

[Chart: first-digit frequencies of votes per question compared with Benford's law]

Also not a bad fit, especially since there are currently only two orders of magnitude represented in this measure. Also, I threw out the questions with 0 and negative numbers of votes.

User Reputations

(Data collected October 25 and 26, 2010.)

[Chart: first-digit frequencies of user reputations compared with Benford's law]

This is the worst fit of the three, as the frequency of 1 as a first digit is so much larger than that of the other digits. However, there are a very large number of users who have never posted a question or an answer and so have a rep of 1. And, for reasons unknown to me, there are also a large number of users who have a reputation of 101, despite never having asked or answered a question.

If you remove 1 as a possible first digit and then rescale the Benford probabilities to consider only 2 through 9 as possible first digits, then the picture looks like the following, which is a much better Benford fit.

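[Chart: first-digit frequencies of user reputations for digits 2 through 9 compared with rescaled Benford probabilities]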
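The rescaling described above is just conditioning on the first digit being at least 2. Since the Benford probabilities for 2 through 9 add up to log10(10/2) = log10 5 (about 0.699), each rescaled probability is log10(1 + 1/d) / log10 5. A minimal sketch (the function names are my own):

```python
import math


def benford(d: int) -> float:
    """Benford probability that the first digit is d (1..9)."""
    return math.log10(1 + 1 / d)


def benford_without_one() -> dict[int, float]:
    """Benford probabilities for first digits 2..9, renormalized to sum to 1."""
    # The digits 2..9 jointly carry 1 - log10(2) = log10(5) of the probability.
    total = sum(benford(d) for d in range(2, 10))
    return {d: benford(d) / total for d in range(2, 10)}


for d, p in benford_without_one().items():
    print(f"{d}: {p:.3f}")
```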

Admittedly, none of these data sets would pass Pearson's chi-square test for goodness-of-fit with respect to the Benford probabilities at a reasonable level of significance. However, given some of the artifacts in the data and the fact that there are relatively few orders of magnitude represented, the fit with Benford's law is really not that bad.
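For anyone who wants to try the goodness-of-fit test themselves, here is a minimal sketch of Pearson's chi-square statistic against the Benford probabilities. The input format (a dict from digit to count), the function names and the example counts are assumptions for illustration; 15.507 is the standard 5% critical value of the chi-square distribution with 8 degrees of freedom (9 digit categories minus 1).

```python
import math

# 5% critical value of the chi-square distribution with 8 degrees of freedom.
CRITICAL_8DF_5PCT = 15.507


def benford(d: int) -> float:
    """Benford probability that the first digit is d (1..9)."""
    return math.log10(1 + 1 / d)


def benford_chi_square(counts: dict[int, float]) -> float:
    """Pearson chi-square statistic of observed first-digit counts vs Benford.

    `counts` maps each first digit 1..9 to how many values started with it;
    missing digits are treated as zero.
    """
    n = sum(counts.get(d, 0) for d in range(1, 10))
    stat = 0.0
    for d in range(1, 10):
        expected = n * benford(d)
        stat += (counts.get(d, 0) - expected) ** 2 / expected
    return stat


# Made-up counts for illustration only (not the data from this answer).
example = {1: 30, 2: 18, 3: 12, 4: 10, 5: 8, 6: 7, 7: 6, 8: 5, 9: 4}
stat = benford_chi_square(example)
print(f"chi-square = {stat:.2f}, reject at 5%: {stat > CRITICAL_8DF_5PCT}")
```

With large samples the test flags even small systematic deviations, so data sets with artifacts like the rep-1 and rep-101 spikes will fail it easily. For digits 2 through 9 only, you would use the rescaled probabilities and 7 degrees of freedom instead. If SciPy is available, `scipy.stats.chisquare(f_obs, f_exp)` computes the same statistic along with a p-value.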

$\endgroup$
  • $\begingroup$ Whoa, that's interesting! Thank you for taking the time to put this together. $\endgroup$
    – Jin
    Commented Oct 27, 2010 at 20:25
  • $\begingroup$ @Jin: It actually didn't take that long. Also, I may be able to use it as an example of Benford's law when I teach statistics next semester. $\endgroup$ Commented Oct 27, 2010 at 20:27
  • $\begingroup$ Great work! To answer your question regarding the 101-rep users: if you have an account on any other StackExchange site with at least 200 rep, you get a 100 rep bonus when creating a new account on this (or any) site. So 101 = 1 (start value) + 100 (association bonus). $\endgroup$
    – balpha
    Commented Oct 27, 2010 at 20:37
$\begingroup$

Forensic accounting is a popular use, and Benford's-law analysis is actually admissible as evidence in the USA.
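As a rough illustration of how a first-digit audit test can work, the sketch below computes a per-digit z-statistic: for each digit it compares the observed proportion of leading digits with the Benford proportion, using a normal approximation to the binomial with a 1/(2n) continuity correction. This is one common style of test, not necessarily what any particular auditor or court relied on; the function names and the ledger are hypothetical.

```python
import math
from collections import Counter


def benford(d: int) -> float:
    """Benford probability that the first digit is d (1..9)."""
    return math.log10(1 + 1 / d)


def first_digit(x: float) -> int:
    """Leading nonzero digit of a nonzero amount (sign is ignored)."""
    return int(f"{abs(x):.17e}"[0])


def digit_z_scores(amounts: list[float]) -> dict[int, float]:
    """Per-digit z-statistics of observed first-digit proportions vs Benford.

    Zero amounts are skipped because they have no leading digit.
    """
    digits = [first_digit(a) for a in amounts if a != 0]
    n = len(digits)
    counts = Counter(digits)
    scores = {}
    for d in range(1, 10):
        p_exp = benford(d)
        diff = abs(counts[d] / n - p_exp)
        # Continuity correction, applied only when it is smaller than the gap.
        correction = 1 / (2 * n)
        if correction < diff:
            diff -= correction
        scores[d] = diff / math.sqrt(p_exp * (1 - p_exp) / n)
    return scores


# Hypothetical ledger: log-spread amounts plus many invoices just under 5,000.
ledger = [10 ** (k / 100) for k in range(1000)] + [4_950.0] * 60
for d, z in digit_z_scores(ledger).items():
    flag = "  <-- look closer" if z > 1.96 else ""
    print(f"{d}: z = {z:.2f}{flag}")
```

A digit whose z-statistic exceeds roughly 1.96 deviates from Benford at about the 5% level (two-sided); in practice that is a prompt to look more closely at the underlying transactions, not proof of fraud.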

$\endgroup$
$\begingroup$

Here's an article from about a year ago in the German Economic Review, in which the authors use Benford's law to analyze economic data from the countries using the euro. They find that Greece's economic data just before joining the euro differed significantly from Benford's law predictions. The implication is that Greece may have manipulated its numbers in order to comply with the terms of the Maastricht Treaty (which sets criteria for full membership in the European Monetary Union). See, for instance, this blog post. I find this particularly interesting in light of Greece's recent financial troubles.

Here's the abstract from the article: "To detect manipulations or fraud in accounting data, auditors have successfully used Benford's law as part of their fraud detection processes. Benford's law proposes a distribution for first digits of numbers in naturally occurring data. Government accounting and statistics are similar in nature to financial accounting. In the European Union (EU), there is pressure to comply with the Stability and Growth Pact criteria. Therefore, like firms, governments might try to make their economic situation seem better. In this paper, we use a Benford test to investigate the quality of macroeconomic data relevant to the deficit criteria reported to Eurostat by the EU member states. We find that the data reported by Greece shows the greatest deviation from Benford's law among all euro states." (emphasis mine)
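The abstract doesn't say which Benford test was used, so the following is only a sketch of the general idea of ranking several data sets by their distance from Benford's law. It assumes the mean absolute deviation (MAD) between observed and expected first-digit proportions as the distance measure, which is one common choice in the auditing literature; the data set names and values are hypothetical.

```python
import math
from collections import Counter


def benford(d: int) -> float:
    """Benford probability that the first digit is d (1..9)."""
    return math.log10(1 + 1 / d)


def first_digit(x: float) -> int:
    """Leading nonzero digit of a nonzero number (sign is ignored)."""
    return int(f"{abs(x):.17e}"[0])


def benford_mad(values: list[float]) -> float:
    """Mean absolute deviation of first-digit proportions from Benford."""
    digits = [first_digit(v) for v in values if v != 0]
    counts = Counter(digits)
    n = len(digits)
    return sum(abs(counts[d] / n - benford(d)) for d in range(1, 10)) / 9


def rank_by_deviation(datasets: dict[str, list[float]]) -> list[tuple[str, float]]:
    """(name, MAD) pairs sorted from largest to smallest deviation."""
    scores = {name: benford_mad(vals) for name, vals in datasets.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical series: one log-uniform, one with evenly spread first digits.
datasets = {
    "country_a": [10 ** (k / 100) for k in range(1000)],
    "country_b": [d * 10 ** k for d in range(1, 10) for k in range(5)],
}
for name, mad in rank_by_deviation(datasets):
    print(f"{name}: MAD = {mad:.4f}")
```

The data set with the largest MAD comes first. A large deviation is a reason for scrutiny rather than evidence of manipulation by itself, since some genuine data simply don't span enough orders of magnitude.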

See also this question on quant.SE (where I first learned of this).

$\endgroup$
$\begingroup$

It may or may not be useful for detecting fraud in elections, for example the 2009 Iranian election.
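Precinct-level vote counts often span only a couple of orders of magnitude, so some election-forensics analyses look at the second significant digit instead of the first. Under Benford's law, the probability that the second digit is d2 (0 through 9) is the sum over first digits k = 1..9 of log10(1 + 1/(10k + d2)), which ranges from about 0.120 for 0 down to about 0.085 for 9. The sketch below is an illustration of such a test, not the method used in the linked posts; the function names are hypothetical, and 16.919 is the 5% critical value of the chi-square distribution with 9 degrees of freedom.

```python
import math

# 5% critical value of the chi-square distribution with 9 degrees of freedom.
CRITICAL_9DF_5PCT = 16.919


def benford_second_digit(d2: int) -> float:
    """Benford probability that the second significant digit equals d2 (0..9)."""
    # Sum over every possible first digit k of P(leading two digits = 10k + d2).
    return sum(math.log10(1 + 1 / (10 * k + d2)) for k in range(1, 10))


def second_digit(n: int) -> int:
    """Second digit of a positive integer with at least two digits."""
    return int(str(n)[1])


def second_digit_chi_square(vote_counts: list[int]) -> float:
    """Pearson chi-square of observed second digits vs the Benford second-digit law."""
    digits = [second_digit(v) for v in vote_counts if v >= 10]  # need two digits
    n = len(digits)
    stat = 0.0
    for d2 in range(10):
        expected = n * benford_second_digit(d2)
        stat += (digits.count(d2) - expected) ** 2 / expected
    return stat


for d2 in range(10):
    print(f"{d2}: {benford_second_digit(d2):.4f}")
```

Given a list of per-precinct counts for one candidate, `second_digit_chi_square(counts) > CRITICAL_9DF_5PCT` would flag a deviation at the 5% level; as with first digits, that is a reason for scrutiny, not proof of fraud.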

Some links:

http://sbseminar.wordpress.com/2009/06/15/benfords-law-and-the-iranian-election

http://www.fivethirtyeight.com/2009/06/statistical-evidence-does-not-prove.html

$\endgroup$
$\begingroup$

You know those numbers on houses that tell you the address? Like, say, 1205 Main Street, Sometown USA. If you run a store that sells those numbers, you should probably stock more 1s and 2s than 8s and 9s. Building contractor supply stores probably know about Benford's law this way. Hardware stores (like ACE, True Value, etc.) probably don't. In fact, the next time you go to the hardware store, try telling them you need to replace your house numbers. Tell them you live at 1129 Some Street.
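To put rough numbers on this, here is a toy model (my assumption, not part of the argument above): if customers live at house numbers spread evenly from 1 to N, count how many tiles of each digit it takes to number every house. Unless N happens to be 9, 99, 999, and so on, the leading position tilts demand toward the low digits.

```python
from collections import Counter


def tiles_needed(max_number: int) -> Counter:
    """Count digit tiles needed to number every house from 1 to max_number."""
    tiles = Counter()
    for house in range(1, max_number + 1):
        tiles.update(int(ch) for ch in str(house))
    return tiles


# Toy model: a street (or a customer base) with houses numbered 1..1999.
tiles = tiles_needed(1999)
for digit in range(10):
    print(f"{digit}: {tiles[digit]}")
# With max_number = 1999, "1" needs 1600 tiles while "9" needs only 600,
# because every number from 1000 to 1999 starts with 1.
```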

$\endgroup$
$\begingroup$

I recently attended a lecture on this, so I'll list a few interesting points that I remember.

The original discovery was by Simon Newcomb, who noticed in 1881 that tables of logarithms were more worn at the front than at the back. It was rediscovered by Benford in 1938.

Areas of countries, areas of rivers and sizes of populations approximately obey this rule. A more surprising claim is that the numbers appearing in newspapers also follow it. In spite of the many observations of Benford's law, few instances are fully explained. I believe there have been proposals for simple models that attribute an 'exponential-like' behavior to the areas of countries, but I don't really know anything about them.
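One property often mentioned in connection with areas and populations is scale invariance: if a quantity's logarithm is spread evenly over many decades, its first digits follow Benford's law, and converting units (multiplying every value by the same constant) leaves that distribution essentially unchanged. The simulation below is only an illustration under that log-uniform assumption; the sample is synthetic, not real country areas.

```python
import math
import random
from collections import Counter


def benford(d: int) -> float:
    """Benford probability that the first digit is d (1..9)."""
    return math.log10(1 + 1 / d)


def first_digit(x: float) -> int:
    """Leading nonzero digit of a positive number."""
    return int(f"{x:.17e}"[0])


def first_digit_freqs(values: list[float]) -> dict[int, float]:
    """Observed proportion of each first digit 1..9."""
    counts = Counter(first_digit(v) for v in values)
    return {d: counts[d] / len(values) for d in range(1, 10)}


rng = random.Random(42)  # fixed seed so the run is reproducible
# Synthetic "areas" whose log10 is uniform on [0, 6), i.e. 1 to 10**6 km^2.
areas_km2 = [10 ** rng.uniform(0, 6) for _ in range(100_000)]
KM2_TO_MI2 = 0.386102  # 1 square kilometer is about 0.386102 square miles
areas_mi2 = [a * KM2_TO_MI2 for a in areas_km2]

before = first_digit_freqs(areas_km2)
after = first_digit_freqs(areas_mi2)
for d in range(1, 10):
    print(f"{d}: km^2 {before[d]:.3f}  mi^2 {after[d]:.3f}  Benford {benford(d):.3f}")
```

Multiplying by a constant shifts every log10 value by the same amount, and a log-uniform spread over a whole number of decades looks the same modulo 1 after such a shift, which is why both columns stay close to Benford.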

On the forensic accounting example mentioned by workmad3: I remember the lecturer pointing out that evidence pertaining to Benford's law is considered admissible in court.

Some things don't really obey Benford's law, for example the number of pages in books.

That's more or less what I can recall from the talk; hope that helps :)

$\endgroup$
$\begingroup$

From a physics perspective, you can look at how Benford's law (or deviations from it) can arise from the different statistical mechanics distributions (Boltzmann, Bose-Einstein, etc.). There is a good paper on the arXiv that can get you started on this topic:

Abstract:

The occurrence of the nonzero leftmost digit, i.e., 1, 2, ..., 9, of numbers from many real world sources is not uniformly distributed as one might naively expect, but instead, the nature favors smaller ones according to a logarithmic distribution, named Benford's law. We investigate three kinds of widely used physical statistics, i.e., the Boltzmann-Gibbs (BG) distribution, the Fermi-Dirac (FD) distribution, and the Bose-Einstein (BE) distribution, and find that the BG and FD distributions both fluctuate slightly in a periodic manner around the Benford distribution with respect to the temperature of the system, while the BE distribution conforms to it exactly whatever the temperature is. Thus the Benford's law seems to present a general pattern for physical statistics and might be even more fundamental and profound in nature. Furthermore, various elegant properties of Benford's law, especially the mantissa distribution of data sets, are discussed.

The Significant Digit Law in Statistical Physics

$\endgroup$
$\begingroup$

I've heard of it being used as a rough check to see whether accounting numbers were being made up.

$\endgroup$
  • $\begingroup$ The actual process of using Benford's Law consists of checking the distribution of the numbers: if the numbers have been tampered with, they show a different distribution. People inventing numbers do this in a biased way, different from how numbers show up in real life. $\endgroup$
    – Pieter
    Commented Jul 20, 2010 at 19:49
$\begingroup$

I think that Benford's law is quite intuitive in "real life", but it would be more surprising if it often held in mathematics. In fact it does; here is a nice related student paper.
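A classic mathematical example (not necessarily the one in that paper) is the powers of 2. The first digit of 2^n is determined by the fractional part of n·log10(2), and because log10(2) is irrational those fractional parts are equidistributed in [0, 1), which is exactly the condition for Benford frequencies. Python's arbitrary-precision integers make this easy to check exactly:

```python
import math
from collections import Counter


def benford(d: int) -> float:
    """Benford probability that the first digit is d (1..9)."""
    return math.log10(1 + 1 / d)


N = 1000
# Python ints are arbitrary precision, so str(2 ** n) is exact for every n.
counts = Counter(int(str(2 ** n)[0]) for n in range(1, N + 1))

for d in range(1, 10):
    print(f"{d}: observed {counts[d] / N:.3f}  Benford {benford(d):.3f}")
```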

$\endgroup$
$\begingroup$

There is a negative application for databases. Often record numbers in a database are assigned sequentially.

Another common thing is to want to partition records into multiple databases that together serve as one big database. If you don't know what the maximum value will be, you might be tempted to divide the records between databases by string comparison of the sequentially assigned record number, starting with its first (non-zero) digit: for example, all numbers starting with "1" go into database 1, all starting with "2" go into database 2, etc. (and you can use string ranges in a similar fashion to divide into more or fewer than 9 ranges). You would then discover that your first database (partition), for record numbers starting with "1", has on average roughly 5 or 6 times as much data in it as your last database (partition), so you don't achieve the balance between partitions that you wanted (even if you divide into more or fewer than 9 partitions with apparently evenly divided digit ranges).

There are other ways of partitioning by record number that avoid the skew from Benford's law, such as comparing digits in reverse order, i.e. all record numbers that end in "1" go in database 1, etc.
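The imbalance and the reverse-order fix are easy to see with a few lines of code. This sketch assumes 25,000 hypothetical records numbered sequentially from 1, and the function names are my own; the exact skew depends on how far the counter has run, so the ratio between the first and last partitions varies with the record count.

```python
from collections import Counter


def partition_by_first_digit(record_ids) -> Counter:
    """Partition sizes when each record goes to the partition of its leading digit."""
    return Counter(str(rid)[0] for rid in record_ids)


def partition_by_last_digit(record_ids) -> Counter:
    """Partition sizes when each record goes to the partition of its last digit."""
    return Counter(str(rid)[-1] for rid in record_ids)


record_ids = range(1, 25_001)  # hypothetical: 25,000 sequentially numbered records
by_first = partition_by_first_digit(record_ids)
by_last = partition_by_last_digit(record_ids)

print("leading digit:", sorted(by_first.items()))
print("last digit:   ", sorted(by_last.items()))
# Leading digit: "1" gets 11111 records, "2" gets 6112, "3".."9" get 1111 each.
# Last digit: all ten partitions ("0".."9") get exactly 2500 records.
```

Last-digit partitioning needs ten partitions (including "0"); for k partitions, `rid % k` balances sequential IDs the same way.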

(FWIW, regarding a prior post, I don't understand why the number of pages in books (the length of books) would not follow Benford's law just like the lengths of rivers. Maybe someone was thinking of numbers in a telephone book.)

$\endgroup$
