6
$\begingroup$

I have a large list of data (3.2 million real numbers), and I would like to plot a histogram of it. The built-in Histogram function is very nice, but on my computer, it is often extremely slow when trying to chart histograms of lists that are very long (~1 million real numbers).

So, I would like to pre-bin the data, put it into {x, y} form (i.e., a list of ordered pairs), and plot it with ListPlot, in the hope that this will be a workaround for calling Histogram[list, PerformanceGoal -> "Speed"] directly.

The BinCounts function is very nice: it takes a list, followed by a bin specification, and outputs the number of elements found within each bin. For example, consider one of the examples given in the documentation:

BinCounts[{1, 3, 2, 1, 4, 5, 6, 2}, {0, 10, 1}]
(* {0, 2, 2, 1, 1, 1, 1, 0, 0, 0} *)

where the bin specification $\{x_\min, x_\max, \text{dx}\}$ tells Mathematica to use bins which satisfy the relation $${x_\min + (i-1) \text{ dx} \leq x < x_\min + i \text{ dx}}$$ for bin $i$.

But while BinCounts efficiently outputs the "y" values (the counts), it does not output the "x" values (the bin positions). This is probably because the term "bin position" is ambiguous: it could mean the left edge, the right edge, or the midpoint of a bin. For a long list binned finely, though, I think the choice matters much less.

Is there any way to automatically print both the "x" and the "y" values for a "histogram" to be plotted using ListPlot? Or should I write my own function? I can write my own function, but I just wanted to ask, because it seems somewhat odd that there does not seem to be a way to use Histogram to simply output the data (and suppress display of the fancy, time- and memory-consuming chart graphic).

As for what to use as the working "bin position," I would like to use the midpoint of the bounds of each bin, which for bin $i$ is $$\frac{(x_\min + (i-1) \text{ dx}) + (x_\min + i \text{ dx})}{2} = x_\min + \left(i - \tfrac{1}{2}\right) \text{dx}.$$

$\endgroup$

2 Answers

7
$\begingroup$

You need the HistogramList function, which gives you both the bins (x values) and the heights (y values). For example (using your data and bin specification):

{bins, counts} = HistogramList[{1, 3, 2, 1, 4, 5, 6, 2}, {0, 10, 1}]
(* {{0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10}, {0, 2, 2, 1, 1, 1, 1, 0, 0, 0}} *)

Note that the lengths of bins and counts are not the same, because bins holds the bin edges (one more edge than there are bins):

Length /@ {bins, counts}
(* {11, 10} *)

To use the above bins and counts with ListLinePlot or an equivalent function, use the various approaches in this question and its answers to pair up the two lists. Using the approach from the highest-voted answer:

ListLinePlot[{bins, Append[counts, 0]} // Transpose, InterpolationOrder -> 0]

(Image: output of the ListLinePlot call above.)
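If midpoints are preferred as the "x" values, as the question suggests, one possible sketch is to average adjacent bin edges. This is only an illustration on the same small data set; the names midpoints and points are mine, not part of HistogramList.

```mathematica
(* Sketch: pair each count with the midpoint of its bin. *)
data = {1, 3, 2, 1, 4, 5, 6, 2};
{bins, counts} = HistogramList[data, {0, 10, 1}];

(* A moving average with window 2 averages each pair of adjacent
   bin edges, so 11 edges give 10 midpoints. *)
midpoints = MovingAverage[bins, 2];
(* {1/2, 3/2, 5/2, 7/2, 9/2, 11/2, 13/2, 15/2, 17/2, 19/2} *)

(* Now both lists have the same length and can be paired directly. *)
points = Transpose[{midpoints, counts}];
ListPlot[points, Filling -> Axis]
```

Because midpoints and counts have equal length, no padding with Append is needed here.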

$\endgroup$
2
  • $\begingroup$ Ah, I see that HistogramList is new in version 8. I have both version 7 and 8 on my computer, but I typically use version 7 because some of my colleagues do not have 8 and I want to avoid using functions that they don't have. But, based on this, it looks like it's time to upgrade! Thanks for your time. $\endgroup$
    – Andrew
    Commented Jul 9, 2012 at 18:50
  • $\begingroup$ @Andrew I've added an equivalent function for version 6 and above in an answer. $\endgroup$
    – rm -rf
    Commented Jul 9, 2012 at 19:06
5
$\begingroup$

As you rightly note in your comment, HistogramList is new in version 8. However, BinCounts has been around since version 6, and so here is an equivalent function that works from version 6 onwards.

histogramList[data_, binspec_List] := {Range[Sequence @@ binspec], BinCounts[data, binspec]}

You can verify that this indeed returns the same results as the version 8 equivalent above:

data = {1, 3, 2, 1, 4, 5, 6, 2};
HistogramList[data, {0, 10, 1}] == histogramList[data, {0, 10, 1}]
(* True *)
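
For plotting with ListPlot, a variant that returns {midpoint, count} pairs directly might look like the following. It is a minimal sketch: binMidpointCounts is a hypothetical name, and it assumes that xmax - xmin is an exact multiple of dx (use exact values such as 1/10 rather than 0.1 to avoid rounding issues in Range).

```mathematica
(* Hypothetical helper: {midpoint, count} pairs using only BinCounts and Range.
   Assumes (xmax - xmin) is an exact multiple of dx. *)
binMidpointCounts[data_, {xmin_, xmax_, dx_}] :=
 Transpose[{Range[xmin + dx/2, xmax - dx/2, dx],
   BinCounts[data, {xmin, xmax, dx}]}]

binMidpointCounts[{1, 3, 2, 1, 4, 5, 6, 2}, {0, 10, 1}]
(* {{1/2, 0}, {3/2, 2}, {5/2, 2}, {7/2, 1}, {9/2, 1}, {11/2, 1},
    {13/2, 1}, {15/2, 0}, {17/2, 0}, {19/2, 0}} *)

(* Usage on a long list of uniform random reals, with an exact bin width: *)
ListPlot[binMidpointCounts[RandomReal[{-5, 5}, 10^6], {-5, 5, 1/10}]]
```

The first coordinate of each pair is the midpoint $x_\min + (i - \tfrac{1}{2})\,\text{dx}$ from the question.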
$\endgroup$
4
  • 2
    $\begingroup$ Methinks BinCounts is much faster (x10) at the expense of precalculating appropriate binspecs $\endgroup$ Commented Jul 9, 2012 at 19:12
  • $\begingroup$ @belisarius This function is about twice as fast as the built-in HistogramList on my machine $\endgroup$
    – rm -rf
    Commented Jul 9, 2012 at 19:16
  • $\begingroup$ @belisarius Wait, so this means that R.M's histogramList is actually faster than Mathematica's HistogramList? $\endgroup$
    – Andrew
    Commented Jul 9, 2012 at 19:36
  • 2
    $\begingroup$ @Andrew I would rather say that BinCounts[] is faster than HistogramList[] ... $\endgroup$ Commented Jul 9, 2012 at 19:40
