
In the following online article

http://www.star.le.ac.uk/~sav2/stats/a.html

I see the word "bin" used, in relation to X-ray spectroscopy, both as a verb and as a noun (people both "bin" things and talk about each "bin"). I've also heard it bandied about in other places. The article seems to give a very good explanation of why we would "bin", but it doesn't seem to say what the process actually means. When I search for the term I find all sorts of definitions, mostly to do with Photometrics.

Rather than quote the parts of the article I don't understand, I'll simply ask: in the context of spectroscopy, what does "bin" mean?


3 Answers


Suppose you are analysing the weights of people in the UK to see what the distribution of weights looks like. Suppose also you can measure the weight to arbitrary precision, so that no two people's weights will be exactly the same. When you're finished you plot your data on a histogram, but the trouble is that because everyone has a different weight you get a histogram that looks like (the data is entirely fictional):

[Figure: histogram of the fictional weight data with no binning]

And this is no use to anyone. You can see there's some clustering around the average weight, but it's impossible to get any detail from the graph.

Now suppose you choose groups of weights e.g. 50-55 kg, 55-60 kg, 60-65 kg, etc, and now you count the number of people whose weights fall into each group. This time your histogram is going to look like:

[Figure: histogram of the same data with binning]

and you can see the good old bell shape emerging. So you can see the average and the width of the distribution. The groups are called bins, and the process of assigning each data point to a bin is called binning.
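As a minimal sketch of what "assigning each data point to a bin" means, here is fixed-width binning in plain Python. The function name `bin_values` and the weights are hypothetical and made up for illustration; bin $k$ covers $[\text{start} + k\,w,\ \text{start} + (k+1)\,w)$, matching the 50-55 kg, 55-60 kg, ... groups above.

```python
import math
from collections import Counter

def bin_values(values, start, width):
    """Count how many values fall into each fixed-width bin.

    Bin k covers [start + k*width, start + (k+1)*width).
    Returns a dict mapping each bin's lower edge to its count.
    """
    counts = Counter()
    for v in values:
        k = math.floor((v - start) / width)  # index of the bin containing v
        counts[start + k * width] += 1
    return dict(sorted(counts.items()))

# Fictional weights in kg (hypothetical data, like the histogram above)
weights = [52.31, 57.04, 58.9, 61.27, 62.5, 63.01, 64.99, 66.2, 71.8]
print(bin_values(weights, start=50, width=5))
# {50: 1, 55: 2, 60: 4, 65: 1, 70: 1}
```

Plotting these counts as bar heights gives the binned histogram; with arbitrarily precise weights and no binning, every count would be 1.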

You choose the bin size to best suit your data. If you make the bins too small you get lots of points on your histogram, but each bin holds only a few data points, so you'll have lots of statistical noise. Make the bins too big and you get excellent signal-to-noise but too few points on your histogram to be useful.

I've used weights because that's a particularly simple example, but exactly the same applies to measuring spectra. Each bin would be a range of wavelengths, and you'd measure the integrated intensity for the range. You choose the bin size to make the signal-to-noise as good as possible while keeping the spectral resolution within acceptable limits.
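The same idea applied to a spectrum can be sketched as follows, assuming (hypothetically) photon counts recorded in 1 nm channels. Each wavelength bin's value is the sum of the counts in its range, and for photon counting the noise on $N$ counts is $\sqrt{N}$, so the signal-to-noise of a bin is $N/\sqrt{N} = \sqrt{N}$. The function names and data are illustrative only.

```python
import math

def bin_spectrum(wavelengths, intensities, lo, width):
    """Integrate (sum) intensities into wavelength bins [lo + k*width, lo + (k+1)*width)."""
    totals = {}
    for wl, counts in zip(wavelengths, intensities):
        edge = lo + math.floor((wl - lo) / width) * width  # lower edge of wl's bin
        totals[edge] = totals.get(edge, 0) + counts
    return dict(sorted(totals.items()))

def poisson_snr(counts):
    """Signal N with Poisson noise sqrt(N) gives SNR = sqrt(N)."""
    return math.sqrt(counts) if counts > 0 else 0.0

# Hypothetical photon counts in 1 nm channels from 500 to 507 nm
wl = list(range(500, 508))
counts = [4, 5, 3, 6, 5, 4, 7, 6]
binned = bin_spectrum(wl, counts, lo=500, width=4)
print(binned)  # {500: 18, 504: 22}
print([round(poisson_snr(c), 2) for c in counts])          # per-channel SNR
print({e: round(poisson_snr(n), 2) for e, n in binned.items()})  # per-bin SNR
```

Wider bins raise the SNR of each point but leave fewer points across the spectrum, which is the resolution trade-off described above.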

  • Have a unicorn dollar...
    – Floris
    Commented Oct 1, 2015 at 13:15
  • The term 'bin' means 'a place to throw things into', which might help people who don't speak English as a first language: maidstone.gov.uk/__data/assets/image/0014/5153/… Imagine a wall of bins, and you toss each data point into the bin nearest to where it actually is, just to organize the data a bit. As an aside, cumulative graphs display similar information (in a different way) without requiring binning (the steepness of the CDF corresponds to the height of the histogram).
    – Yakk
    Commented Oct 1, 2015 at 15:10
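To illustrate the comment's aside, here is a minimal sketch of an empirical cumulative distribution function (CDF), which needs no bin choice at all. The data are fictional, and `empirical_cdf` is a hypothetical helper name. The fraction of points in any interval $(a, b]$ is $F(b) - F(a)$, which is exactly what a histogram bin covering $(a, b]$ would hold.

```python
from bisect import bisect_right

def empirical_cdf(data):
    """Return F(x) = fraction of data points <= x (no binning involved)."""
    xs = sorted(data)
    n = len(xs)

    def F(x):
        # bisect_right gives the number of sorted points <= x
        return bisect_right(xs, x) / n

    return F

weights = [61.2, 58.7, 63.4, 60.1, 66.8, 59.5]  # fictional weights in kg
F = empirical_cdf(weights)
print(F(60.0), F(65.0))
# Fraction of people in (60, 65] kg, i.e. the contents of that "bin"
print(F(65.0) - F(60.0))
```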

"Bin" as a verb means to divide/discretize data into a group of (frequently equal-width) ranges, to facilitate various sorts of analysis and visualization. See: https://en.wikipedia.org/wiki/Data_binning

In particular, binning is the basis of histogram plots among other things. (https://en.wikipedia.org/wiki/Histogram)

As a noun, a "bin" refers to one of the ranges used to subdivide the data.

As a simple concrete example, if you have data about average income at every integer age from 21-65, and you combined 5-year ranges to produce average income at ages 21-25, 26-30, 31-35, etc., that would be binning (verb) the data. 21-25 would be one bin (noun), 26-30 would be the second bin, etc.
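The age example can be sketched directly. This assumes a hypothetical dictionary mapping each integer age to an average income, and it averages the per-age values with equal weight per age (a true population average would need to weight each age by how many people it represents). The income figures are invented for illustration.

```python
def bin_by_age(value_by_age, first_age=21, width=5):
    """Average per-age values over consecutive bins of `width` years.

    Each age contributes equally (an unweighted mean of the per-age values).
    Bins are labelled by their (first, last) age, e.g. (21, 25).
    """
    groups = {}
    for age in sorted(value_by_age):
        lo = first_age + ((age - first_age) // width) * width  # first age in this bin
        groups.setdefault((lo, lo + width - 1), []).append(value_by_age[age])
    return {bounds: sum(v) / len(v) for bounds, v in groups.items()}

# Fictional average income at every integer age from 21 to 65
income = {age: 20000 + 1000 * (age - 21) for age in range(21, 66)}
binned = bin_by_age(income)
print(binned[(21, 25)], binned[(26, 30)], len(binned))  # 22000.0 27000.0 9
```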


In its simplest form, binning of data from something like a CCD camera (as may be used for spectroscopy) means adding together multiple pixels.

On a 2D array detector (like an everyday camera) you could, for example, use 2x2 binning to add together 4 pixels into one output value. Each pixel contributes exactly once to the output data, i.e. appears in exactly one bin. This halves your resolution (in both axes), so why would you do it? To reduce the noise relative to the signal. Without going into too much detail: when you add together pixels, their independent random noise grows only as the square root of the number of pixels, while the signal (if evenly distributed across the pixels) grows in proportion to the number of pixels. Binning $N$ pixels therefore improves the signal-to-noise ratio by a factor of $\sqrt{N}$.
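Here is a minimal sketch of software 2x2 binning on a toy 4x4 frame, plus the $\sqrt{N}$ signal-to-noise argument as arithmetic. The function names are hypothetical, and the frame is a list of lists rather than real detector data. Real cameras can also bin on-chip before readout, which this sketch does not model.

```python
import math

def bin_image(img, fy=2, fx=2):
    """Sum non-overlapping fy x fx blocks of pixels (each pixel used exactly once)."""
    rows, cols = len(img), len(img[0])
    if rows % fy or cols % fx:
        raise ValueError("image size must be a multiple of the binning factors")
    return [
        [sum(img[r + dy][c + dx] for dy in range(fy) for dx in range(fx))
         for c in range(0, cols, fx)]
        for r in range(0, rows, fy)
    ]

def snr_gain(n_pixels):
    """Signal grows as n, independent noise as sqrt(n), so SNR improves by sqrt(n)."""
    return n_pixels / math.sqrt(n_pixels)

img = [[4 * r + c for c in range(4)] for r in range(4)]  # toy 4x4 frame
print(bin_image(img))  # [[10, 18], [42, 50]]
print(snr_gain(4))     # 2.0
```

The total of all pixel values is unchanged (120 in both cases); only the number of output values drops by a factor of 4.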

Now in a spectroscopic system you may have two axes: one spectral and possibly one spatial. Typically, though, all the spatial information is thrown away by binning all pixels in that dimension. This is shown in Figure 1 below (for a different type of spectroscopy, but it's what I had). The vertical dimension here holds (extraneous) spatial data, while the light is dispersed in the horizontal ($x$) direction to form a spectrum.

[Figure 1: CCD image to spectrum]

Figure 1: An image read from a CCD (top) is summed in $y$ to generate a spectrum. Binning in the $x$ dimension may be used to improve signal-to-noise at the expense of resolution. The spectrum here is real, while the CCD image is a sketch illustrating the appearance of the signal and the noise.
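Summing in $y$, as in Figure 1, amounts to adding up each column of the image. A minimal sketch, assuming the frame is stored as a list of rows (index $y$) of pixel values (index $x$); the function name and the tiny frame are hypothetical.

```python
def collapse_to_spectrum(image):
    """Sum each column over all rows (the y direction), giving one value per x pixel."""
    return [sum(column) for column in zip(*image)]

# Toy 3-row x 4-column frame: rows are y, columns are x (the dispersion direction)
frame = [
    [1, 2, 3, 4],
    [1, 2, 3, 4],
    [0, 1, 0, 1],
]
print(collapse_to_spectrum(frame))  # [2, 5, 6, 9]
```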

So now we can concentrate on spectral binning. Here we add together the values of adjacent pixels, reducing the spectral resolution and the noise. In the figure that means summing adjacent values in the $x$ direction of the raw CCD readout. It's not necessary to do this for the spectrum I've shown: good curve fits could be obtained for both the $520~\mathrm{cm^{-1}}$ and $568~\mathrm{cm^{-1}}$ peaks (note the $\times 10^4$ on the $y$ axis).
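Spectral binning of a 1D spectrum by an integer factor can be sketched like this. The function name is hypothetical, and dropping a trailing group that has fewer than `factor` pixels is one arbitrary choice; another would be to keep the partial sum.

```python
def bin_spectrum_pixels(spectrum, factor):
    """Sum each run of `factor` adjacent x pixels; drop a trailing partial group."""
    n_full = len(spectrum) // factor  # number of complete groups
    return [sum(spectrum[i * factor:(i + 1) * factor]) for i in range(n_full)]

spectrum = [2, 5, 6, 9, 3, 1, 4]
print(bin_spectrum_pixels(spectrum, 2))  # [7, 15, 4]
```

Binning by 2 halves the number of spectral points (the resolution) while, by the $\sqrt{N}$ argument above, improving the signal-to-noise of each point by roughly $\sqrt{2}$ for independent noise.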

Finally, to relate this to peak statistics. What you tend to be interested in is the peak position and height, or its integrated area. These are typically obtained by fitting a peak function of a suitable form to the peak, commonly a Gaussian. If you're going to fit a Gaussian to the peak, you need to measure the difference between a candidate fit and the real data to assess the quality of the candidate fit. In the document you linked, that difference is measured using $\chi^2$. The fitting routine can then adjust the parameters of the peak function and try to minimise $\chi^2$. As the linked document explains, when fitting by $\chi^2$ minimisation the data in each bin need to be (approximately) Gaussian-distributed. If each bin holds only a few photons, the counts follow a Poisson distribution rather than a Gaussian one, and the quality of the fit will suffer. According to the document you linked, once bins contain around 20 counts (photons) you can treat the photon count distribution as Gaussian. Reporting the minimum $\chi^2$ value found gives an indication of the quality of the fit.
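A minimal sketch of the pieces described above: grouping adjacent channels until each bin holds at least 20 counts, a Gaussian peak function, and $\chi^2 = \sum_i (O_i - E_i)^2 / \sigma_i^2$. Taking each bin's variance as its observed count, $\sigma_i^2 \approx O_i$, is an assumption here (one common choice, valid only when counts are large enough). All names are hypothetical, and the crude grid search over the peak centre only stands in for a real fitting routine, which would vary amplitude, centre and width together with a proper minimiser.

```python
import math

def group_min_counts(counts, min_counts=20):
    """Merge adjacent channels left to right until each group has >= min_counts.

    Returns (first_channel, last_channel, total) tuples. A short tail that
    never reaches min_counts is merged into the previous group.
    """
    groups, start, total = [], 0, 0
    for i, c in enumerate(counts):
        total += c
        if total >= min_counts:
            groups.append((start, i, total))
            start, total = i + 1, 0
    if start < len(counts):  # leftover channels below the threshold
        if groups:
            first, _, prev_total = groups.pop()
            groups.append((first, len(counts) - 1, prev_total + total))
        else:
            groups.append((start, len(counts) - 1, total))
    return groups

def gaussian(x, amplitude, centre, sigma):
    return amplitude * math.exp(-0.5 * ((x - centre) / sigma) ** 2)

def chi_square(observed, expected):
    """Chi-square with each bin's variance approximated by its observed count.

    Assumes every observed count is positive (grouping to >= 20 counts ensures this).
    """
    return sum((o - e) ** 2 / o for o, e in zip(observed, expected))

def best_centre(x, observed, candidates, amplitude, sigma):
    """Crude grid search: the candidate centre giving the lowest chi-square."""
    return min(candidates,
               key=lambda c: chi_square(observed, [gaussian(xi, amplitude, c, sigma) for xi in x]))

print(group_min_counts([5, 8, 9, 12, 15, 3, 2]))  # [(0, 2, 22), (3, 6, 32)]
print(chi_square([25, 40, 25], [20, 40, 30]))     # 2.0
```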
