In its simplest form, binning data from something like a CCD camera (as may be used for spectroscopy) means adding together the values of multiple pixels.
On a 2D array detector (like an everyday camera) you could, for example, use 2x2 binning to add together 4 pixels into one output value. Each pixel contributes exactly once to the output data, i.e. it appears in exactly one bin. This halves your resolution (in both axes), so why would you do it? To improve the signal-to-noise ratio. Without going into too much detail, adding together uncorrelated random noise from the various pixels increases the noise in proportion to the square root of the number of pixels, while the signal (if evenly distributed across the pixels) increases in proportion to the number of pixels, so the signal grows faster than the noise.
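As a rough worked version of that argument, assume each of the $N$ binned pixels carries the same signal $s$ and independent noise with the same standard deviation $\sigma$ (these symbols are mine, purely for illustration):

$$
S_\mathrm{bin} = N s, \qquad \sigma_\mathrm{bin} = \sqrt{N}\,\sigma, \qquad \frac{S_\mathrm{bin}}{\sigma_\mathrm{bin}} = \sqrt{N}\,\frac{s}{\sigma}
$$

For 2x2 binning $N = 4$, so under these assumptions the signal-to-noise ratio doubles.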
Now in a spectroscopic system you may have 2 axes: one spectral and possibly one spatial. Typically, though, all the spatial information is thrown away by binning all the pixels in that dimension. This is shown in Figure 1 below (for a different type of spectroscopy, but it's what I had). The vertical dimension here holds (extraneous) spatial data, while the light is dispersed in the horizontal ($x$) direction to form a spectrum.
![CCD image to spectrum](https://cdn.statically.io/img/i.sstatic.net/RL8Tz.png)
Figure 1: An image read from a CCD (top) is summed in $y$ to generate a spectrum. Binning in the $x$ dimension may be used to improve signal-to-noise at the expense of resolution. The spectrum here is real, while the CCD image is a sketch illustrating the appearance of the signal and the noise.
So now we can concentrate on spectral binning. Here we add together the values of adjacent pixels, trading spectral resolution for a better signal-to-noise ratio. In the figure that means summing adjacent values in the $x$ direction of the raw CCD readout. It's not necessary to do this for the spectrum I've shown: good curve fits could be obtained for both the $520~\mathrm{cm^{-1}}$ and $568~\mathrm{cm^{-1}}$ peaks (note the $\times 10^4$ on the $y$ axis).
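To make the two steps concrete, here is a minimal NumPy sketch: first collapse the spatial ($y$) axis, then sum adjacent pixels along the spectral ($x$) axis. The function names `sum_spatial` and `bin_spectrum` and the tiny synthetic frame are my own illustration, not taken from any particular camera software, and I've assumed the array is laid out with rows as $y$ and columns as $x$.

```python
import numpy as np

def sum_spatial(image):
    """Collapse a 2D frame (rows = y, columns = x) into a 1D spectrum."""
    return np.asarray(image).sum(axis=0)

def bin_spectrum(spectrum, factor):
    """Sum each group of `factor` adjacent pixels.

    Any leftover pixels at the end that do not fill a whole bin are dropped.
    """
    spectrum = np.asarray(spectrum)
    n_bins = spectrum.size // factor
    return spectrum[: n_bins * factor].reshape(n_bins, factor).sum(axis=1)

# Made-up frame: 4 rows (y) by 8 columns (x)
frame = np.arange(32).reshape(4, 8)
spectrum = sum_spatial(frame)        # one value per x pixel (8 values)
binned = bin_spectrum(spectrum, 2)   # 4 values, each the sum of 2 neighbours
```

Because every pixel lands in exactly one bin, the total count is unchanged by either step; only the number of output points drops.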
Finally, to relate this to peak statistics: what you tend to be interested in is the peak position and its height or integrated area. These are typically obtained by fitting a peak function of a suitable form, commonly a Gaussian, to the peak. If you're going to fit a Gaussian to the peak, you need to measure the difference between a candidate fit and the real data to assess the quality of the candidate fit. In the document you linked, that difference is measured using $\chi^2$. The fitting routine can then adjust the parameters of the peak function to try to minimise $\chi^2$. As the linked document explains, when fitting a Gaussian using $\chi^2$ minimisation, you need to be dealing with Gaussian-distributed data. If you're dealing with single photons, the intensity distribution won't be Gaussian and the quality of the fit will suffer. According to the document you linked, by the time bins contain around 20 counts (photons) you can treat the photon count distribution as Gaussian. Reporting the minimum $\chi^2$ value found gives an indication of the quality of the fit.
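As a sketch of what "measuring the difference" looks like in practice, the snippet below computes $\chi^2$ for two candidate Gaussians against simulated photon counts. Everything here is assumed for illustration: the peak parameters, the flat background (chosen so every bin sits above roughly 20 counts), and the common approximation that the variance of each bin equals its observed count. A real fitting routine would vary the parameters to minimise this number rather than compare just two candidates.

```python
import numpy as np

def gaussian(x, amplitude, centre, width):
    """Gaussian peak with the given height, centre and standard deviation."""
    return amplitude * np.exp(-0.5 * ((x - centre) / width) ** 2)

def chi_squared(counts, model):
    """Chi-squared with each bin's variance approximated by its observed count.

    This approximation is only reasonable when the counts per bin are
    large enough to look Gaussian (around 20 or more).
    """
    return np.sum((counts - model) ** 2 / counts)

rng = np.random.default_rng(0)
x = np.arange(100)
true_params = (200.0, 50.0, 5.0)   # made-up amplitude, centre, width
background = 30.0                  # made-up flat offset, keeps bins well populated

# Simulated measurement: Poisson-distributed photon counts around the true shape
counts = rng.poisson(gaussian(x, *true_params) + background)

# Candidate 1: the true parameters; candidate 2: centre shifted by 5 pixels
good = chi_squared(counts, gaussian(x, *true_params) + background)
bad = chi_squared(counts, gaussian(x, 200.0, 55.0, 5.0) + background)
```

Here `good` comes out much smaller than `bad`, which is the signal a minimiser follows when it adjusts the peak parameters.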