The preferred way of binning (classifying) data for galaxy snapshots in Python

Question

Let's say I have a galaxy snapshot that I load with some package (pynbody etc) and I store my spatial coordinates of stars in the galaxy in x, y, z arrays and also velocity in array vz. Now I want to lay a grid of (N × N cells along x and y axis) onto my galaxy in the x-y plane (assuming my line of sight is z-axis) so that the values of vz of stars corresponding to the associated pairs of x,y of the stars will fall into each cell of this grid. Then I want to extract the set of vz values from every cell of the grid and do whatever I want with them, such as, fit a Gaussian or modified Gaussian functions that I define myself etc. I think this is a fairly common practice in computational astronomy.

I was wondering what are the popularly used Python functions/codes for this among computational astrophysicists?

For example, I think scipy.stats.binned_statistic_2d() is useful for getting some statistics like mean and variance (dispersion) but can we get access to the array of velocity values in every cell so that we can use self-defined functions on them and not just use the options available automatically in scipy.stats.binned_statistic_2d?

Any examples/references to supplement the answer are welcome.

Prallax · Accepted Answer · 2022-07-15 18:54:50Z

I think that what you are looking for is just the fourth returned argument of scipy.stats.binned_statistic_2d, called binnumber. It is an array that gives the 2d bin number for each of your stars, so that you know which stars fall in each bin and you can compute all the statistics that you need.

https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.binned_statistic_2d.html

Edit: as required in the comments, I will provide an example.

x=np.array([-10,-2,4,12,3,6,8,14,3])
y=np.array([5,5,-6,8,-20,10,2,2,8])
v=np.array([4,-6,-10,40,22,-14,20,8,-10])

x_bins = np.linspace(-20, 20, 3)
y_bins = np.linspace(-20, 20, 3)

pstat = stats.binned_statistic_2d(x,y,v,statistic='mean, bins=[x_bins,y_bins], expand_binnumbers=True)

v_binnumber = pstat.binnumber

Now v_binnumber[0][i] contains the x_bin coordinate of v[i] and v_binnumber[1][i] contains the y_bin coordinate. So, for each element of v, you know where it falls on the xy grid. And you can do statistics on the elements that fall in the same bin. For example, if I want to calculate the average of the velocities in the bin [5,6], I will do

avg=np.mean(v[(v_binnumber[0] == 5) & (v_binnumber[1] == 6)])

it was not clear from the example in the documentation. Could you please give me an example? e.g.if I code it like this then how to get v in each bin from pstat:` x=np.array([-10,-2,4,12,3,6,8,14,3]) y=np.array([5,5,-6,8,-20,10,2,2,8]) v=np.array([4,-6,-10,40,22,-14,20,8,-10]) x_bins = np.linspace(-20, 20, 3) y_bins = np.linspace(-20, 20, 3), pstat = stats.binned_statistic_2d(x,y,v,statistic='mean', bins=[x_bins,y_bins])` — Jerome, Commented Jul 15, 2022 at 16:13

Stack Exchange Network

The preferred way of binning (classifying) data for galaxy snapshots in Python

1 Answer 1

You must log in to answer this question.

Not the answer you're looking for? Browse other questions tagged
astrophysics
data-analysis
python
velocity
dispersion
.

Hot Network Questions

The preferred way of binning (classifying) data for galaxy snapshots in Python

1 Answer 1

You must log in to answer this question.

Not the answer you're looking for? Browse other questions tagged astrophysicsdata-analysispythonvelocitydispersion.

Related

Hot Network Questions

Not the answer you're looking for? Browse other questions tagged
astrophysics
data-analysis
python
velocity
dispersion
.