
OK, I know this is a long question, and it's fine if you don't know the answer; I thought I'd give it a try anyway. I'm trying to replicate the methodology used in the paper below. It involves taking a table of stars with various parameters and applying cuts until you're left with a sub-sample of (bulge) stars. The problem I'm having is that the authors end up with a different number of stars than I do after applying the cuts: I get 21389 stars while they get 21052. As far as I can tell I've used exactly the same methodology, but it's hard to be sure just from the paper. Could anybody spot anything obviously wrong in my Python code? I'm pretty much a complete newbie at Python. The paper is linked below; the relevant sections are from 2 (Data and sample) through 2.2.

https://arxiv.org/pdf/2204.01753

import astropy
from astropy.table import Table
from astropy.io import fits
import numpy as np
import matplotlib.pyplot as plt

# Define the HDUList object
hdulist = fits.open('DR17_McMillan_astroNN.fits')

# Access the primary data table
data_table = hdulist[1].data

# Close the FITS file to release memory
hdulist.close()

# Filter for stars with ASPCAPFLAG = 0
filtered_data = data_table[data_table['ASPCAPFLAG'] == 0]

# Filter for stars between 3000K and 6000K
teff_filtered_data = filtered_data[(filtered_data['TEFF'] >= 3000) & (filtered_data['TEFF'] <= 6000)] 

# Define the conditions for filtering
logg_condition = teff_filtered_data['LOGG'] < 3.6
snr_condition = teff_filtered_data['SNR'] > 70

# Apply the conditions to filter the data
logg_snr_filtered_data = teff_filtered_data[logg_condition & snr_condition]


# The parent sample is thus defined. Now the bulge sample must be created.

# Calculate Galactocentric X, Y, Z, and Rgc
xsun = 8.e3  # assumed solar Galactocentric distance, in pc
weighted_dist = logg_snr_filtered_data['weighted_dist']  # astroNN weighted distance (pc)
GLON = logg_snr_filtered_data['GLON']  # Galactic longitude (deg)
GLAT = logg_snr_filtered_data['GLAT']  # Galactic latitude (deg)

axx = (xsun - weighted_dist * np.cos(GLON * np.pi / 180.) * np.cos(GLAT * np.pi / 180.)) / 1.e3

ayy = weighted_dist * np.sin(GLON * np.pi / 180.) * np.cos(GLAT * np.pi / 180.) / 1.e3

azz = weighted_dist * np.sin(GLAT * np.pi / 180.) / 1.e3

aRgc = np.sqrt(axx**2. + ayy**2. + azz**2.)
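
# --- Optional cross-check (not part of the paper's recipe): the same
# Galactocentric radius can be computed with astropy.coordinates,
# assuming the same solar position (R0 = 8 kpc, z_sun = 0) and that
# weighted_dist is in pc. Handy for catching sign/unit slips above.
from astropy.coordinates import SkyCoord, Galactocentric
import astropy.units as u

coords = SkyCoord(l=GLON * u.deg, b=GLAT * u.deg,
                  distance=weighted_dist * u.pc, frame='galactic')
gc = coords.transform_to(Galactocentric(galcen_distance=8. * u.kpc,
                                        z_sun=0. * u.pc))
rgc_check = np.sqrt(gc.x**2 + gc.y**2 + gc.z**2).to(u.kpc).value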

# Define the bulge selection: Rgc < 4 kpc, as in the paper
rgc_condition = aRgc < 4

# Apply the condition to filter the data based on Rgc
masked_data = logg_snr_filtered_data[rgc_condition]

# Calculate fractional distance uncertainty
fractional_dist_uncertainty = masked_data['weighted_dist_error'] / masked_data['weighted_dist']

# Define condition for fractional distance uncertainty
fractional_dist_uncertainty_condition = fractional_dist_uncertainty < 0.2

# Apply the condition to filter the data
final_filtered_data = masked_data[fractional_dist_uncertainty_condition]

# Define columns related to compositions
composition_columns = ['MG_FE', 'C_FE', 'N_FE', 'O_FE', 'SI_FE']  # Columns for Mg, C, N, O, and Si compositions

# Define conditions for masking out stars with undetermined compositions
undetermined_composition_condition = (
    ~np.isnan(masked_data['MG_FE']) &
    ~np.isnan(masked_data['C_FE']) &
    ~np.isnan(masked_data['N_FE']) &
    ~np.isnan(masked_data['O_FE']) &
    ~np.isnan(masked_data['SI_FE'])
)

# Combine the conditions using logical AND
final_mask = undetermined_composition_condition

# Apply the final mask to the filtered data
final_filtered_data = masked_data[final_mask]

# Load the VAC catalogue data
vac_catalogue_data = fits.open('VAC_GC_DR17_synspec_rev1-v1_1.fits.fits')

# Access the data table from the HDUList object
vac_data_table = vac_catalogue_data[1].data

# Extract the column containing IDs from the VAC catalogue
vac_ids = vac_data_table['APOGEE_ID']

# Check which IDs from the filtered data are present in the VAC catalogue
mask_vac = np.isin(final_filtered_data['APOGEE_ID'], vac_ids)

# Remove stars from the filtered data that are present in the VAC catalogue
final_filtered_data_cleaned = final_filtered_data[~mask_vac]

# Count the number of stars in the final filtered data
num_stars = len(final_filtered_data_cleaned)
print("Number of stars:", num_stars)
  • Please check your formatting. – James K, May 18 at 21:05
  • Have you contacted the authors themselves with your question? – May 21 at 21:21
  • No, I haven't. I didn't want to, but now that you mention it, maybe I should give that a go. – Rjs2312, May 21 at 22:55

2 Answers


When you apply the fractional distance uncertainty condition, you create a new array called final_filtered_data. But then in the next lines, you go back to using your array called masked_data, so you have lost the effects of the distance uncertainty filtering. This could account for the difference you’re seeing.
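
A minimal sketch of the fix, reusing the question's own variable names (everything else unchanged): chain the composition cut onto final_filtered_data rather than masked_data.

# Apply the composition cut to the distance-filtered sample,
# not to masked_data, so the uncertainty cut is preserved:
composition_ok = (
    ~np.isnan(final_filtered_data['MG_FE']) &
    ~np.isnan(final_filtered_data['C_FE']) &
    ~np.isnan(final_filtered_data['N_FE']) &
    ~np.isnan(final_filtered_data['O_FE']) &
    ~np.isnan(final_filtered_data['SI_FE'])
)
final_filtered_data = final_filtered_data[composition_ok]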


(If you haven't found the flaw yet) I notice you're using the bit-and operator "&" in several places where the logical-and operator "and" surely is more appropriate. That can lead to unexpected results.
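
For reference, here is how the two operators actually behave on NumPy boolean arrays (a standalone illustration, not using the question's data):

import numpy as np

a = np.array([True, True, False])
b = np.array([True, False, False])

print(a & b)   # elementwise AND -> [ True False False]

try:
    a and b    # plain `and` asks the array for a single truth value...
except ValueError as e:
    print(e)   # ...which raises "truth value of an array ... is ambiguous"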

