I have a dataset of N ≈ 12,800 observations with two variables: a continuous independent variable (distance, in m) and a binary dependent variable (yes/no) recorded for each distance. I need to test the correlation between the two. The point-biserial correlation seemed appropriate, but the data violate its underlying assumptions: the continuous variable is not normally distributed, it contains outliers that cannot be removed, and its variance is not equal across the two binary groups.
I have read conflicting arguments about whether normality can be assumed given the very large sample size. However, the normality tests I ran (Jarque-Bera, Kolmogorov-Smirnov) indicate that the data are not normally distributed.
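For reference, here is a minimal sketch of the kind of normality check I mean. It runs on synthetic right-skewed data, not my actual dataset; the lognormal parameters and the variable names are placeholders made up for illustration.

```python
import numpy as np
from scipy import stats

# Synthetic stand-in for the distances (m); NOT the real data.
rng = np.random.default_rng(42)
sample_distances = rng.lognormal(mean=5.0, sigma=1.0, size=12_800)

# Jarque-Bera: tests whether skewness and kurtosis match a normal distribution.
jb_stat, jb_p = stats.jarque_bera(sample_distances)

# Kolmogorov-Smirnov against a normal with the sample's own mean and SD.
# Caveat: estimating the parameters from the same sample makes this
# p-value only approximate.
ks_stat, ks_p = stats.kstest(
    sample_distances,
    "norm",
    args=(sample_distances.mean(), sample_distances.std(ddof=1)),
)

print(f"Jarque-Bera: statistic={jb_stat:.1f}, p={jb_p:.3g}")
print(f"Kolmogorov-Smirnov: statistic={ks_stat:.3f}, p={ks_p:.3g}")
```

With a sample this large, both tests have enough power to flag even small departures from normality, which is part of why I am unsure how much weight to give them.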
Given this, which test would be most appropriate for testing the correlation between the two variables? From what I have read, Spearman's ρ seems the most suitable, but I am not sure whether it is valid when one of the variables is binary.
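To make the question concrete, here is a minimal sketch of the candidates I am comparing, again on synthetic data and not my actual dataset. The lognormal distances, the logistic link between distance and outcome, and all variable names are assumptions for illustration only. Alongside point-biserial and Spearman I have included a Mann-Whitney U test with a rank-biserial effect size as another rank-based option; I am not claiming it is the right choice.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 12_800

# Hypothetical skewed distances (m) with a heavy right tail.
distance = rng.lognormal(mean=5.0, sigma=1.0, size=n)

# Hypothetical binary outcome: P(yes) decreases as distance grows.
p_yes = 1.0 / (1.0 + np.exp(np.log(distance) - 5.0))
outcome = (rng.random(n) < p_yes).astype(int)  # 1 = yes, 0 = no

# Candidate 1: point-biserial (Pearson's r with a 0/1 variable).
r_pb, p_pb = stats.pointbiserialr(outcome, distance)

# Candidate 2: Spearman's rho; the binary variable yields heavily tied ranks.
rho, p_rho = stats.spearmanr(outcome, distance)

# Candidate 3: Mann-Whitney U comparing distances between the two groups.
# Assumes SciPy >= 1.7, where the statistic is U for the first sample.
yes_dist = distance[outcome == 1]
no_dist = distance[outcome == 0]
u_stat, p_u = stats.mannwhitneyu(yes_dist, no_dist, alternative="two-sided")

# Rank-biserial correlation derived from U, bounded in [-1, 1].
rank_biserial = 2.0 * u_stat / (len(yes_dist) * len(no_dist)) - 1.0

print(f"point-biserial r = {r_pb:.3f} (p = {p_pb:.3g})")
print(f"Spearman rho     = {rho:.3f} (p = {p_rho:.3g})")
print(f"rank-biserial    = {rank_biserial:.3f} (Mann-Whitney p = {p_u:.3g})")
```

On my real data I would replace `distance` and `outcome` with the measured columns; what I would like to know is which of these statistics is defensible to report given the violations described above.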