$\begingroup$

Background: I am just learning how to work with species distribution models (SDMs), which are essentially regressions of species occurrence coordinates on environmental values. The result is a grid map showing the probability of finding the species in each cell. I made two projections for 2050 under two different climate scenarios (C1, C2) for a species X in area A. Looking at the maps, I realized they did not differ much, even though the climate scenarios do. So I wanted to check whether the difference between them is statistically significant.

Deciding what test to use: For the moment, I would like to keep everything as simple as possible (my statistical ability is still very basic), although I realize I might be oversimplifying. My research question is:

Is there an effect of two different climate scenarios (C1, C2) on the habitat suitability probabilities for species X in area A?

I had read that t-tests are robust to non-normal data in large samples, since the test depends on the sampling distribution of the mean rather than on the distribution of the data themselves. Since I have a large data set and the two samples have similar distributions, I thought all I needed was an independent t-test. I assumed independence because I don't want to compare a before and after, or relate the two scenarios in any way; I just want to compare the two simulations. But the more I thought about it, the more it seemed to me that these had to be paired data. I know the paired t-test is more sensitive, so I decided to read up, and I ended up completely confused and feeling like I don't really understand anything. My data are not normally distributed: kurtosis > 10 and skewness > 2.8.

C1, n=2299; SD=0.104
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
0.00000 0.00000 0.00000 0.05723 0.04762 0.61905 
C2, n=2299; SD=0.097
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
0.00000 0.00000 0.00000 0.05483 0.04762 0.66667 

The results of my tests:

Paired:

Wilcoxon signed rank test with continuity correction
V = 122780, p-value = 0.01777

Paired t-test
t = -2.9716, df = 2298, p-value = 0.002993
95 percent confidence interval:
 -0.003988266 -0.000817136
mean of the differences 
           -0.002402701 

Not paired:

Wilcoxon rank sum test with continuity correction
W = 2625875, p-value = 0.6833

Welch Two Sample t-test
t = -0.80826, df = 4574.4, p-value = 0.419
95 percent confidence interval:
 -0.008230619  0.003425217 
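To see why the paired and unpaired t-tests disagree so strongly, the reported numbers can be plugged back into the standard error formulas. This is only a rough back-of-the-envelope sketch: it uses the rounded SDs from the summaries, so the Welch statistic is reproduced approximately, and the "implied correlation" is derived from the identity var(C2 - C1) = var(C1) + var(C2) - 2 r sd(C1) sd(C2) rather than computed from the raw maps.

```python
import math

# Values copied from the summaries and test output above.
n = 2299
sd_c1, sd_c2 = 0.104, 0.097      # SDs as reported (rounded)
mean_diff = -0.002402701         # mean of the paired differences
t_paired = -2.9716               # reported paired t statistic

# Paired test: the standard error comes from the SD of the cell-wise differences.
se_paired = mean_diff / t_paired
sd_diff = se_paired * math.sqrt(n)

# Welch test: the standard error ignores the pairing and adds both variances.
se_welch = math.sqrt(sd_c1**2 / n + sd_c2**2 / n)
t_welch = mean_diff / se_welch

# Correlation between the two maps implied by the variance identity above.
r_implied = (sd_c1**2 + sd_c2**2 - sd_diff**2) / (2 * sd_c1 * sd_c2)

print(f"SE paired = {se_paired:.6f}, SE Welch = {se_welch:.6f}")
print(f"Welch t recomputed from rounded SDs = {t_welch:.3f}")
print(f"Implied correlation between C1 and C2 = {r_implied:.2f}")
```

With these inputs the implied correlation comes out at roughly 0.93, so the two maps are strongly correlated cell by cell. That is why the paired standard error is several times smaller than the unpaired one, even though the mean difference is identical in both tests.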

A book on conservation recommends the non-parametric test. An article linked from one thread makes a case for the t-test. A reply about highly skewed data says, as I understand it, that the t-test has no power to detect differences in such cases. Another thread recommends permutation or bootstrap tests when zeroes make up most of the data. Yet another article suggests just using descriptive statistics, especially when the variables are probabilities.
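For reference, the permutation idea mentioned above could look like the following for paired data. This is a minimal sketch, not a recommendation: the function name `sign_flip_permutation_test` and the toy data are hypothetical, and the test treats grid cells as exchangeable and independent, which spatially autocorrelated maps may not satisfy.

```python
import random

def sign_flip_permutation_test(x, y, n_perm=2000, seed=0):
    """Two-sided paired permutation test on the mean difference.

    Under the null hypothesis the labels within each pair are
    interchangeable, so each difference is equally likely to have
    either sign. Assumes pairs are independent of one another.
    """
    rng = random.Random(seed)
    diffs = [b - a for a, b in zip(x, y)]
    n = len(diffs)
    observed = sum(diffs) / n
    extreme = 0
    for _ in range(n_perm):
        # Randomly flip the sign of each paired difference.
        flipped_mean = sum(d if rng.random() < 0.5 else -d for d in diffs) / n
        if abs(flipped_mean) >= abs(observed):
            extreme += 1
    # Add-one correction keeps the p-value strictly above zero.
    p_value = (extreme + 1) / (n_perm + 1)
    return observed, p_value

# Toy zero-inflated data (hypothetical, not the data from this question).
rng = random.Random(42)
c1 = [0.0 if rng.random() < 0.6 else rng.random() * 0.5 for _ in range(200)]
c2 = [max(0.0, v + rng.gauss(0.0, 0.02)) for v in c1]

obs, p = sign_flip_permutation_test(c1, c2)
print(f"mean difference = {obs:.5f}, permutation p-value = {p:.4f}")
```

Pairs where both values are zero contribute a difference of zero under every sign flip, so they do not influence the comparison between the observed and permuted means; the test is driven by the cells where the two maps actually differ.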

The few papers I checked dive much deeper and usually aim at finding the best model, or use regressions with more variables.

My question is: given that over 50% of my data are zeroes, and that I think I am conceptually comparing the same subjects (the same map grid) under two different climate simulations (not related to each other except through initial conditions), can I just go ahead and use a paired t-test, or is there a better way to do this?

Thanks!!!

$\endgroup$
  • $\begingroup$ Perhaps a bigger problem is the spatial correlation. That problem affects all the tests. I would lean towards your sources that recommend more explicit modeling, not only of the distributions but also of the dependence structure. Within such a model-based approach, you can answer the question of whether the observed differences are explainable by chance alone. $\endgroup$ Commented Nov 1, 2021 at 1:47
  • $\begingroup$ @BigBendRegion, thanks! I agree that a t-test (or similar) isn't ideal and won't capture the true differences between my models. For example, I could end up with the same probabilities in different locations (which is important), and a t-test would never tell me this. However, to start out, I didn't want to complicate things, because for the moment I wanted to focus on understanding the variables. So I thought the simplest insight into a difference could come from a t-test or similar. Once I understand the models better, I will probably understand better how to compare them. $\endgroup$
    – Z U
    Commented Nov 1, 2021 at 19:49
  • $\begingroup$ Even so, the test is not valid because the data values are not independent. The common nonparametric methods are not valid either, for the same reason. $\endgroup$ Commented Nov 1, 2021 at 20:44
