Background

I am just learning how to work with Species Distribution Models, which are basically regressions relating environmental values to the occurrence coordinates of a species. The result is a grid map that shows the probability of finding the species in each cell. I made two projections for 2050 under two different climate scenarios (C1, C2) for a species X in area A. Looking at the maps, I realized they were not very different, even though the climate scenarios are. So I wanted to check whether the difference between them is statistically significant.
Deciding what test to use

For the moment, I would like to keep everything as simple as possible (because my statistical ability is still very basic), although I realize I might be oversimplifying. Anyway, my research question is:
Is there an effect of two different climate scenarios (C1, C2) on the habitat suitability probabilities for species X in area A?
I had read that t-tests are robust to non-normal data because, with large samples, they rely on the sampling distribution of the mean rather than on the distribution of the data themselves. Since I have a large data set and the two samples have similar distributions, I thought all I needed was an independent t-test. I assumed independence because I don't want to compare a before and after, or relate the two scenarios in any way, just compare the two different simulations. But the more I thought about it, the more it seemed to me that these had to be paired data. I know the paired t-test is more sensitive, so I decided to read up on it, and I ended up completely confused and feeling like I don't really understand anything. My data are not normally distributed: kurtosis > 10 and skew > 2.8.
C1: n = 2299; SD = 0.104
   Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
0.00000  0.00000  0.00000  0.05723  0.04762  0.61905

C2: n = 2299; SD = 0.097
   Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
0.00000  0.00000  0.00000  0.05483  0.04762  0.66667
The results of my tests:

Paired:
Wilcoxon signed rank test with continuity correction
V = 122780, p-value = 0.01777
Paired t-test
t = -2.9716, df = 2298, p-value = 0.002993
95 percent confidence interval:
-0.003988266 -0.000817136
mean of the differences
-0.002402701
Not paired:
Wilcoxon rank sum test with continuity correction
W = 2625875, p-value = 0.6833
Welch Two Sample t-test
t = -0.80826, df = 4574.4, p-value = 0.419
95 percent confidence interval:
-0.008230619 0.003425217
A book on conservation recommends the non-parametric test. An article linked from one thread makes a case for the t-test. A reply to a question about highly skewed data says, as I understand it, that the t-test has no power to detect differences in such cases. Another thread recommends permutation or bootstrap tests when zeroes make up most of the data. Yet another article suggests just using descriptive statistics, especially for variables like probabilities.
A few papers I checked dive much deeper and usually aim at finding the best model, or use regressions with more variables.
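To make the permutation suggestion concrete, this is roughly how I understand a paired permutation (sign-flip) test would work. It is only a minimal sketch in Python using the standard library: the function name `paired_sign_flip_test`, the number of permutations and the simulated zero-heavy values are assumptions made up for illustration, not my actual data or analysis. It also treats grid cells as exchangeable pairs and ignores any spatial autocorrelation between neighbouring cells.

```python
import random


def paired_sign_flip_test(a, b, n_perm=2000, seed=0):
    """Two-sided paired permutation (sign-flip) test on the mean difference b - a.

    Under the null hypothesis the sign of each within-cell difference is
    arbitrary, so the signs are flipped at random and the mean is recomputed.
    """
    if len(a) != len(b):
        raise ValueError("a and b must have one value per grid cell each")
    rng = random.Random(seed)
    diffs = [y - x for x, y in zip(a, b)]
    n = len(diffs)
    observed = sum(diffs) / n
    # Zero differences are unchanged by a sign flip, so skip them in the loop.
    nonzero = [d for d in diffs if d != 0.0]
    extreme = 0
    for _ in range(n_perm):
        total = sum(d if rng.random() < 0.5 else -d for d in nonzero)
        # Small tolerance guards against floating-point ties.
        if abs(total / n) >= abs(observed) - 1e-12:
            extreme += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    p_value = (extreme + 1) / (n_perm + 1)
    return observed, p_value


# Hypothetical zero-inflated data standing in for the two suitability maps.
sim = random.Random(42)
c1 = [0.0 if sim.random() < 0.6 else sim.random() * 0.6 for _ in range(500)]
c2 = [max(0.0, v + sim.gauss(-0.002, 0.03)) if v > 0 else 0.0 for v in c1]

mean_diff, p_value = paired_sign_flip_test(c1, c2)
print(f"mean difference = {mean_diff:.5f}, permutation p-value = {p_value:.4f}")
```

As far as I can tell, this makes no normality assumption, and the many cells that are zero in both scenarios simply contribute a difference of zero.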
My question is: given that over 50% of my data are zeroes, and that I think I am conceptually comparing the same subjects (the same map grid cells) under two different climate simulations (not related to each other except through their initial conditions), can I just go ahead and use a paired t-test, or is there a better way to do this?
Thanks!!!