I have two groups of time series and I am testing the hypothesis that the groups can be distinguished in some way. Each time series consists of measurements of an individual’s pupil size as they listen to an emotive story. At the end, they answer one question about the story, which they can get right or wrong. So I just want to see whether there is something about changes in pupil size over time that distinguishes those who answer correctly ($n=75$) from those who don’t ($n=27$).
I thought that a straightforward, very general and sensitive way would be this: for each individual, work out the mean zero-lag cross-correlation with each individual in the correct group, and the mean zero-lag cross-correlation with each individual in the incorrect group. I think my hypothesis then predicts that individuals will have a higher mean correlation with members of their own group than with members of the other group. So I did that and this is what I got:
| | Mean correlation with correct group | Mean correlation with incorrect group | p (comparison) |
|---|---|---|---|
| Correct group | 0.12 | 0.18 | <0.001 |
| Incorrect group | 0.18 | 0.20 | 0.104 |
So I have the very strange result that individuals in the correct-answering group are more similar to members of the other group than they are to members of their own group. At first I thought this was impossible and I checked my scripts very carefully. But I now believe this really is the result I got. I think I also have a way to interpret it: what specifically distinguishes the correct group is that their time series change more often and more unpredictably than those of the incorrect group. My questions are:
- Do you think my interpretation could be right?
- Is there something flawed in my method so that this is simply a methodological artefact?
- Is there a better way to do this?
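To make the procedure described above concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not my actual script: I assume all series have the same length and are stored as the rows of an array, I take "zero-lag cross-correlation" to mean the Pearson correlation between two series, and I exclude each individual's correlation with themselves from their own-group mean. The names `mean_group_correlations`, `series` and `is_correct` are made up for this example.

```python
import numpy as np

def mean_group_correlations(series, is_correct):
    """Per-individual mean zero-lag correlation with each group.

    series     : array of shape (n_individuals, n_samples), one row per person
    is_correct : boolean array of shape (n_individuals,), True = answered correctly

    Returns (with_correct, with_incorrect), each of shape (n_individuals,).
    """
    series = np.asarray(series, dtype=float)
    is_correct = np.asarray(is_correct, dtype=bool)

    # Pearson correlation between every pair of rows (zero lag).
    r = np.corrcoef(series)

    # Assumption: an individual's correlation with themselves (always 1)
    # must not count towards their own-group mean.
    np.fill_diagonal(r, np.nan)

    with_correct = np.nanmean(r[:, is_correct], axis=1)
    with_incorrect = np.nanmean(r[:, ~is_correct], axis=1)
    return with_correct, with_incorrect
```

The rows of the table would then correspond to averaging `with_correct` and `with_incorrect` separately over the individuals in each group. If the self-correlation were left in, own-group means would be inflated, more so for the smaller group.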
EDIT. In response to the very useful answer and comment, I add a little more information. The stories are recorded and the data is collected with a 60Hz eye-tracker so there are no problems there. The data looks like this (a little challenging to explore visually because there is a lot):
I plan to use methods like the suggested surrogate time series, and further summary statistics, to better understand the nature of the difference in my data. However, I would appreciate comments on one further issue. I understand that the difference between Correct and Incorrect that I think I have demonstrated may not be meaningful from the perspective of certain hypotheses. I wonder, however, whether you would agree with the following statement: the significant p-value from the test I performed indicates at least that we should reject the null hypothesis that the two sets of time series come from exactly the same population (i.e. some aspect of the pupil data is related to the subsequent answer).
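One way I could probe that null hypothesis directly is a label-permutation test, sketched below. This is a different technique from the comparison reported in the table, and the statistic is my own choice for illustration: the mean over individuals of (mean correlation with own group) minus (mean correlation with other group). If the correct/incorrect labels carry no information about the series, shuffling them across individuals should give values of the statistic comparable to the observed one. The function names and the defaults `n_perm` and `seed` are hypothetical.

```python
import numpy as np

def within_minus_between(r, labels):
    """Mean over individuals of own-group minus other-group mean correlation.

    r      : correlation matrix with NaN on the diagonal
    labels : boolean group labels, one per individual
    """
    same = labels[:, None] == labels[None, :]
    own = np.nanmean(np.where(same, r, np.nan), axis=1)
    other = np.nanmean(np.where(~same, r, np.nan), axis=1)
    return np.mean(own - other)

def label_permutation_test(series, is_correct, n_perm=2000, seed=0):
    """Two-sided permutation p-value for the statistic above."""
    series = np.asarray(series, dtype=float)
    is_correct = np.asarray(is_correct, dtype=bool)

    r = np.corrcoef(series)          # zero-lag Pearson correlations
    np.fill_diagonal(r, np.nan)      # ignore self-correlations

    observed = within_minus_between(r, is_correct)

    rng = np.random.default_rng(seed)
    null = np.array([
        within_minus_between(r, rng.permutation(is_correct))
        for _ in range(n_perm)
    ])

    # The +1 terms count the observed labelling as one of the permutations,
    # so the p-value can never be exactly zero.
    p = (1 + np.sum(np.abs(null) >= abs(observed))) / (n_perm + 1)
    return observed, p
```

Because whole individuals are shuffled rather than single samples, the autocorrelation within each series is left intact. A small p-value here would only say that this particular statistic differs from what relabelling produces; it would not by itself tell me what aspect of the pupil data differs between the groups.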