Counterintuitive result when comparing two groups of time series

Question

I have two groups of time series and I am testing the hypothesis that the groups can be distinguished in some way. Each time series is measurements of an individual’s pupil size as they listen to an emotive story. At the end they answer one question concerning the story which they can get right or wrong. So I just want to see if there is something about changes in pupil size over time which distinguishes those who answer correctly ($n=75$) from those who don’t ($n=27$).

I thought that a straightforward, very general and sensitive way would be this: For each individual, work out the mean zero-lag cross-correlation with each individual in the correct group, and the mean zero-lag cross-correlation with each individual in the incorrect group. I think my hypothesis then predicts that individuals will have a higher mean correlation with members of their own group than with members of the other group. So I did that and this is what I got:

                   Mean correlation      Mean correlation
                   with correct group    with incorrect group    p (comparison)
Correct group      0.12                  0.18                    <0.001
Incorrect group    0.18                  0.20                    0.104
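
For concreteness, the computation described above could be sketched roughly as follows. This is an illustration rather than the script actually used: the function name `mean_group_correlations`, the array layout (one row per individual, all series of equal length) and the use of the Pearson coefficient as the normalised zero-lag cross-correlation are assumptions.

```python
import numpy as np

def mean_group_correlations(series, is_correct):
    """Mean zero-lag correlation of each individual with each group.

    series     : array of shape (n_subjects, n_samples), one row per individual
    is_correct : boolean array of shape (n_subjects,), True = answered correctly
    """
    series = np.asarray(series, dtype=float)
    is_correct = np.asarray(is_correct, dtype=bool)

    # Pearson correlation at lag zero for every pair of individuals.
    r = np.corrcoef(series)
    # An individual's correlation with itself is always 1, so exclude it.
    np.fill_diagonal(r, np.nan)

    with_correct = np.nanmean(r[:, is_correct], axis=1)
    with_incorrect = np.nanmean(r[:, ~is_correct], axis=1)
    return with_correct, with_incorrect
```

Each row of the table would then be the average of `with_correct` and `with_incorrect` over the members of one group.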

So I have the very strange result that individuals in the correct-answering group are more similar to the other group than they are to their own group. At first I thought this impossible and I checked my scripts very carefully. But I now believe this really is the result I got. I think I also have a way to interpret it: What specifically distinguishes the correct group is that their time series change more often and more unpredictably than those of the incorrect group. My questions are:

1. Do you think my interpretation could be right?
2. Is there something flawed in my method so that this is simply a methodological artefact?
3. Is there a better way to do this?

EDIT. In response to the very useful answer and comment, I add a little more information. The stories are recorded and the data is collected with a 60Hz eye-tracker so there are no problems there. The data looks like this (a little challenging to explore visually because there is a lot):

[Figure: plot of the data]

I plan to use methods like the suggested surrogate time series, and further summary statistics, to better understand the nature of the difference in my data. However I would appreciate comment on one further issue. I understand that the difference between Correct and Incorrect that I think I have demonstrated may not be meaningful from the perspective of certain hypotheses. I wonder however if you would agree with the following statement? The significant p-value from the test I performed indicates at least that we should reject the null hypothesis that the two sets of time series come from exactly the same population (i.e. some aspect of the pupil data is related to the subsequent answer).

The information in the table points to substantial variability in the correlation coefficients. That suggests it may be misleading, or even meaningless, to compare averages. But the problem is likely more fundamental than that: why should these cross-correlations tell you much about the subjects' responses? Slight differences in response times between two otherwise identically-responding subjects could create strong negative correlations among them. It appears you may need to explore your data (or at least a small subset) in order to develop a relevant and useful similarity metric. — whuber, Commented Oct 10, 2014 at 20:51
@whuber: comment appreciated, more information added in edit. — Amorphia, Commented Oct 13, 2014 at 15:18
At this point I see few differences, either qualitative or quantitative, between the two groups of data--but that isn't saying much because these plots reveal little. One way to start making sense of a mess like this is described at stats.stackexchange.com/a/46350/919. That kind of exploration can be followed up by robust regression to remove common trends and identify isolated outliers. Only at that point could one even hope that a more sensitive (but far less robust) method like cross-correlation would help reveal anything. — whuber, Commented Oct 13, 2014 at 15:58
I am no expert on eye dilation, but are those strong jumps really to be expected? They look like measurement artefacts to me, whose effects may very well exceed the impact of anything you are interested in. — Wrzlprmft, Commented Oct 13, 2014 at 19:48
The strong jumps are probably blinks. Really such data should be preprocessed. As this is an initial analysis I have not done this yet because for now I just want to test the hypothesis that there is any difference at all between groups, and such artefacts should be equal between groups. I take all your points but I am still struggling to understand how my improvised test could produce a clear group difference if there is none - even if it is e.g. just a difference in blink rate... — Amorphia, Commented Oct 14, 2014 at 7:02

Wrzlprmft · Accepted Answer · 2014-10-13 19:37:18Z

Regarding 1. and 2.

It can very well be that features of your time series other than those you want to measure lead to higher cross-correlations. To give an illustrative example (that does not exactly apply to your situation), the absolute values of the cross-correlation between time series with the same frequency content will on average be higher. In your example, it can very well be that your cross-correlation is mainly determined by the strong outliers, so that a few outliers decide its value and you overestimate the significance of your result.

A good way to find out (and eventually correct for this) is using surrogate time series, i.e., artificial control time series, which keep some features of the original data but destroy others. For example, if you generate IAAFT surrogates of your individual time series, you would keep their amplitude distribution as well as their autocorrelation function (and thus their frequency content). In particular you would keep the strong fluctuations for your “correct group”.

On the other hand, you would destroy any information tied to absolute time, in particular your story. Relatedly, you also destroy any actual temporal correlations to other time series. Therefore any cross-correlation you measure between those surrogate time series can only come from the individual features of those time series (amplitude and frequency content) but not from an actual correlated temporal evolution (because you destroyed that information in the surrogates).

So, if you observe comparable results when analysing surrogate time series instead of your actual time series, your observations are due to a methodological artifact.
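
A minimal sketch of how one IAAFT surrogate could be generated with NumPy is given below. It is an illustration under stated assumptions, not a reference implementation: the function name `iaaft_surrogate`, the fixed iteration count and the seed handling are choices made here, and established implementations typically add a convergence check.

```python
import numpy as np

def iaaft_surrogate(x, n_iter=100, seed=None):
    """Return one IAAFT surrogate of the 1-D series x.

    The surrogate has exactly the same values as x (same amplitude
    distribution) and approximately the same power spectrum, but its
    timing is randomised.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    sorted_x = np.sort(x)                   # target amplitude distribution
    target_amp = np.abs(np.fft.rfft(x))     # target Fourier amplitudes

    s = rng.permutation(x)                  # start from a random shuffle
    for _ in range(n_iter):
        # Step 1: impose the original Fourier amplitudes, keep current phases.
        phases = np.angle(np.fft.rfft(s))
        s = np.fft.irfft(target_amp * np.exp(1j * phases), n=x.size)
        # Step 2: impose the original values by rank ordering.
        ranks = np.argsort(np.argsort(s))
        s = sorted_x[ranks]
    return s
```

Repeating the whole group comparison on many sets of such surrogates (one surrogate per individual in each set) gives a reference distribution for the mean correlations that contains no shared temporal structure.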

Regarding 2. and 3.

The cross-correlation effectively measures whether two time series change synchronously. Using it thus relies on the assumption that they were taken under synchronous conditions with respect to your sampling frequency and the variability of your time series. For example, if the lowest relevant frequency content in your time series corresponds to a period of 10 s, you cannot allow for a comparable inaccuracy of the delay between the beginning of the recording and your story.
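
To illustrate this sensitivity with a toy case (not the pupil data): two otherwise identical sinusoids with a period of 10 s, sampled at 60 Hz, are correlated at zero lag while one of them is delayed. The signal shape and the helper name `corr_with_delay` are assumptions made for this sketch.

```python
import numpy as np

FS = 60.0        # sampling rate in Hz, as in the question
PERIOD = 10.0    # period of the toy oscillation in seconds
t = np.arange(3600) / FS   # 60 s of samples, i.e. six whole periods

def corr_with_delay(delay_s):
    """Zero-lag correlation between a sinusoid and a delayed copy of it."""
    a = np.sin(2 * np.pi * t / PERIOD)
    b = np.sin(2 * np.pi * (t - delay_s) / PERIOD)
    return np.corrcoef(a, b)[0, 1]

for d in (0.0, 1.0, 2.5, 5.0):
    print(f"delay {d:4.1f} s -> r = {corr_with_delay(d):+.2f}")
```

For this idealised signal the correlation follows cos(2π · delay / period): it drops to roughly 0.81 for a 1 s delay, vanishes at 2.5 s and reaches -1 at 5 s, even though both series are the same response.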

Moreover, this assumes that everybody reacts to the story in a very similar way. For example, you would assume that the pupils of all subjects (or at least all subjects in one group) dilate during a specific part of the story. I do not think that humans behave that predictably. From another point of view: You probably know a lot of things unrelated to your experiment that affect the pupil size on the relevant time scales. All these effects already make univariate time-series analysis very difficult. Bivariate measures (such as the cross-correlation) may be rendered fully meaningless. In general, I would be very careful with applying bivariate measures to time series that were not recorded simultaneously.

Looking at your time series confirms this: The time series are highly individual, in particular the outliers and jumps. The result of the cross-correlation is therefore very likely to be meaningless.

Instead I recommend taking a look at simple univariate time-series measures, such as the mean, the standard deviation, the frequency content, etc. and statistically comparing them between groups (be aware of multiple testing and cherry-picking though). This could in particular capture the observation you already made that “time series change more often and more unpredictably” in one group.
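
Such a comparison could be sketched as follows. Everything specific here is an assumption for illustration: the helper names, the choice of a label-permutation test on the group means of each summary, and the Bonferroni correction as one simple way to account for multiple testing.

```python
import numpy as np

def permutation_pvalue(a, b, n_perm=10_000, seed=0):
    """Two-sided permutation test for a difference in means between a and b."""
    rng = np.random.default_rng(seed)
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    pooled = np.concatenate([a, b])
    observed = abs(a.mean() - b.mean())
    hits = 0
    for _ in range(n_perm):
        perm = rng.permutation(pooled)      # shuffle the group labels
        diff = abs(perm[:a.size].mean() - perm[a.size:].mean())
        if diff >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

def compare_summaries(correct, incorrect, summaries, n_perm=10_000):
    """Compare per-individual summary statistics between the two groups.

    correct, incorrect : lists of 1-D arrays, one time series per individual
    summaries          : dict mapping a name to a function series -> number
    Returns Bonferroni-corrected p-values, one per summary.
    """
    raw = {
        name: permutation_pvalue([f(x) for x in correct],
                                 [f(x) for x in incorrect], n_perm=n_perm)
        for name, f in summaries.items()
    }
    m = len(raw)
    return {name: min(1.0, p * m) for name, p in raw.items()}

# Example summaries; the last one is a crude proxy for "changes more often".
SUMMARIES = {
    "mean": np.mean,
    "std": np.std,
    "mean_abs_step": lambda x: np.mean(np.abs(np.diff(x))),
}
```

Because each individual contributes exactly one number per summary, the observations entering each test are independent, which is not the case for averages over pairwise correlations.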

comment appreciated, more information added in edit, one more response would be appreciated. — Amorphia, Commented Oct 13, 2014 at 15:19
@user25676: My assessment stays more or less the same: I strongly recommend to take a look at simple univariate measures first. — Wrzlprmft, Commented Oct 13, 2014 at 19:49
