$\begingroup$

Suppose I have two time series of daily observations from a given experiment. One of them has a very long tail (on either side): in absolute terms, the gap between the lowest and second-lowest observations is much larger than the gap between any other pair of adjacent sorted observations (second lowest vs. third lowest, and so on). For instance, sorted by size, the first series might look like [-100, -50, -40, -30, -20, ...], while the second might look like [-60, -50, -40, -30, -20, ...]. Now suppose I need to work with the ranks instead of the values themselves, for example because I'm using copulas. In terms of ranks the two series look the same, so when I simulate from a copula and map the simulated ranks back through the inverse CDF, the lowest and second-lowest simulated values are unlikely to show the same large gap as in the first series, unless I manage to find a marginal distribution that accommodates this behaviour.
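To make the point about ranks concrete, here is a minimal sketch using only the five example values quoted above (the truncated "..." part is left out, and the helper name `pseudo_observations` is just an illustrative choice). It shows that the rank transform maps both series to the same pseudo-observations, so the large gap survives only in the marginal distribution, not in what the copula sees.

```python
import numpy as np
from scipy.stats import rankdata

def pseudo_observations(x):
    """Map a sample to (0, 1) via scaled ranks, as is usual before fitting a copula."""
    x = np.asarray(x, dtype=float)
    return rankdata(x) / (len(x) + 1)

# Only the five values quoted in the text; the rest of each series is omitted.
a = np.array([-100.0, -50.0, -40.0, -30.0, -20.0])
b = np.array([-60.0, -50.0, -40.0, -30.0, -20.0])

u_a = pseudo_observations(a)
u_b = pseudo_observations(b)

# The ranks carry no information about the size of the gaps.
print(np.allclose(u_a, u_b))

# The gap between the two lowest values differs between the series.
gap_a = np.diff(np.sort(a))[0]
gap_b = np.diff(np.sort(b))[0]
print(gap_a, gap_b)
```

Any difference between the two series therefore has to be reintroduced by the inverse CDF of the chosen marginal.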

The question is: what is best practice for handling situations like this, where there are only a few (sometimes just one) very large observations and the rest of the distribution looks well behaved?

I tried fitting Pareto tails to the most extreme 1% or 5% of observations, but I'm not able to replicate the behaviour seen in the original data. With this approach, I might end up with both the lowest and second-lowest simulated values lying deep in the tail, which is not an accurate representation of the data. Any suggestions are much appreciated.
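For reference, the kind of tail fit described above can be sketched roughly as follows. This is an illustration under assumptions, not the exact code used: the data are synthetic (Student-t draws standing in for the real series), the tail is modelled with `scipy.stats.genpareto` on exceedances below an empirical threshold, and the function names `fit_lower_tail` and `semiparametric_quantile` are hypothetical.

```python
import numpy as np
from scipy.stats import genpareto

def fit_lower_tail(x, tail_frac=0.05):
    """Fit a generalized Pareto distribution to exceedances below the tail_frac quantile."""
    x = np.asarray(x, dtype=float)
    threshold = np.quantile(x, tail_frac)
    # Positive exceedances: how far each tail point lies below the threshold.
    excess = threshold - x[x < threshold]
    shape, _, scale = genpareto.fit(excess, floc=0.0)
    return threshold, shape, scale

def semiparametric_quantile(u, x, threshold, shape, scale, tail_frac=0.05):
    """Inverse CDF: empirical quantiles in the body, GPD in the lower tail."""
    u = np.atleast_1d(np.asarray(u, dtype=float))
    x = np.asarray(x, dtype=float)
    # Body: plain empirical quantiles (tail entries are overwritten below).
    out = np.quantile(x, np.clip(u, tail_frac, 1.0))
    in_tail = u < tail_frac
    # P(X < q) = tail_frac * (1 - G(threshold - q)), solved for q.
    p = 1.0 - u[in_tail] / tail_frac
    out[in_tail] = threshold - genpareto.ppf(p, shape, loc=0.0, scale=scale)
    return out

# Synthetic stand-in data (assumption): heavy-tailed Student-t draws.
rng = np.random.default_rng(0)
x = rng.standard_t(df=3, size=2000)

threshold, shape, scale = fit_lower_tail(x, tail_frac=0.05)
u = np.array([0.001, 0.01, 0.04])
q = semiparametric_quantile(u, x, threshold, shape, scale, tail_frac=0.05)
print(threshold, shape, scale)
print(q)
```

Because the fitted tail is a smooth distribution, nothing in it forces a single isolated extreme followed by a large gap, which is the behaviour that is hard to reproduce.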

$\endgroup$
