12
$\begingroup$

I have a time series of data that is 300 days long. I compute PCA factor loadings on a moving window of 30 days. There are 7 stocks in the universe. Thus factors F1 through F7 are calculated on each PCA calculation.

However, the signs on factor loadings change. This causes problems when interpreting factor price time series.

What are the different approaches to deal with this problem?

$\endgroup$
4
  • $\begingroup$ For all components? When you say loadings, do you mean the eigenvectors? Could you post some code? $\endgroup$
    – SpeedBoots
    Commented Mar 19, 2012 at 11:48
  • $\begingroup$ There's a related question that might help: link $\endgroup$
    – michaelv2
    Commented Mar 19, 2012 at 15:04
  • $\begingroup$ @michaelv2 I'm not sure what that link has to do with the question here. $\endgroup$ Commented Mar 19, 2012 at 15:10
  • $\begingroup$ My intent was to link to the original question, where some comments address the fact that spectral decompositions can produce eigenvectors with arbitrary signs (and some of the ways of dealing with that). $\endgroup$
    – michaelv2
    Commented Mar 20, 2012 at 13:13

5 Answers

11
$\begingroup$

1) An eigenvector times minus one is also an eigenvector (with the same eigenvalue). 2) Eigenvectors of a symmetric matrix (e.g. a covariance matrix) that belong to distinct eigenvalues are orthogonal. Together, 1 and 2 imply that you can multiply any subset of the eigenvectors of a symmetric matrix by minus one and you still get a full set of eigenvectors.
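For reference, both facts take one line each. The notation is an assumption made here, not taken from the answer: $A$ is the symmetric matrix, and $(\lambda, v)$ and $(\mu, u)$ are eigenpairs with $\lambda \neq \mu$.

$$A(-v) = -(Av) = -(\lambda v) = \lambda(-v)$$

$$\mu\, u^\top v = (Au)^\top v = u^\top A v = \lambda\, u^\top v \quad\Longrightarrow\quad (\mu - \lambda)\, u^\top v = 0 \quad\Longrightarrow\quad u^\top v = 0$$

Flipping a sign does not disturb orthogonality either, since $(-u)^\top v = -\,u^\top v = 0$.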

This means you can simply impose that the first element of every eigenvector is positive. If the PCA returns an eigenvector whose first element is negative, multiply the whole vector by minus one. That will solve your problem.
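A minimal sketch of this convention in plain Python, assuming each eigenvector arrives as a list of floats. The function name `orient_by_first_element` and the `eps` tolerance are illustrative and do not come from any library. The tolerance only flags the case where the first element is so close to zero that its sign cannot be trusted.

```python
def orient_by_first_element(vec, eps=1e-8):
    """Return (oriented_vec, reliable).

    oriented_vec is vec or -vec, chosen so that its first element is >= 0.
    reliable is False when |vec[0]| <= eps, i.e. when the sign of the first
    element is too fragile to pin down the orientation.
    """
    first = vec[0]
    reliable = abs(first) > eps
    if first < 0:
        return [-x for x in vec], reliable
    return list(vec), reliable


# Both orientations of the same eigenvector map to the same representative.
v = [-0.5, 0.2, 0.4]
flipped = [-x for x in v]
print(orient_by_first_element(v))        # ([0.5, -0.2, -0.4], True)
print(orient_by_first_element(flipped))  # ([0.5, -0.2, -0.4], True)
```

When `reliable` comes back `False`, the rule has no stable answer for that window and some other reference is needed.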

$\endgroup$
1
  • $\begingroup$ Even then, Mepuzza, you could still have a problem. You have to be sure that the first element is clearly greater than zero in absolute value; otherwise slight changes in the historical matrix could perturb this element and make it change sign. $\endgroup$
    – Anass
    Commented May 18, 2012 at 9:12
8
$\begingroup$

You can compute the PCA on overlapping windows, and try to match the eigenvectors: you may need to change not only their sign (since only the eigenspaces are well-defined, the sign of the eigenvectors is arbitrary) but also their order.

Here is some (untested) R code to do this.

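# Sample data: two overlapping windows of n rows taken from n+1 observations.
# Keep drawing until the first element of the leading eigenvector
# has opposite signs in the two windows.
k <- 7
n <- 50
found <- FALSE
while(!found) {
  x <- matrix(rnorm(k*(n+1)), ncol=k)
  e1 <- eigen(var(x[-1,]))    # rows 2..(n+1)
  e2 <- eigen(var(x[1:n,]))   # rows 1..n
  found <- e1$vectors[1,1] * e2$vectors[1,1] < 0
}
colnames(e1$vectors) <- LETTERS[1:k]
colnames(e2$vectors) <- letters[1:k]

# Compare the eigenvectors, 
# by computing the cosine of the angle they form
# (they have unit length, so this is just their dot product).
d <- crossprod(e1$vectors, e2$vectors)

# Permutation of the vectors: for each vector of e1, the closest vector of e2
# (not guaranteed to be one-to-one when eigenvalues are close together)
i <- apply(abs(d), 1, which.max)
e2$values  <- e2$values[i]
e2$vectors <- e2$vectors[,i]

# Change the sign, if needed
j <- sign(diag(d[1:k,i]))
e2$vectors <- t( t(e2$vectors) * j )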
$\endgroup$
3
$\begingroup$

If you're referring to this problem, then there is a very complete answer on the Cross Validated Stack Exchange.

$\endgroup$
3
$\begingroup$

I am also interested in resolving this problem, although I decided not to create a separate thread for it yet. This is a kind of continuation of the earlier question below.

https://stats.stackexchange.com/questions/34396/im-getting-jumpy-loadings-in-rollapply-pca-in-r-can-i-fix-it

In factor analysis, and specifically in PCA, the sign of the loadings does not mean anything. But if someone, like me, wants to project a time series onto a selected principal component, the result looks like the picture at the link above: jumpy loadings.

All the answers I have seen suggest multiplying the loadings vector by minus 1 whenever it jumps too much. Some even suggest how to identify "too much", by comparing two vectors: the actual one and the previous one. But this is still not enough, because it only detects sign changes in the middle of the sequence, once two vectors have been calculated and can be compared. I still need to be sure that the first sign was defined properly, and here is the problem. You assume that the first eigenvector is correct, compute the next one, realize that its sign needs to be changed, and do the same for the rest of the sequence. When you plot the result, the loadings are not jumpy anymore, but the whole chart is inverted, because the first eigenvector was wrong to begin with. Nobody has yet suggested how to check the FIRST eigenvector itself, without comparing it with the others.
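The inversion described here can be reproduced with a small sketch in plain Python. It chains a dot-product comparison between consecutive vectors through the sequence. The function names and the sample loadings are hypothetical, chosen only for illustration.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def align_sequence(vectors):
    """Flip each vector, if needed, so it points the same way as the
    previous (already aligned) one. The first vector is taken as given."""
    aligned = [list(vectors[0])]
    for vec in vectors[1:]:
        if dot(vec, aligned[-1]) < 0:
            vec = [-x for x in vec]
        aligned.append(list(vec))
    return aligned


# Hypothetical loadings from three consecutive windows; the second one
# came back with the opposite sign.
windows = [[0.6, 0.8], [-0.62, -0.78], [0.58, 0.81]]
print(align_sequence(windows))

# Start from the opposite orientation of the first vector: the result is
# just as smooth, but every vector is inverted.
windows_inverted_start = [[-0.6, -0.8]] + windows[1:]
print(align_sequence(windows_inverted_start))
```

Both runs are free of jumps. Which of the two is "right" depends entirely on the orientation of the first vector, and the chaining itself cannot decide that.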

At the moment, I have three suggestions of how these jumps in loadings can be removed.

  • sum all values in the eigenvector to decide which way it points
    eigenvector [-0.5, 0.2, 0.4]
    -0.5 + 0.2 + 0.4 = 0.1 > 0, so the sign is correct and should not change

    eigenvector [-0.5, 0.2, 0.1]
    -0.5 + 0.2 + 0.1 = -0.2 < 0, so change the sign
  • compute the dot product between the actual and previous vectors
    vectorActual [-0.5, 0.2]
    vectorPrev [0.3, -0.1]
    -0.5 * 0.3 + 0.2 * -0.1 = -0.17 < 0, so change the sign
  • compute the difference between the actual and previous vectors
    vectorActual [-0.5, 0.2]
    vectorPrev [0.3, -0.1]

    Diff = MathAbs(-0.5 - 0.3) + MathAbs(0.2 - -0.1) = 1.1
    Sum  = MathAbs(-0.5 + 0.3) + MathAbs(0.2 + -0.1) = 0.3
    Diff > Sum, so change the sign
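The three rules can be written as small predicates. This is a sketch in plain Python with illustrative function names, using the same numbers as the examples above. Each function returns `True` when the actual vector should be multiplied by minus one.

```python
def flip_by_sum(vec):
    """Rule 1: flip when the elements sum to a negative number."""
    return sum(vec) < 0


def flip_by_dot(actual, prev):
    """Rule 2: flip when the dot product with the previous vector is negative."""
    return sum(a * p for a, p in zip(actual, prev)) < 0


def flip_by_diff(actual, prev):
    """Rule 3: flip when the summed absolute differences to the previous
    vector exceed the summed absolute differences to minus that vector."""
    diff = sum(abs(a - p) for a, p in zip(actual, prev))
    total = sum(abs(a + p) for a, p in zip(actual, prev))
    return diff > total


print(flip_by_sum([-0.5, 0.2, 0.4]))            # False: keep the sign
print(flip_by_sum([-0.5, 0.2, 0.1]))            # True: change the sign
print(flip_by_dot([-0.5, 0.2], [0.3, -0.1]))    # True: change the sign
print(flip_by_diff([-0.5, 0.2], [0.3, -0.1]))   # True: change the sign
```

Rule 1 needs only the current vector, so it can be applied to the first window as well. Rules 2 and 3 need a previous vector and therefore inherit whatever orientation the sequence started with.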

I am attaching an image with a chart to show what I mean. As you can see, the simple sum gives good results without inversion, but jumps are still possible in some cases. Meanwhile, the dot product and the Diff vs. Sum comparison give a chart with no jumps at all, but it is inverted, which is wrong!

High resolution image - http://snag.gy/zCqHx.jpg


So, while this may be an answer to the question in this thread, I am still interested in how to check that the orientation of the FIRST vector in a sequence is correct.

$\endgroup$
0
$\begingroup$

Personally, I don't find any of the answers provided to be of much help in answering the question. Factor analysis was developed for information collected at a single point in time. Only in the last few decades have extensions been made, first to two or a few time periods, and most recently to truly longitudinal models. Everyone writing about these challenges points to the same issues that the OP has noted: scores and loadings can and will change over time, and the qualitative meaning of the factors can and will change too. For instance, if your time series data isn't consistently massaged and transformed, how can you distinguish real underlying change from simple noise? These challenges have to be dealt with in a systematic and rigorous fashion, which "overlapping windows" or "rollover" R modules, for example, simply do not address. This is because no matter how big your data sample is, it's still a finite sample, subject to all the vagaries and vicissitudes of less than infinite amounts of information. In other words, many theorems, proofs and results hold true only in the infinite theoretical asymptotic limit. This is just as true for FA as it is for portfolio theory.

From my point of view, you need to bite the bullet and work through the academic literature. The quantitative finance literature's kluge answers are the ones already posted in this thread. Forget about those and read the psychometric academic literature. There you will find thoughtful suggestions and answers to these challenges. You will also find that the "state-of-the-art" may not have all the answers that you're looking for. In other words, textbook answers may not yet have been developed. If you find new and creative solutions, you might get a paper pub'd.

Here's an example of a recent book which has articles on this topic: Factor Analysis at 100: Historical Developments and Future Directions by Cudeck and MacCallum (eds., 2007). In particular, pay attention to the two papers by McArdle and by Browne and Zhang. The structural equations literature also has useful contributions. Just google "longitudinal factor analysis" and a plethora of articles will pop up.

$\endgroup$
