Covariance matrix of Gaussian EM output

Question

I have a project where I want to use Expectation Maximization (EM) to fill in missing log returns. There is one question about it that I haven't been able to resolve. Logically, EM should decrease the variance of the data, since the estimates will be the "most likely" values. Covariance effects reduce this shrinkage, but I believe it still occurs. In other words, can I reliably use a covariance matrix calculated on the full output of the algorithm?
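
The variance intuition can be written out as a short derivation. This is a sketch under assumptions not stated in the question: the rows are i.i.d. Gaussian with mean $\mu$ and covariance $\Sigma$, and each row is split into an observed part $x_o$ and a missing part $x_m$ (the subscripts $o$ and $m$ are notation introduced here). The imputed value is the conditional mean, and the spread it leaves out is the conditional covariance $C$:

$$
\hat{x}_m = \mu_m + \Sigma_{mo}\Sigma_{oo}^{-1}(x_o - \mu_o),
\qquad
C = \Sigma_{mm} - \Sigma_{mo}\Sigma_{oo}^{-1}\Sigma_{om}
$$

By the law of total covariance,

$$
\operatorname{Cov}(x_m) = \operatorname{Cov}(\hat{x}_m) + C
\quad\Longrightarrow\quad
\operatorname{Cov}(\hat{x}_m) = \Sigma_{mm} - C
$$

So, under these assumptions, the imputed entries have covariance smaller than $\Sigma_{mm}$ by exactly $C$. The covariance update inside Gaussian EM adds $C$ back for every row with missing entries, whereas the plain sample covariance of the filled-in output does not.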

I saw that doing this gave me the best estimates of the empirical covariance matrix in every simulation, even when removing up to 60% of my test data and re-estimating it. Still, it just doesn't make intuitive sense to me...

I also considered using the conditional covariance matrix that the algorithm calculates; however, that gives worse estimates.
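
To make the comparison concrete, here is a minimal sketch of Gaussian EM with NaN-coded missing entries. It is not the code used in the question: the function name `em_gaussian`, the simulated data, the 40% missing rate and the missing-completely-at-random mechanism are all assumptions chosen for illustration. It returns both the filled-in data and the EM covariance, so the sample covariance of the output can be set against the estimate that includes the conditional-covariance correction.

```python
import numpy as np

def em_gaussian(X, n_iter=50):
    """EM for a multivariate normal with NaN-coded missing entries.

    Returns (mu, sigma, X_filled). X_filled holds conditional-mean
    imputations; sigma includes the conditional-covariance correction.
    Hypothetical helper written for illustration only.
    """
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    miss = np.isnan(X)
    # Initialise from column-wise statistics of the observed entries.
    mu = np.nanmean(X, axis=0)
    sigma = np.diag(np.nanvar(X, axis=0))
    X_filled = np.where(miss, mu, X)
    for _ in range(n_iter):
        corr = np.zeros((d, d))  # sum of conditional covariances over rows
        for i in range(n):
            m = miss[i]
            if not m.any():
                continue
            o = ~m
            if not o.any():
                # Nothing observed in this row: impute the mean.
                X_filled[i] = mu
                corr += sigma
                continue
            # B = Sigma_mo @ inv(Sigma_oo), via a linear solve.
            B = np.linalg.solve(sigma[np.ix_(o, o)], sigma[np.ix_(o, m)]).T
            X_filled[i, m] = mu[m] + B @ (X[i, o] - mu[o])
            corr[np.ix_(m, m)] += sigma[np.ix_(m, m)] - B @ sigma[np.ix_(o, m)]
        # M-step: sample moments of the filled data plus the correction.
        mu = X_filled.mean(axis=0)
        centred = X_filled - mu
        sigma = (centred.T @ centred + corr) / n
    return mu, sigma, X_filled

# Simulated example (assumed parameters, not the asker's data).
rng = np.random.default_rng(0)
true_sigma = np.array([[1.0, 0.6, 0.3],
                       [0.6, 1.0, 0.5],
                       [0.3, 0.5, 1.0]])
X_full = rng.multivariate_normal(np.zeros(3), true_sigma, size=1500)
X_obs = X_full.copy()
X_obs[rng.random(X_full.shape) < 0.4] = np.nan  # missing completely at random

mu_hat, sigma_em, X_filled = em_gaussian(X_obs)
sigma_naive = np.cov(X_filled, rowvar=False, bias=True)

print("variances, sample covariance of filled data:", np.diag(sigma_naive))
print("variances, EM covariance with correction:   ", np.diag(sigma_em))
```

In this sketch the two matrices differ by exactly the averaged conditional covariances, so the variances from the filled-in data are always the smaller of the two. Whether that gap matters in practice depends on how much is missing and how strongly the series are correlated.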

TLDR: Is it problematic to use EM to estimate up to 60% of my data and then empirically calculate the covariance matrix of the output using the standard formula?

This is the first time I heard of the term EM being used. Can you explain how it is used to generate the missing log returns? — KaiSqDist, Commented Apr 21 at 20:41
@KaiSqDist ah yes sorry, by EM I mean expectation maximization, which is an iterative method for filling in missing data based on a distribution's maximum likelihood function. See e.g. "Maximum Likelihood from Incomplete Data via the EM Algorithm" by Dempster. It's also in "Risk and Asset Allocation" by Meucci — GTT, Commented Apr 21 at 21:03

lehalle · Accepted Answer · 2024-04-22 12:41:10Z

If you replace missing returns (or indeed anything that can be used as an input to an investment strategy), it is strongly recommended never to use future information: to replace a value at time $t$, do not use any data that became available after $t$.

If you need a covariance matrix to replace the returns of a stock $k$ at date $t$ (there are many different ways to do this), you are right that you should not use any data after $t$. Your covariance matrix should be estimated using past data only.

This leads you to the traditional problem of estimating covariance matrices (there are many possibilities; a search on Stack Exchange alone turns up plenty).

My advice would be not to use the full covariance matrix you have in mind, but instead to:

- create a point-in-time factor model, the "simplest" being a sliding PCA;
- estimate (from past data only) the coefficients of a regression of the past returns of your stock $k$ on your factors as covariates (i.e. explanatory variables).
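
The two steps above could be sketched as follows. This is an illustration under assumptions, not the answerer's code: the helper name `fill_point_in_time`, the window length, the number of factors and the simulated returns are all hypothetical. It also assumes the other assets are fully observed over the window and at date $t$, and it treats their date-$t$ returns as usable because they are not later than $t$.

```python
import numpy as np

def fill_point_in_time(returns, k, t, window=250, n_factors=3):
    """Estimate the missing return of asset k at row t using no data after t.

    `returns` is a (T, N) array. Hypothetical helper: assumes the other
    assets have no missing values on rows t-window..t.
    """
    if t < window:
        raise ValueError("not enough history before t")
    past = returns[t - window:t]                 # rows strictly before t
    others = np.delete(np.arange(returns.shape[1]), k)
    X_past = past[:, others]
    mean = X_past.mean(axis=0)
    # Sliding PCA: loadings from the past window of the other assets.
    _, _, vt = np.linalg.svd(X_past - mean, full_matrices=False)
    loadings = vt[:n_factors].T                  # shape (N-1, n_factors)
    F_past = (X_past - mean) @ loadings          # past factor returns
    # Regress past returns of asset k on the past factor returns.
    y = past[:, k]
    ok = ~np.isnan(y)
    A = np.column_stack([np.ones(ok.sum()), F_past[ok]])
    beta, *_ = np.linalg.lstsq(A, y[ok], rcond=None)
    # Factor returns at t from the date-t returns of the other assets.
    f_t = (returns[t, others] - mean) @ loadings
    return beta[0] + f_t @ beta[1:]

# Simulated two-factor returns (assumed parameters, for illustration).
rng = np.random.default_rng(1)
T, N = 400, 6
factors = rng.normal(size=(T, 2))
exposures = rng.normal(size=(2, N))
R = factors @ exposures + 0.1 * rng.normal(size=(T, N))
R[300, 0] = np.nan                               # the value to replace

est = fill_point_in_time(R, k=0, t=300, window=250, n_factors=2)
print("point-in-time estimate:", est)
```

Because the function only reads rows `t - window` through `t`, changing anything after row `t` leaves the estimate unchanged, which is the property the recommendation is about.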

[EDIT] Following a comment about my "never use future information" recommendation: unfortunately, markets are often not time-reversible, simply because the arrival of information has an asymmetric effect on price formation (see for instance Marcaccioli, Riccardo, Jean-Philippe Bouchaud, and Michael Benzaquen, "Exogenous and endogenous price jumps belong to different dynamical classes", Journal of Statistical Mechanics: Theory and Experiment 2022, no. 2 (2022): 023403). This is not really a matter of non-stationarity (it is in fact worse than that): there are two layered effects, first a time-reversible dynamic (when no exogenous information arrives), and then one that in general cannot be reversed. So there is no need to take any risk: just use information from the past. I am not saying it is impossible to find the right change of variables and projections so that one ends up in a stationary environment, just that it is subtle, so it is better to avoid approaching the problem this way.

Thank you for your response! With regard to "never use future information": is that for backtesting reasons, or because you don't want to use information (correlations) that had not yet appeared at that point? If it's the latter, then intuitively the problem of a non-stationary covariance matrix should be symmetrical around $t$ (and therefore data both before and after $t$ should be used). Maybe I'm overlooking something though... Is there perhaps any literature on this? — GTT, Commented Apr 22 at 11:17
