I am trying to fit a year-long half-hourly dataset into Python statsmodels' SARIMAX. I plotted the correlogram and it indicates AR and MA terms well above 5 each, and periodic terms (daily) of around 2. But even going more than order 2 in the AR or MA terms of either seasonal or nonseasonal type gives an MLE convergence error.
C:\Users\<user>\AppData\Local\Continuum\miniconda3\lib\site-packages\statsmodels\base\model.py:496: ConvergenceWarning: Maximum Likelihood optimization failed to converge. Check mle_retvals
  "Check mle_retvals", ConvergenceWarning)
I get that it's a nonlinear model and that it fails to converge, but I am at a loss as to how to proceed. mle_retvals isn't a property of the SARIMAXResults object. It takes more than an hour to fit the entire dataset, which is why I am forced to use only a month or two of data when I'm iterating. I suspect intuitively that this reduces the order of terms that the SARIMAX fit can accommodate without convergence issues, though I don't have an in-depth understanding of the concepts here.
Other possibly important details are that many values are missing throughout the dataset, and I'm relying on the statespace form to impute these missing values.
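To make concrete what relying on the state-space form for missing values means: a Kalman filter skips the measurement update wherever an observation is missing and carries the prediction (with growing variance) forward. The sketch below is a toy AR(1) filter with known parameters and no measurement noise, written only to illustrate that mechanism; the function name and the values of phi and sigma2 are made up, and this is not statsmodels' implementation.

```python
import math

def ar1_filter_with_gaps(y, phi, sigma2):
    """Filtered (mean, variance) for x_t = phi * x_{t-1} + e_t with y_t = x_t.

    NaN entries in y are treated as missing observations.
    """
    # Start from the stationary distribution of the AR(1) process.
    mean, var = 0.0, sigma2 / (1.0 - phi ** 2)
    filtered = []
    for obs in y:
        if not math.isnan(obs):
            # Observation available: the state is known exactly.
            mean, var = obs, 0.0
        # Missing observation: keep the prediction unchanged.
        filtered.append((mean, var))
        # Predict one step ahead for the next time point.
        mean, var = phi * mean, phi ** 2 * var + sigma2
    return filtered

series = [1.0, float("nan"), float("nan"), 0.5]
out = ar1_filter_with_gaps(series, phi=0.8, sigma2=1.0)
for t, (m, v) in enumerate(out):
    print(t, round(m, 4), round(v, 4))
```

The variance grows across the gap and collapses again at the next observed point, which is why long runs of missing values contribute little information to the likelihood.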
How can I solve the convergence errors I'm getting with SARIMAX? What aspects should I check that could be causing the issues? Are these issues usually fixable? Thanks.
Update: I don't know what happened now, but I am getting an mle_retvals object
{'Hinv': array([[1, 0, 0, 0, 0, 0],
       [0, 1, 0, 0, 0, 0],
       [0, 0, 1, 0, 0, 0],
       [0, 0, 0, 1, 0, 0],
       [0, 0, 0, 0, 1, 0],
       [0, 0, 0, 0, 0, 1]]),
 'converged': False,
 'fcalls': 44,
 'fopt': 0.22864169058350794,
 'gcalls': 33,
 'gopt': array([ 0.33676031, -0.35390488, -0.01763243, -0.09141768, -0.12037386,
        0.08537955]),
 'warnflag': 2}
This was for order=(2,0,2) and seasonal_order=(1,0,0,48). I also now notice that there was a warning before the ConvergenceWarning:
Warning: Desired error not necessarily achieved due to precision loss.
So the Hessian is singular; doesn't this sound like a data problem? Probably this means that the data is of lower order than the model? But that isn't reflected in the ACF and PACF of the interpolated residual series of the (1,0,2),(1,0,0,48) model, where there are plenty of peaks above the significance level (see PACF below). I used the BFGS algorithm for this one.
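As a reading aid for the mle_retvals output above, the sketch below summarizes the reported values in plain Python. The numbers are copied from that output. The warnflag descriptions follow SciPy's documented BFGS codes, where 2 corresponds to the precision-loss message. The helper name summarize is made up for illustration.

```python
# Values copied from the mle_retvals output above.
retvals = {
    "Hinv": [[1 if i == j else 0 for j in range(6)] for i in range(6)],
    "converged": False,
    "fcalls": 44,
    "fopt": 0.22864169058350794,
    "gcalls": 33,
    "gopt": [0.33676031, -0.35390488, -0.01763243,
             -0.09141768, -0.12037386, 0.08537955],
    "warnflag": 2,
}

# Meaning of SciPy's BFGS warnflag codes.
BFGS_WARNFLAGS = {
    0: "success",
    1: "maximum number of iterations exceeded",
    2: "gradient and/or function calls not changing (precision loss)",
}

def summarize(rv):
    """Condense the optimizer return values into a few diagnostics."""
    n = len(rv["Hinv"])
    # The identity is SciPy BFGS's default initial inverse-Hessian guess.
    identity = all(
        rv["Hinv"][i][j] == (1 if i == j else 0)
        for i in range(n)
        for j in range(n)
    )
    return {
        "converged": rv["converged"],
        "reason": BFGS_WARNFLAGS.get(rv["warnflag"], "unknown"),
        "max_abs_grad": max(abs(g) for g in rv["gopt"]),
        "hinv_is_identity": identity,
    }

summary = summarize(retvals)
for key, value in summary.items():
    print(f"{key}: {value}")
```

The largest gradient component is about 0.35, so the optimizer stopped well away from a stationary point. The reported Hinv equals the identity matrix, which may say more about the state of the BFGS approximation than about the curvature of the likelihood.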
Interpolating the data to remove the missing values makes no difference!
Changing the algorithm: Nelder-Mead seemed to require a lot of iterations, but it finally converged for (2,0,2),(1,0,0,48); see PACF below.
mle_retvals even though it exists...