I have a data set with N ≈ 5000, and about half of the cases are missing at least one important variable. Comparing the people with missing data to those with complete data, it is fairly clear that the data are not missing at random (nonignorable nonresponse). The missing data pattern is not monotone, and at least some data are missing on about a dozen variables.
The main analytic method will be Cox proportional hazards.
I am using SAS, which now offers an MNAR method for such data. If the pattern is monotone, then it offers options to estimate the missing data with either nearest neighbors or complete cases. If the pattern is non-monotone, then it offers only a method of 'adjusting' the parameters, which seems (from what I understand) like a pure sensitivity analysis.
SAS also offers an MCMC method of creating a monotone missing pattern from a non-monotone pattern.
My current plan is to first create a monotone missing pattern and then apply nearest neighbor, then analyze the multiply imputed data.
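For concreteness, here is a rough sketch of what that plan might look like. The data set and variable names (`mydata`, `x1`-`x3`, `time`, `status`) are placeholders, and the seeds, the number of imputations, and the choice of `monotone reg` with `modelobs=ncmv` are my assumptions about how the pieces fit together, not settings I am confident in.

```sas
/* Step 1: impute just enough values to make the pattern monotone.   */
/* The MCMC method assumes multivariate normality of the VAR list.   */
proc mi data=mydata seed=12345 nimpute=20 out=mono;
   mcmc impute=monotone;
   var x1 x2 x3;              /* placeholder covariates, in monotone order */
run;

/* Step 2: within each partially imputed data set, fill in the rest  */
/* with a monotone method, deriving the imputation model for x3      */
/* from neighboring cases (MNAR statement).                          */
proc mi data=mono seed=54321 nimpute=1 out=imputed;
   by _Imputation_;
   monotone reg;
   mnar model(x3 / modelobs=ncmv);
   var x1 x2 x3;
run;

/* Step 3: fit the Cox model in each completed data set.             */
proc phreg data=imputed;
   by _Imputation_;
   model time*status(0) = x1 x2 x3;   /* status=0 assumed to be censored */
   ods output ParameterEstimates=pe;
run;

/* Step 4: combine the estimates across imputations.                 */
proc mianalyze parms=pe;
   modeleffects x1 x2 x3;
run;
```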
However, I am not sure this is the best approach, nor do I know what should determine which options to choose in a scenario like this. Advice is welcome, as are references to the literature.