$\begingroup$

I'm looking for imputation options for a high-dimensional DNA methylation (bisulfite sequencing) dataset. The dimensions are on the order of 50-100 samples x ~500,000 CpG loci/features.

I've used K-nearest neighbors, but it seems that this method is not very accurate. As far as I can tell, it limits the minimum number of "genes/features" to impute at one time to something like 1500, which is quite a lot.

I've also used missForest to impute smaller datasets with greater accuracy, but it seems computationally infeasible to do it on the full 50x500,000 dataset.

I need to impute values because I'm using the data for statistical modeling that requires complete cases.

Does anyone know of better alternatives to K-nearest neighbor for large-scale imputation?

$\endgroup$

1 Answer

$\begingroup$

I think this is still an active field of research. I have heard of Phenix, which might be appropriate.

$\endgroup$
  • $\begingroup$ I agree it is still an active area of research. I will check out Phenix, looks like it could be promising. Thanks. $\endgroup$
    – Reilstein
    Commented Sep 20, 2017 at 3:37
  • $\begingroup$ @Reilstein: can you say anything about your experience with Phenix? $\endgroup$
    – winni2k
    Commented Dec 21, 2017 at 10:17
  • $\begingroup$ I never ended up using Phenix because it appears to be tailored for samples/individuals with a degree of relatedness. My particular study doesn't have samples from related individuals. I opted for K-nearest neighbor imputation while keeping the total missing values allowed below 5% prior to imputation. $\endgroup$
    – Reilstein
    Commented Dec 21, 2017 at 19:10
  • $\begingroup$ The description says "arbitrary level of relatedness". From my discussions with the author, I would guess that the level of appropriate relatedness includes the level of relatedness of "unrelated" human individuals. $\endgroup$
    – winni2k
    Commented Jan 13, 2018 at 15:41
  • $\begingroup$ Ah, okay, apparently I didn't look into it in enough depth. Thanks for following through with the author. If I decide KNN imputation isn't accurate enough for my purposes, I will revisit this. Thanks. $\endgroup$
    – Reilstein
    Commented Jan 15, 2018 at 4:20
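
For readers who want to try the approach described in the comments (cap missingness at 5%, then impute with K-nearest neighbors), here is a minimal sketch in Python with NumPy. It is not the asker's actual pipeline, and it rests on several assumptions: the 5% cap is applied per locus, neighbors are samples rather than features (cheap when there are only 50-100 samples), and distance is the mean squared difference over loci observed in both samples. The function names `filter_loci_by_missingness` and `knn_impute_samples`, the parameter defaults, and the toy matrix are all made up for illustration.

```python
import numpy as np


def filter_loci_by_missingness(beta, max_missing=0.05):
    """Keep loci (columns) whose fraction of missing samples is <= max_missing.

    beta: 2-D float array, samples x loci, with NaN marking missing values.
    Returns the filtered matrix and the boolean mask of kept columns.
    """
    missing_frac = np.isnan(beta).mean(axis=0)
    keep = missing_frac <= max_missing
    return beta[:, keep], keep


def knn_impute_samples(beta, k=5):
    """Fill each NaN with the mean of the k nearest samples observing that locus.

    Assumption: sample-to-sample distance is the mean squared difference
    over loci observed in both samples.
    """
    n_samples = beta.shape[0]
    observed = ~np.isnan(beta)
    filled = beta.copy()

    # Pairwise sample distances; pairs with no shared loci stay at infinity.
    dist = np.full((n_samples, n_samples), np.inf)
    for i in range(n_samples):
        for j in range(i + 1, n_samples):
            both = observed[i] & observed[j]
            if both.any():
                d = np.mean((beta[i, both] - beta[j, both]) ** 2)
                dist[i, j] = dist[j, i] = d

    for i in range(n_samples):
        order = np.argsort(dist[i])  # nearest samples first
        for locus in np.where(~observed[i])[0]:
            # Donors: nearest other samples that have this locus observed.
            donors = [j for j in order if j != i and observed[j, locus]][:k]
            if donors:  # if no sample observes the locus, it stays NaN
                filled[i, locus] = beta[donors, locus].mean()
    return filled


# Toy example: 4 samples x 4 loci of methylation fractions (made-up values).
toy = np.array([
    [0.1, 0.9, 0.5, np.nan],
    [0.1, 0.9, np.nan, np.nan],
    [0.2, 0.8, 0.6, np.nan],
    [0.9, 0.1, 0.4, 0.3],
])
# A loose 25% cap is used here only because the toy matrix is tiny.
kept, kept_mask = filter_loci_by_missingness(toy, max_missing=0.25)
imputed = knn_impute_samples(kept, k=2)
```

The pairwise distance step costs on the order of (samples squared) x loci, which should be manageable for 50-100 samples, but the per-value Python loop would likely need vectorizing for a full 500,000-locus matrix.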
