3
$\begingroup$

I have SNP-chip data for about 1,000 samples that I'd like to impute, so that I have more rsIDs to match against GWAS data.

However, I don't know the ancestry of each sample: either it wasn't recorded, or it wasn't recorded reliably.

Is there a 'quick and dirty' method to decide which imputation panel to use based on the genotype data itself? For example:

  • sample 1 'looks' British in England and Scotland (GBR),
  • sample 2 'looks' Colombian in Medellin, Colombia (CLM),
  • sample 3 'looks' Bengali in Bangladesh (BEB), etc.

Once I know which group each sample matches most closely, I can perform imputation with the appropriate panel.

$\endgroup$

2 Answers

1
$\begingroup$

If you're using an HMM-based method such as IMPUTE2 or a later tool, the imputation method will perform the haplotype matching for you, so there is no need to cut down the reference panel. Try to get a large reference panel such as 1000 Genomes Phase 3 or the Haplotype Reference Consortium (disclaimer: I am an author on both those papers).

But do be careful about relative power: a sample whose haplotypes are well represented in the reference panel will have better-imputed genotypes than a sample whose haplotypes are poorly represented.
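
To make "the imputation method will perform the haplotype matching for you" concrete, here is a minimal toy sketch of a Li-Stephens-style copying HMM, the general family of model that tools like IMPUTE2 build on. It is not IMPUTE2's actual algorithm: the function names, the two-haplotype panel, and the `switch` and `mismatch` rates are all assumptions made for illustration. The target haplotype is treated as a mosaic of reference haplotypes, and untyped sites (`None`) are filled in from whichever reference haplotype the most likely path is copying at that site.

```python
import math

def best_copying_path(target, panel, switch=0.01, mismatch=0.01):
    """Viterbi path: index of the reference haplotype most likely copied at each site.

    target: list of alleles (0/1), with None at untyped sites.
    panel:  list of reference haplotypes, each a list of 0/1 alleles.
    switch, mismatch: illustrative rates, not values used by any real tool.
    """
    n = len(panel)

    def emit(h, s):
        if target[s] is None:          # untyped site: carries no information
            return 0.0
        if panel[h][s] == target[s]:
            return math.log(1 - mismatch)
        return math.log(mismatch)

    stay = math.log(1 - switch + switch / n)   # keep copying the same haplotype
    jump = math.log(switch / n)                # switch to one particular other haplotype

    score = [math.log(1 / n) + emit(h, 0) for h in range(n)]
    back = []
    for s in range(1, len(target)):
        best_prev = max(range(n), key=lambda h: score[h])
        new_score, pointers = [], []
        for h in range(n):
            from_same = score[h] + stay
            from_best = score[best_prev] + jump
            if from_same >= from_best:
                pointers.append(h)
                base = from_same
            else:
                pointers.append(best_prev)
                base = from_best
            new_score.append(base + emit(h, s))
        back.append(pointers)
        score = new_score

    # Trace back from the best final state.
    state = max(range(n), key=lambda h: score[h])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path

def impute(target, panel, path):
    """Fill untyped sites with the allele of the haplotype copied at that site."""
    return [panel[h][s] if a is None else a
            for s, (a, h) in enumerate(zip(target, path))]

panel = [[0, 1, 0, 0, 1, 0],
         [1, 0, 1, 1, 0, 1]]
target = [0, None, 0, 1, None, 1]
path = best_copying_path(target, panel)
print(path)                         # [0, 0, 0, 1, 1, 1]
print(impute(target, panel, path))  # [0, 1, 0, 1, 0, 1]
```

The model picks the best-matching reference haplotype locally, which is why pre-selecting a panel by ancestry is not needed. It also shows the caveat about relative power: a target with no close match in the panel can only be explained through mismatches and switches, so its imputed alleles are less reliable.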

$\endgroup$
5
  • $\begingroup$ Nice, I didn't know that impute2 did that. Very cool. I take your point that some samples will have better (more extensive and reliable) imputation than others. $\endgroup$
    – Dan Bolser
    Commented Feb 15, 2022 at 16:06
  • 1
    $\begingroup$ If you do want to use the Haplotype Reference Consortium (HRC) panel, then you may need to upload your SNP-chip data to an imputation server, as access to the HRC panel is partially restricted in accordance with the wishes of the participants. $\endgroup$
    – winni2k
    Commented Feb 16, 2022 at 7:59
  • $\begingroup$ Also, there has been some recent work on improving imputation speed using very large reference panels: journals.plos.org/plosgenetics/article?id=10.1371/… $\endgroup$
    – winni2k
    Commented Feb 16, 2022 at 7:59
  • $\begingroup$ Can you recommend an imputation server that gives me access to HRC and uses an HMM-based method such as IMPUTE2? You seem to be implying that HRC is the largest reference panel out there, but perhaps that's not true... (sorry for my basic questions). $\endgroup$
    – Dan Bolser
    Commented Feb 16, 2022 at 16:14
  • 1
    $\begingroup$ I cannot vouch for these servers, but they are run by reputable groups (my coauthors on the HRC paper in fact): sanger.ac.uk/tool/sanger-imputation-service and imputationserver.sph.umich.edu/index.html#! $\endgroup$
    – winni2k
    Commented Feb 16, 2022 at 19:47
1
$\begingroup$

If I were you, I would run ADMIXTURE in supervised mode (the --supervised flag), using some of the 1000 Genomes populations as reference populations. This is a fast and accurate way to obtain broad-scale ancestry proportion estimates for each of your individuals.
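
As a rough sketch of how this could be wired up: supervised ADMIXTURE expects a `.pop` file with the same prefix as the input, one line per individual in `.fam` order, holding a population label for reference individuals and `-` for individuals of unknown ancestry. The helper names, sample IDs, and Q values below are hypothetical. The sketch also does not assume that the Q columns come out in any particular order; it infers which column belongs to which population from the reference individuals themselves.

```python
from collections import defaultdict

def pop_labels(fam_lines, ref_labels):
    """One label per .fam row: the reference population, or '-' if ancestry is unknown."""
    labels = []
    for line in fam_lines:
        fields = line.split()
        if not fields:
            continue
        sample_id = fields[1]  # column 2 of a PLINK .fam file is the individual ID
        labels.append(ref_labels.get(sample_id, "-"))
    return labels

def assign_populations(q_rows, labels):
    """Give each unlabelled sample the reference population of its largest Q component."""
    n_comp = len(q_rows[0])
    sums = defaultdict(lambda: [0.0] * n_comp)
    counts = defaultdict(int)
    for row, label in zip(q_rows, labels):
        if label != "-":
            counts[label] += 1
            for k, value in enumerate(row):
                sums[label][k] += value
    # Map each Q column to the reference population with the highest mean in that column.
    component_to_pop = {
        k: max(counts, key=lambda pop: sums[pop][k] / counts[pop])
        for k in range(n_comp)
    }
    assigned = []
    for row, label in zip(q_rows, labels):
        if label == "-":
            best = max(range(n_comp), key=lambda k: row[k])
            assigned.append(component_to_pop[best])
        else:
            assigned.append(label)
    return assigned

# Hypothetical merged dataset: two reference individuals, two unknown samples.
fam_lines = ["F1 REF_A 0 0 1 -9", "F2 REF_B 0 0 2 -9",
             "F3 S1 0 0 1 -9", "F4 S2 0 0 2 -9"]
labels = pop_labels(fam_lines, {"REF_A": "GBR", "REF_B": "BEB"})
print(labels)  # ['GBR', 'BEB', '-', '-']

# Made-up Q matrix standing in for the rows of the .Q output file.
q_rows = [[0.98, 0.02], [0.03, 0.97], [0.90, 0.10], [0.20, 0.80]]
print(assign_populations(q_rows, labels))  # ['GBR', 'BEB', 'GBR', 'BEB']
```

In practice you would write `labels` one per line to `merged.pop`, run something like `admixture --supervised merged.bed 2` (with K equal to the number of reference populations), and read the rows of `merged.2.Q` into `q_rows`. The file prefix `merged` is a placeholder here.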

$\endgroup$
1
  • $\begingroup$ Thanks, that would answer my question, but it's a bit of an XY question... I want to get ancestry to do imputation... It looks like there are better ways to do imputation than first selecting ancestry :-) $\endgroup$
    – Dan Bolser
    Commented Feb 15, 2022 at 16:07
