$\begingroup$

We are working with a relatively large family dataset (> 1,000 trios) with genotyping array data. We would like to estimate the effects of the mothers', fathers', and children's alleles on the children separately, as in a previous study (PMID: 34282336). Unfortunately, that analysis is difficult for us to replicate, so we need another method. Is there any software that can achieve this, or a similar traditional approach? Also, can it be applied to grandparents?

My understanding of the above-mentioned study is that they:

  1. identified the parental genomes within the child genome;
  2. then conducted a GWAS for birth weight.

Overall, the aim of the previous study was to identify the alleles associated with birth weight; in our situation the phenotype is comparable. It is important to note that the prior study did not use publicly available algorithms.

Thus, we wish to identify the maternal and paternal alleles within the progeny using publicly available algorithms; we do not necessarily need to reproduce the previous study exactly.

We already have imputed data and are using GCTA for standard GWAS. To achieve our goal, we need to identify which parental alleles were transmitted to the child and which were not.

$\endgroup$
  • $\begingroup$ Hi @PenguinPartyH0, thanks for the extra info. Just to be clear I don't do eukaryotic genomics in general, but I've edited the question with your added information. It would be useful to explain what commercial algorithms the authors used, because it is likely someone here will know the freeware version. That is very likely the most important information required. You can include this info by editing the question. I personally don't think this is a problematic situation. I have never personally known any commercial algorithms that are more powerful than those in the public domain $\endgroup$
    – M__
    Commented Oct 20, 2022 at 16:52
  • $\begingroup$ Has your data been phased as well as imputed, or just imputed? $\endgroup$
    – gringer
    Commented Oct 21, 2022 at 18:28
  • $\begingroup$ We have just imputed data and genotyped data. $\endgroup$ Commented Oct 21, 2022 at 23:46

1 Answer

$\begingroup$

The first haplotyping step is typically done in two stages:

  1. Imputation of missing genotypes - can be done using the Michigan Imputation Server
  2. Phasing of called genotypes to create semi-contiguous haplotypes - can be done using SHAPEIT
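To make concrete what "transmitted" and "non-transmitted" mean once trio genotypes are available, here is a minimal sketch in Python. It is only an illustration under my own assumptions, not the method of the cited study or of any particular tool: genotypes are coded as alternate-allele counts (0, 1, 2) at a single biallelic site, and the names `Transmission`, `possible_gametes` and `split_trio` are hypothetical. It uses plain Mendelian rules, so it gives up when mother, father and child are all heterozygous; resolving those sites is where haplotype phasing comes in.

```python
from typing import NamedTuple, Optional


class Transmission(NamedTuple):
    """Alternate-allele counts (0 or 1) split by parent and transmission status."""
    mat_t: int   # maternal allele transmitted to the child
    mat_nt: int  # maternal allele not transmitted
    pat_t: int   # paternal allele transmitted to the child
    pat_nt: int  # paternal allele not transmitted


def possible_gametes(genotype: int) -> tuple:
    """Alleles a parent with this genotype (alt-allele count) can pass on."""
    if genotype == 0:
        return (0,)
    if genotype == 1:
        return (0, 1)
    if genotype == 2:
        return (1,)
    raise ValueError(f"genotype must be 0, 1 or 2, got {genotype}")


def split_trio(child: int, mother: int, father: int) -> Optional[Transmission]:
    """Resolve parental origin at one biallelic site using Mendelian rules.

    Returns None when the site is ambiguous (all three heterozygous) or when
    the genotypes are inconsistent with Mendelian inheritance.
    """
    solutions = [
        (m, p)
        for m in possible_gametes(mother)
        for p in possible_gametes(father)
        if m + p == child
    ]
    if len(solutions) != 1:
        return None
    m, p = solutions[0]
    # Whatever the parent did not pass on is the non-transmitted allele.
    return Transmission(mat_t=m, mat_nt=mother - m, pat_t=p, pat_nt=father - p)


if __name__ == "__main__":
    print(split_trio(child=1, mother=1, father=0))  # alt allele came from the mother
    print(split_trio(child=1, mother=1, father=1))  # ambiguous without phasing
```

The four resulting dosages per site could then be used as separate predictors in whatever association model you choose; that part is outside this sketch.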

I have many thoughts about the best approach for GWAS (mostly around the idea of population subsampling), which could potentially also apply to what you want to do. Unfortunately, it's been a long time since I did family population genetics, and I've forgotten the things we did to deal with complex population structure. I'm pretty sure we used PLINK for at least some of it, but I notice that PLINK has a disclaimer about close family structure, which suggests it may not work so well for trio data:

This method does not properly adjust for small-scale family structure. As a consequence, it is usually necessary to prune close relations with e.g. --king-cutoff before using --glm for genome-wide association analysis. (Note that biobank data usually comes with a relationship-pruned sample ID list; you can use --keep on that list, instead of performing your own expensive --king-cutoff run.) If this throws out more samples than you'd like, consider using mixed model association software such as SAIGE, BOLT-LMM, GCTA, or FaST-LMM instead; or regenie's whole genome regression.

$\endgroup$