$\begingroup$

I'm trying to replicate an existing polygenic score (i.e. test the accuracy in a new sample), and want to know if matching the original study's microarray chip will improve the accuracy (that is, variance explained in a new test set) of the polygenic score.

The labs I have access to offer multiple microarray chips (e.g., GSA, GDA, UK Biobank Axiom Array, UK BiLEVE Axiom array, GeneChip 2.0 array, Infinium Core-24 Kit, etc.). Will using a chip different from the one used in the study that produced the polygenic score affect the explained variance (accuracy) of the polygenic score? If so, by how much?

Example: a study on heart disease runs a GWAS on 900,000 individuals genotyped with the UK BiLEVE Axiom array, and uses the resulting summary statistics to produce weights for a polygenic score. Labs A, B, and C each conduct a polygenic score validation/replication study: each has a random sample of 10,000 individuals drawn from the general population, and all three use the same summary statistics from the original study. After imputing the genotype data of the 10,000 people, they calculate the polygenic score for each person and then evaluate how much variance in the phenotype they can explain. The only difference is that Lab A genotypes its 10,000 people with the UK BiLEVE Axiom array, whereas Lab B uses the Illumina GSA and Lab C uses the Illumina GDA.
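For concreteness, the evaluation each lab performs can be sketched as follows. This is a minimal toy simulation (not any lab's actual pipeline): allele dosages and per-SNP weights are randomly generated stand-ins for imputed genotypes and GWAS summary-statistic effect sizes, the polygenic score is the weighted sum of dosages, and accuracy is the squared Pearson correlation between score and phenotype.

```python
import numpy as np

rng = np.random.default_rng(0)
n_people, n_snps = 10_000, 500

# Allele dosages (0, 1, or 2 copies of the effect allele) per person per SNP;
# a stand-in for imputed genotype data.
dosages = rng.binomial(2, 0.3, size=(n_people, n_snps)).astype(float)

# Per-SNP weights, standing in for effect sizes from the GWAS summary statistics.
weights = rng.normal(0.0, 0.05, size=n_snps)

# Polygenic score: weighted sum of allele dosages across SNPs.
prs = dosages @ weights

# Simulated phenotype: genetic signal plus environmental noise
# (noise scaled so the score explains roughly a fifth of the variance).
phenotype = prs + rng.normal(0.0, 2.0 * prs.std(), size=n_people)

# Variance explained: squared Pearson correlation between score and phenotype.
r2 = np.corrcoef(prs, phenotype)[0, 1] ** 2
print(f"variance explained (r^2): {r2:.3f}")
```

The question, then, is whether swapping the genotyping chip (which changes which SNPs are directly observed before imputation) would meaningfully change the `r2` each lab reports.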

Do we have any reason to suspect a substantial difference between labs in the variance explained ($r^2$) by the polygenic score? Are there any studies that have analyzed something informative to this question?

I've been unable to find research on this question. The answer should not depend on which polygenic score is being studied, so there is no need to specify a particular phenotype.

Note: it is standard to impute both in the original study and the replication.

$\endgroup$

1 Answer

$\begingroup$

Genotyping can be validated by whole-genome sequencing, and this has been done for the International Genome Sample Resource. You should be less concerned about the reproducibility of genotyping results and more concerned about their correctness or validity. Companies often revisit old SNP chips to mask invalid spots, update rs numbers, and add new genotyping probes that perform better on a population-wide scale. In that sense, choosing a SNP chip platform with an established track record of QC and documented version updates is a better idea than trying to match a previous study as precisely as possible.

However, with regard to replicating existing studies, a bigger concern for you should be variation between sampled populations, which plays a large role in the outcome of a polygenic test.

This variation can be so extreme that the direction of association flips for the same variant, especially for low-frequency variants with extremely low p-values (i.e., the ones that often appear as Manhattan plot peaks).

The issues around population variation and study replication are explained well in the following paper from 2001:

https://doi.org/10.1006/tpbi.2001.1543

This is not something that can be fixed by attempting to replicate studies in more populations. As Cecile Janssens notes, a larger sample size reduces only random error, not systematic error.

https://twitter.com/cecilejanssens/status/1119235476567801856

$\endgroup$
