Why does my HWE QQ plot have extreme deviation and what does it mean?

Question

This question was also asked on Reddit

I have recently completed my thesis, and one of the comments was that I should report on why this graph looks this way. I have tried to find a reason, but the closest I have come is that it indicates false positives, or possibly population stratification.

The trait of interest was alcohol dependency, with sex and age as covariates, in a non-African American population.

I am also unsure how this will affect my research, or what the implications are.

I don't understand why this population (non-African American) "reacted" this way - do I provide a different null model? The previous populations had "relatively normal" plots. They were all run through the same pipeline.

How the data was calculated:

There were 1061 individuals in the non-African American dataset. At QC phase 0, 2 individuals had discordant sex information and there were 87 duplicate SNPs. With the MAF threshold set to 0.000015, 125708 SNPs were removed. Both the mind and geno parameters were set to 0.2, resulting in 127357 and 0 SNPs being removed, respectively. With the HWE threshold set to $1 \times 10^{-12}$, 4495 SNPs were removed. After the pipeline completed with these parameters, 851773 SNPs from 1044 individuals remained.

gringer · Accepted Answer · 2023-07-10 22:26:53Z

This much deviation from the expected distribution looks like a modeling issue, i.e. the statistical model used to represent the null hypothesis is incorrect.

I can't see any values that lie along the expected line. It's more common to see a generally linear trend, with a few outliers just at the end.
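One way to put a number on how far the points sit from the expected line is the genomic inflation factor. The following is a minimal sketch, not the pipeline used in the question: it assumes the p-values come from 1-degree-of-freedom tests, uses only the Python standard library, and the function names `qq_points` and `genomic_inflation` are made up for illustration.

```python
import math
import random
from statistics import NormalDist, median


def qq_points(pvalues):
    """Return (expected, observed) -log10(p) pairs, smallest values first."""
    n = len(pvalues)
    observed = sorted(-math.log10(p) for p in pvalues)
    # Under the null, p-values are uniform, so the i-th of n sorted
    # p-values is expected to be near (i - 0.5) / n.
    expected = sorted(-math.log10((i - 0.5) / n) for i in range(1, n + 1))
    return list(zip(expected, observed))


def genomic_inflation(pvalues):
    """Median 1-df chi-square statistic divided by its null median."""
    nd = NormalDist()
    # Convert each two-sided p-value back to a 1-df chi-square statistic.
    chi2 = [nd.inv_cdf(p / 2) ** 2 for p in pvalues]
    null_median = nd.inv_cdf(0.25) ** 2  # about 0.455
    return median(chi2) / null_median


# Simulated p-values only; these are not the data from the question.
random.seed(0)
null_p = [1.0 - random.random() for _ in range(10000)]  # values in (0, 1]
skewed_p = [p ** 3 for p in null_p]  # artificially pushed toward zero

print("lambda, null p-values:  ", round(genomic_inflation(null_p), 2))
print("lambda, skewed p-values:", round(genomic_inflation(skewed_p), 2))
```

A value close to 1 corresponds to points lying along the expected line for most of the plot. A value well above 1 means the whole distribution is shifted, which is the pattern described here rather than a few outliers at the end.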

It could very likely be due to population stratification. You have defined this population as "non-African American", which seems to me to be quite a vague definition that would capture a lot of different population groups with distinct genetic histories.

In my own research many years ago, I found substantially different haplotypes within the ADH gene cluster between Māori and European individuals, and I could imagine that not adjusting for population structure in other populations would have an impact on association scores.
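To illustrate how pooling distinct groups can by itself produce Hardy-Weinberg deviation (the Wahlund effect), here is a small sketch. The allele frequencies and group sizes are invented for illustration, each subpopulation is assumed to be in perfect HWE on its own, and `pooled_hwe_chi2` is a hypothetical helper, not part of any GWAS tool.

```python
def hwe_genotype_freqs(p):
    """Expected AA, Aa, aa frequencies under HWE for allele frequency p."""
    q = 1.0 - p
    return (p * p, 2.0 * p * q, q * q)


def pooled_hwe_chi2(allele_freqs, sizes):
    """Pool subpopulations that are each in HWE, then test the pool for HWE.

    Returns (observed counts, expected counts, chi-square statistic).
    """
    observed = [0.0, 0.0, 0.0]
    for p, n in zip(allele_freqs, sizes):
        for k, freq in enumerate(hwe_genotype_freqs(p)):
            observed[k] += freq * n
    total = sum(observed)
    # Allele frequency as it would be estimated from the pooled sample.
    p_pooled = (2.0 * observed[0] + observed[1]) / (2.0 * total)
    expected = [freq * total for freq in hwe_genotype_freqs(p_pooled)]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return observed, expected, chi2


# Two hypothetical groups of 500 people with allele frequencies 0.2 and 0.8.
obs, exp, chi2 = pooled_hwe_chi2([0.2, 0.8], [500, 500])
print("observed AA/Aa/aa:", [round(x) for x in obs])
print("expected AA/Aa/aa:", [round(x) for x in exp])
print("HWE chi-square:", round(chi2, 1))
```

Neither group deviates from HWE on its own, yet the pooled sample has far fewer heterozygotes than expected (320 observed against 500 expected in this toy case). If many SNPs differ in frequency between the groups, many HWE tests are shifted at once, which is one way an entire QQ plot can lift off the expected line.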

M__ · Accepted Answer · 2023-07-10 16:02:57Z

The level of discordance is extreme, particularly for humans. This is either a Nature/Science publication and a massive advance in the field (especially for humans), or the genetic factors have massive confounders.

In theory, I'm aware there must be a genetic component to alcoholism: there is a biochemical component in the rate at which alcohol toxins are processed. However, non-genetic (especially social/community) effects on alcoholism could predominate.

What I personally think is happening is that there are very strong non-genetic confounders which are disrupting the model, and this is getting confused with neutral HWE. For example, if alcoholism predominates in a community or social group, the transmission is social, not genetic, so the confounder is extreme because the individuals are genetically related but the gene(s) are not being inherited. In phylogenetic terms it will make the "gene" appear far higher up (far deeper) in the tree than it actually is. The "lone" drinker is probably the model, but group behaviour and peer pressure are probably more realistic. Equally, community effects such as religious belief could be suppressing alcoholism.

Thus if a non-genetic model of alcoholism (loads of social factors) resulted in a stronger fit by, for example, 3 logs difference, whilst the genetic model is 50 logs difference, then that is the answer, and it points to the need for careful population sampling for GWAS. Upvotes J Human Gen is cool.

As a disclaimer, I do not understand human genetics outside immunology. I do understand multiple linear regression models, and that failing to account for a pivotal factor(s) causes unusual effects. An obvious example in this case is religious belief: thus alcoholism genes could be present but the environment is not presented. For example, you might conclude long hair caused a reduction in human growth because they are tightly correlated in humans. However, once gender is factored into the model, the result is clear: hair length associates with gender, not height.
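The hair-length example can be sketched with simulated data. All numbers below (group means, spreads, sample size) are invented for illustration, and stratifying by gender is used as a simple stand-in for adding gender as a covariate in a regression model.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate(n, seed=0):
    """Simulate (is_male, height_cm, hair_cm); hair never affects height."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        is_male = rng.random() < 0.5
        # Height and hair length each depend on gender only.
        height = rng.gauss(178.0 if is_male else 165.0, 6.0)
        hair = rng.gauss(10.0 if is_male else 40.0, 8.0)
        rows.append((is_male, height, hair))
    return rows


rows = simulate(4000)
overall = pearson([r[2] for r in rows], [r[1] for r in rows])
print("hair vs height, everyone pooled:", round(overall, 2))

for label, flag in (("men only", True), ("women only", False)):
    group = [r for r in rows if r[0] == flag]
    within = pearson([r[2] for r in group], [r[1] for r in group])
    print("hair vs height,", label + ":", round(within, 2))
```

Pooled, hair length and height are strongly negatively correlated. Within each gender the correlation is close to zero, because the apparent effect was carried entirely by the omitted factor.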
