I have a txt file summarising the results of a GWAS on a European population. Its structure is as follows:
data = data.frame(chr = c('1', '1', '1', '1'),
                  bp = c(740098, 787889, 952073, 993492),
                  snp = c('rs12138618', 'rs4951864', 'rs3128126', 'rs4075116'),
                  p = c(0.1, 0.04, 0.7, 0.9))
Imagine the first two SNPs listed (rs12138618, rs4951864) belong to the same LD block (for r^2 = 0.1 or a similar criterion). Where can I get this information (a table or database), or what R package can I use to label these dependent SNPs and obtain something like:
  chr     bp        snp    p ld_block
1   1 740098 rs12138618 0.10       21
2   1 787889  rs4951864 0.04       21
3   1 952073  rs3128126 0.70       29
4   1 993492  rs4075116 0.90       34
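To make the target concrete, here is a minimal sketch of the labelling step only. It assumes I already had a table of block boundaries, which is exactly the part I am missing: the `blocks` table, its coordinates, and the helper name `label_ld_blocks` are all made up for illustration, and the blocks are assumed not to overlap.

```r
# Example GWAS summary rows (same made-up values as above)
data <- data.frame(chr = c('1', '1', '1', '1'),
                   bp  = c(740098, 787889, 952073, 993492),
                   snp = c('rs12138618', 'rs4951864', 'rs3128126', 'rs4075116'),
                   p   = c(0.1, 0.04, 0.7, 0.9))

# HYPOTHETICAL block boundaries: made-up coordinates, not from any real resource
blocks <- data.frame(chr      = c('1', '1', '1'),
                     start    = c(700000, 900000, 980000),
                     end      = c(800000, 960000, 1000000),
                     ld_block = c(21, 29, 34))

# Label each SNP with the block whose [start, end] interval contains its position.
# Assumes blocks on the same chromosome do not overlap.
label_ld_blocks <- function(snps, blocks) {
  snps$ld_block <- NA_real_
  for (ch in unique(snps$chr)) {
    b <- blocks[blocks$chr == ch, ]
    if (nrow(b) == 0) next
    b <- b[order(b$start), ]
    rows <- which(snps$chr == ch)
    pos <- snps$bp[rows]
    # Index of the last block starting at or before each position (0 if none)
    idx <- findInterval(pos, b$start)
    hit <- idx > 0
    # Keep only positions that also fall before the end of that block
    hit[hit] <- pos[hit] <= b$end[idx[hit]]
    snps$ld_block[rows[hit]] <- b$ld_block[idx[hit]]
  }
  snps
}

labelled <- label_ld_blocks(data, blocks)
print(labelled)
```

SNPs that fall outside every block keep `NA` in `ld_block`. `findInterval` does a binary search over the sorted block starts, so this step should scale reasonably to many SNPs.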
I am working with a large dataset, so a computationally efficient approach is preferred.
Note: Obviously I am making up the numbers, but I hope my point is clear. I am also new to GWAS, so it is very likely that I am missing or misunderstanding some concepts.
What if I do not have the genotype data, but I know that my population is CEU? Is there reference genotype data that I can provide to PLINK instead?