$\begingroup$

I have GWAS data that gives the SNP, chromosome, and base-pair position for each variant. The data set has thousands of SNPs. What is the easiest way to find the nearest gene for each SNP using this information? My data looks like this:

(screenshot of the data table; image not available)

$\endgroup$
  • $\begingroup$ What language are you using? There are some approaches listed here: biostars.org/p/111225. Do any of those answers suit? $\endgroup$ Commented Feb 23, 2023 at 0:11
  • $\begingroup$ Do you want the nearest gene or the gene(s) that overlap the position? What if the nearest gene is a few megabases away, would you still want that? $\endgroup$
    – terdon
    Commented Feb 23, 2023 at 14:39

1 Answer

$\begingroup$

This answer assumes that you have a terminal that can run Linux commands and Python, and that you can install and run bedtools.

Step 1: Convert your input spreadsheet to BED format. There is probably an awk one-liner that does this, but a Python script along these lines will do it for CSV input (assuming that your coordinates are 1-based):

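# Writes BED format to standard output; redirect it into input_file.bed.
# Untested sketch. Assumes the columns are chromosome, SNP ID, 1-based position,
# and that the first row is a header (remove the next() call if it is not).
import csv

with open("input_file.csv", newline="") as infile:
    reader = csv.reader(infile)
    next(reader, None)  # skip the header row
    for fields in reader:
        coord = int(fields[2])
        # BED is 0-based and half-open, so 1-based position p becomes [p - 1, p)
        print(f"chr{fields[0]}\t{coord - 1}\t{coord}\t{fields[1]}")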
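
The script above prepends "chr" to whatever is in the chromosome column, which only matches the GENCODE names (chr1 ... chr22, chrX, chrY, chrM) if the input uses plain labels such as 7 or X. As a rough sketch, the hypothetical helpers below (`normalize_chrom` and `to_bed_line` are not from any library) normalize a few common labelings before building the BED line. The numeric codes 23 to 26 are assumed to follow the PLINK convention, and the SNP IDs are made up; check both against your own data.

```python
# Hypothetical helpers (not part of the original answer): map chromosome labels
# from a GWAS table onto GENCODE-style names such as chr1, chrX, chrM.
# ASSUMPTION: numeric codes follow the PLINK convention (23 = X, 24 = Y,
# 25 = pseudo-autosomal region of X, 26 = mitochondrial).
PLINK_CODES = {"23": "X", "24": "Y", "25": "X", "26": "M"}


def normalize_chrom(label):
    """Return a GENCODE-style chromosome name for a raw label."""
    name = label.strip()
    if name.lower().startswith("chr"):
        name = name[3:]          # drop an existing prefix so it is not doubled
    name = name.upper()
    if name == "MT":
        name = "M"               # GENCODE calls the mitochondrial genome chrM
    name = PLINK_CODES.get(name, name)
    return f"chr{name}"


def to_bed_line(chrom, snp_id, pos_1based):
    """Convert a 1-based position into a 0-based, half-open BED interval."""
    pos = int(pos_1based)
    return f"{normalize_chrom(chrom)}\t{pos - 1}\t{pos}\t{snp_id}"


if __name__ == "__main__":
    # Made-up SNP IDs and positions, for illustration only.
    print(to_bed_line("7", "rs123", "1000"))
    print(to_bed_line("chrx", "rs456", "42"))
```

If your table already uses names like chr7, this also avoids producing chrchr7, which would silently match nothing in the annotation.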

Step 2: Sort the BED file by chromosome, then by start position.

sort -k1,1 -k2,2n input_file.bed > input_file.sorted.bed

Step 3: Get genome annotations. I am assuming that you are using the GRCh38 build of the human genome.

wget https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_43/gencode.v43.annotation.gff3.gz
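
The GENCODE GFF3 file lists genes together with their transcripts, exons, and other features, so it can help to look only at the records whose third column is "gene". The following is a minimal sketch of that filtering in Python, assuming the standard nine-column GFF3 layout and that gene records carry a `gene_name` attribute; the function names are illustrative and the sample lines are made up.

```python
import gzip


def parse_attributes(column9):
    """Turn 'key=value;key=value' from GFF3 column 9 into a dict."""
    attrs = {}
    for item in column9.strip().split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            attrs[key] = value
    return attrs


def gene_records(lines):
    """Yield (chrom, start, end, strand, gene_name) for each 'gene' feature."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header/comment lines and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9 or fields[2] != "gene":
            continue  # keep gene-level features only
        attrs = parse_attributes(fields[8])
        name = attrs.get("gene_name", attrs.get("ID", "."))
        # GFF3 coordinates are 1-based and inclusive
        yield fields[0], int(fields[3]), int(fields[4]), fields[6], name


def read_genes(path):
    """Read gene features from a gzip-compressed GFF3 file."""
    with gzip.open(path, "rt") as handle:
        return list(gene_records(handle))


if __name__ == "__main__":
    # Two made-up lines in GFF3 layout, for illustration only.
    sample = [
        "chr1\tHAVANA\tgene\t100\t500\t.\t+\t.\tID=G1;gene_name=GENE_A\n",
        "chr1\tHAVANA\texon\t100\t200\t.\t+\t.\tID=E1;gene_name=GENE_A\n",
    ]
    for record in gene_records(sample):
        print(record)
```

Calling `read_genes("gencode.v43.annotation.gff3.gz")` on the downloaded file would return one tuple per annotated gene.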

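Step 4: Get the nearest gene for each SNP. bedtools closest expects both inputs to be sorted by chromosome and then by start position, with the same chromosome order in both files. The GENCODE file is not sorted that way, and it also contains transcripts, exons, and other features. So first keep only the gene records and sort them the same way as the BED file:

zcat gencode.v43.annotation.gff3.gz | awk -F'\t' '$3 == "gene"' | sort -k1,1 -k4,4n > gencode.v43.genes.sorted.gff3

bedtools closest -d -a input_file.sorted.bed -b gencode.v43.genes.sorted.gff3 > input_annotated_with_genes.bed

The -d option appends the distance in base pairs from each SNP to the reported gene (0 if the SNP lies inside the gene), so you can filter out genes that are too far away to be meaningful.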
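
Each line of the bedtools closest output holds the four BED columns of the SNP followed by the nine GFF3 columns of the matched feature, plus a trailing distance column if `-d` was used. As a hedged sketch, the hypothetical function below pulls out the SNP ID and gene name from one such line. It assumes the `gene_name` attribute is present and that an unmatched SNP is reported with "." in the feature's chromosome column; verify both against your own output.

```python
def parse_closest_line(line):
    """Return (snp_id, chrom, pos, gene_name, feature_type, distance).

    gene_name and feature_type are None when no feature was reported;
    distance is None when the output has no trailing distance column.
    """
    fields = line.rstrip("\n").split("\t")
    chrom, _start, end, snp_id = fields[:4]
    pos = int(end)                     # BED end equals the 1-based SNP position
    gff = fields[4:13]                 # the nine GFF3 columns of the match
    distance = int(fields[13]) if len(fields) > 13 else None
    if len(gff) < 9 or gff[0] == ".":  # ASSUMPTION: "." marks "no match"
        return snp_id, chrom, pos, None, None, distance
    attrs = dict(
        item.split("=", 1) for item in gff[8].split(";") if "=" in item
    )
    return snp_id, chrom, pos, attrs.get("gene_name"), gff[2], distance


if __name__ == "__main__":
    # Made-up output line, for illustration only.
    example = (
        "chr1\t999\t1000\trs1\t"
        "chr1\tHAVANA\tgene\t1200\t1500\t.\t+\t.\tID=G1;gene_name=GENE_A\t200"
    )
    print(parse_closest_line(example))
```

Looping this over input_annotated_with_genes.bed and writing the tuples with the `csv` module gives a SNP-to-gene table that can be joined back onto the original spreadsheet.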
$\endgroup$
