Questions tagged [1000genomes]
The 1000genomes tag has no usage guidance.
19
questions
4
votes
1
answer
102
views
tabix errors when accessing 1000 Genomes data: "[E::bgzf_read] bgzf_read_block error -1 after 50219 of 52392 bytes" and Could not load .tbi/.csi index
I am trying to access 1000 Genomes (1KG) data using tabix as per the 1KG tabix documentation "How do I get a genomic region sub-section of your files?" ...
1
vote
1
answer
278
views
LD clump GRCh38 GWAS results
The vignette of the R package ieugwasr describes a PLINK-based wrapper function for LD clumping GWAS data using the 1000 genomes ...
3
votes
2
answers
95
views
Should genotype imputation be ancestry specific?
I'm wondering if imputation, specifically Beagle, needs a reference panel that matches the sample's ancestry group. For example, Beagle documentation suggests the 1000 Genomes Project phase 3 ...
3
votes
2
answers
344
views
Where do I get a large reference VCF?
I would like to download a large .vcf file containing many (hundreds or thousands) of samples. Ideally, I would download different population-specific .vcf files, but the ability to sort/filter by ...
2
votes
3
answers
113
views
Interpreting short indel calls in 1000 Genomes Project VCFs
Consider the following short indel polymorphism rs59679400 on chr7.
...
1
vote
1
answer
244
views
Where can I download 30x 1000 genomes cram files?
From the preprint published by the 1000 Genomes Project (https://www.biorxiv.org/content/10.1101/2021.02.06.430068v1.full) I think the 30x data is for WGS. Can anyone confirm for me if the following file ...
2
votes
0
answers
104
views
SNPs with high population differentiation from 1k Genome dataset
I am trying to reproduce the results from this paper "Human genomic regions with exceptionally high levels of population differentiation identified from 911 whole-genome sequences".
...
3
votes
2
answers
64
views
Choosing an imputation panel for SNP-Chip data?
I have SNP-chip data for about 1,000 samples that I'd like to impute (for the purpose of having more rsIDs to match against GWAS data).
However, I don't know the ancestry of each sample / the ...
1
vote
1
answer
361
views
Interpreting imputation result from GLIMPSE
I'm following this tutorial of GLIMPSE for learning. I was expecting some extra SNPs coming from the 1000 Genomes reference in the resulting .vcf file. Though I understand the phasing in the output ...
1
vote
3
answers
976
views
Masking sites in a vcf file
I need to mask all sites in a vcf file flagged by the 1000 Genomes Project as being unfit for population genetic analyses. The sites for all chromosomes are available at:
1000Genomes masked sites
From ...
3
votes
1
answer
68
views
dbSNP frequency anomalies
Sometimes dbSNP reports very different allele frequencies for different large-scale genome projects, e.g. between 1000 Genomes and gnomAD:
rs11822440 1000Genomes A=0.4629 C=0.5371 GnomAD A=0.99997 ...
4
votes
1
answer
824
views
Difference between genome assembly and genome sequence alignment to a reference to find structural variants
I'm trying to determine what the difference and benefits of genome assembly and genome sequence alignments are when trying to identify structural variants or transposons in populations.
I've been ...
1
vote
0
answers
19
views
Difference between "trans-ethnic" and "cross-ancestry"
A quick terminology question today. I see the terms "cross-ancestry" and "trans-ethnic" used seemingly interchangeably in literature. Is there any real difference between those two ...
-2
votes
2
answers
334
views
Removing common variants in the 1000 genomes database from .vcf [closed]
I have 15 .vcf files. I need to remove "common variants in the 1000 genomes database" appearing in at least 0.5% of the population.
Do you know where I might start?
Thank you so much
0
votes
1
answer
90
views
Obtaining HGDP project data in fasta format
I need to obtain sample data from modern humans in fasta format. I just need some megabytes of data from every individual. I currently use a script that obtains the cram file from here (ftp.1000genomes....