All Questions
31
questions
1
vote
0
answers
149
views
bwa mem hangs after a few thousand reads
I am trying to align a bunch of paired sample fastq files using bwa mem.
My original command was:
...
1
vote
1
answer
29
views
How important are the homozygous variants that get unnecessarily deleted using liftover?
I'm referring to the text described here:
These tools [NCBI remap, CrossMap] operate only on the sites present in an input VCF, and return the representation of those sites in a new genome assembly. ...
2
votes
2
answers
233
views
Improving prokaryotic assembly with other contig/scaffold-level data?
I have what at first sight appears to be a high-quality MAG (~10 pieces, high completion%) that I built from a hybrid assembly (Illumina + Nanopore data) from a cyanobacterium.
Workflow:
Quality ...
4
votes
1
answer
66
views
How to promote assemblies into genomes in NCBI?
Note: I've never submitted an assembly/genome to NCBI, so excuse me if my perspective is flawed.
I'm working with Drosophila subobscura (spring fruit fly).
I see here https://www.ncbi.nlm.nih.gov/data-...
2
votes
1
answer
82
views
What is the best way to process yeast genomes?
I have obtained several hundred raw, unassembled yeast genomes from NCBI and I am looking for advice on how to process the genomes for downstream analysis.
I have a reference genome (S288C) to use for ...
0
votes
0
answers
222
views
How to manually curate a genome assembly for sequence variation or error?
I have a PacBio HiFi assembly of 1.1 Gb from a heterozygous species. I have aligned this assembly against a reference genome which is around 0.9 Gb. I can see that there are quite a few INDELs, ...
1
vote
1
answer
212
views
Comparing homozygosity of k-mer plots
Attached are two k-mer plots from two closely related species. Is it safe to say that the one on the left has higher homozygosity than the one on the right, due to a low to almost flat ...
0
votes
1
answer
470
views
How to understand this k-mer plot from Merqury?
The attached figure was generated from Illumina reads from multiple individuals compared to a genome assembly. It looks like a lot of k-mers are read-only (grey colored). Also, a blue peak (2x ...
0
votes
2
answers
332
views
How to know if the DNA sequence has been assembled and why is it important to know how it was assembled?
I have downloaded FASTA files from NCBI that contain the DNA sequences of the coding regions of the genes and the DNA sequence of the complete genome. How can I recognize if these sequences ...
1
vote
1
answer
98
views
How to quantify specific genes from a shotgun metagenome?
I have googled a lot but couldn't find any specific answer to the question, so I am here seeking your guidance. My question is similar to this. I have several metagenomes (n=30). But for ...
0
votes
3
answers
77
views
Genome QC + Assembly Pipeline semantics
I’m trying to create a pipeline for genome assembly. How best can I “redirect/pipe” from existing fasta files (or files in general) to other steps of the pipeline?
I was thinking of going from the SRA ...
3
votes
2
answers
89
views
Gold standard benchmark
This page is claimed to contain a gold standard benchmark for viral genome assembly.
https://github.com/cbg-ethz/5-virus-mix
The claim is here:
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5411778/
...
0
votes
0
answers
248
views
How to filter a genome assembly consisting of a large number of contigs?
I did some de novo genome assemblies with Illumina PE data using SPAdes, and most of them consist of a large number of contigs (>1000). I have several questions below.
Do we need to filter ...
1
vote
2
answers
794
views
Is it possible to filter contaminated reads for raw PacBio sequences (not HiFi reads) before assembly?
De novo genome assembly for non-model organisms faces the issue of bacterial contamination. For assembled contigs with mostly bacterial-like sequences (based on BLAST search), the entire contig can be ...
0
votes
1
answer
271
views
Is it possible to convert BAM file from one genome assembly to the other?
I have multiple BAM files aligned to the UCSC genome assembly GRCh37/hg19 that were read in different time frames. Now I am planning a different study that requires assembling all the data ...