Questions tagged [assembly]
The process of reconstructing the original sequence from the reads generated from it during a sequencing experiment. Can refer to genome assembly, in which case the original sequence is a genome, or transcriptome assembly, in which case the original sequences are RNA transcripts.
152
questions
35
votes
2
answers
3k
views
Why do some assemblers require an odd-length kmer for the construction of de Bruijn graphs?
Why do some assemblers like SOAPdenovo2 or Velvet require an odd-length k-mer size for the construction of a de Bruijn graph, while some other assemblers like ABySS are fine with even-length k-mers?
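One commonly cited reason is that a k-mer of odd length can never equal its own reverse complement, because the middle base would have to be its own complement. The sketch below only illustrates that property by brute force; the function names are hypothetical and it does not reflect how SOAPdenovo2, Velvet, or ABySS are implemented.

```python
from itertools import product

# Watson-Crick complement table for the four DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(kmer: str) -> str:
    """Return the reverse complement of a DNA string."""
    return kmer.translate(COMPLEMENT)[::-1]


def self_reverse_complement_kmers(k: int) -> list[str]:
    """Enumerate all k-mers that are identical to their own reverse complement.

    Brute force over 4**k strings, so only sensible for small k.
    """
    kmers = ("".join(p) for p in product("ACGT", repeat=k))
    return [kmer for kmer in kmers if kmer == reverse_complement(kmer)]


if __name__ == "__main__":
    for k in range(1, 7):
        print(k, len(self_reverse_complement_kmers(k)))
```

For even k some k-mers (for example `ACGT`) are their own reverse complement, so a graph that merges each k-mer with its reverse complement has to handle those nodes specially; for odd k the list is always empty.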
19
votes
3
answers
699
views
How to deal with heterozygosity during polishing of genome assembly based on long reads?
All long-read sequencing platforms are based on single-molecule sequencing, which causes higher per-base error rates. For this reason a polishing step was added to genome assembly pipelines - ...
18
votes
1
answer
652
views
How can I improve a long-read assembly with a repetitive genome?
I'm currently trying to assemble a genome from a rodent parasite, Nippostrongylus brasiliensis. This species does have an existing reference genome, but it is highly fragmented. Here are some ...
14
votes
3
answers
530
views
How to make a distinction between the "classical" de Bruijn graph and the one described in NGS papers?
In Computer Science a De Bruijn graph has (1) m^n vertices representing all possible sequences of length n over ...
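To make the contrast concrete, here is a minimal sketch under stated assumptions: the classical graph contains every possible length-n string over the alphabet, while the graph typically built by assemblers contains only the (k-1)-mers observed in the reads. The function names and the toy read are hypothetical and are not taken from the question.

```python
from itertools import product


def classical_de_bruijn(alphabet: str, n: int) -> dict[str, list[str]]:
    """Classical de Bruijn graph: all len(alphabet)**n strings are vertices.

    Each vertex s has an edge to s[1:] + c for every symbol c.
    """
    graph = {}
    for symbols in product(alphabet, repeat=n):
        node = "".join(symbols)
        graph[node] = [node[1:] + c for c in alphabet]
    return graph


def read_de_bruijn(reads: list[str], k: int) -> dict[str, set[str]]:
    """Read-derived graph: vertices are observed (k-1)-mers, edges are observed k-mers."""
    graph: dict[str, set[str]] = {}
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            left, right = kmer[:-1], kmer[1:]
            graph.setdefault(left, set()).add(right)
            graph.setdefault(right, set())  # make sure sink nodes exist too
    return graph


if __name__ == "__main__":
    full = classical_de_bruijn("ACGT", 2)
    observed = read_de_bruijn(["ACGTAC"], k=3)
    print(len(full), "vertices in the classical graph")
    print(len(observed), "vertices in the read-derived graph")
```

With these toy inputs the classical graph has 16 vertices and the read-derived graph has 4, which is the sense in which the latter is a subgraph of the former.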
13
votes
5
answers
358
views
Improve a reference genome with sequencing data
I have a DNA sample which I know doesn't quite match my reference genome - my culture comes from a subpopulation which has undergone significant mutation since the reference was created.
The example I ...
10
votes
2
answers
281
views
Extract sequence context of high-degree nodes in assembly graphs
I often use metaSPAdes to assemble short reads from human microbiomes. My simplified understanding of short-read de Bruijn graph assemblers is that they fail where ambiguous paths cannot be resolved. ...
10
votes
3
answers
723
views
Pooling data in metagenome assembly
I have 12 human gut microbiome WGS NextSeq reads (151 bp paired end). What will be an effective strategy to assemble a metagenome?
Let us say I have already filtered the fastq for quality, adapter ...
9
votes
2
answers
2k
views
Is there a standard definition for "assembly polishing"?
Is there a standard definition for "assembly polishing" in the field?
Is there a standard definition for what polishing algorithms do?
My understanding of "polishing" is strongly influenced by ...
8
votes
2
answers
2k
views
estimate genome size: kmer-based approach from PacBio reads
Can anyone suggest a software/method for kmer analysis using PacBio reads (RSII)?
Something similar to Jellyfish, that I saw in a nice tutorial - but must be suitable for long, noisy reads. ...
7
votes
1
answer
295
views
How to calculate overall reference coverage with MUMmer?
Is the MUMmer suite capable of calculating reference sequence coverage statistics for all query sequences collectively? It would be possible to achieve by parsing the output of ...
7
votes
1
answer
380
views
wtdbg2: practical implications of k-mer fsize and psize choice
I am using wtdbg2 2.3 to assemble a human genome (sequenced on PromethION from a cell line). I filtered out reads with low average quality, and now I am trying to determine the parameters that will ...
6
votes
2
answers
54
views
How to display novel genome assemblies or uncommon genome assemblies using the UCSC Genome Browser?
I want to display the E. coli BW25113 (GenBank: CP009273.1) strain in the UCSC browser. This strain is not listed in the http://microbes.ucsc.edu/ browser. How can I display the E. coli BW25113 assembly in the browser?
6
votes
1
answer
165
views
Verify a predicted protein in one genome in a different genome of the same species
I have two genome assemblies of the same non-model species, call them Assembly 1 (generated from Illumina data) and Assembly 2 (generated from PacBio data).
For Assembly 1, I also have predicted ...
6
votes
2
answers
193
views
Genome assembly from error-prone reads
I understand how to assemble a genome from error-free reads. I implemented it like this:
Construct a directed overlap graph with reads as vertices and edges as the maximum overlap between two vertices. ...
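The step described above can be sketched roughly as follows. This is an illustrative guess, not the asker's code: the function names, the `min_len` threshold, and the toy reads are all assumptions, and it uses exact suffix-prefix matching, which only works for error-free reads.

```python
def overlap_length(a: str, b: str, min_len: int = 1) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Returns 0 if no overlap of at least `min_len` exists.
    """
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0


def build_overlap_graph(reads: list[str], min_len: int = 3) -> dict[str, dict[str, int]]:
    """Directed overlap graph: reads are vertices, edge weights are overlap lengths.

    Quadratic in the number of reads, so suitable for toy inputs only.
    """
    graph: dict[str, dict[str, int]] = {read: {} for read in reads}
    for a in reads:
        for b in reads:
            if a == b:
                continue
            length = overlap_length(a, b, min_len)
            if length:
                graph[a][b] = length
    return graph


if __name__ == "__main__":
    toy_reads = ["ACGTAC", "TACGGA", "GGATTT"]  # hypothetical error-free reads
    for source, targets in build_overlap_graph(toy_reads).items():
        print(source, targets)
```

With error-prone reads the exact `endswith` comparison would miss true overlaps, so it would need to be replaced by some form of approximate matching.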
6
votes
1
answer
132
views
Is there a way to assemble contigs starting from a specific sequence?
My work involves searching for marker genes/fragments in metagenomic databases (like the Sequence Read Archive). Once I find these sequences, I would like to know more about the neighboring genomic ...