Questions tagged [assembly]

The process of reconstructing the original sequence from the reads generated from it during a sequencing experiment. Can refer to genome assembly, in which case the original sequence is a genome, or transcript assembly, in which case the original sequences are RNA transcripts.

1 vote
1 answer
24 views

Why is RNASpades giving three FASTA output files instead of only one?

I'm running RNASpades for de novo transcriptome assembly in the Galaxy workflow manager. Instead of giving only one output of ...
Eshaan IITM's user avatar
1 vote
0 answers
28 views

Assessing the quality of an assembly

I am trying to run a script that assesses the quality of a de novo transcriptomic assembly using a tool called Transrate. To install the tool I followed the prompts in https://bioconda....
thole's user avatar
  • 163
1 vote
2 answers
66 views

Where to find the homopolymer regions BED file for the HG002 genome?

This question was also asked on Biostars. I am doing an experiment where I am trying to analyze the errors in the homopolymer regions between the polished reference HG002 genome and hifiasm assembly ...
Panda_1996's user avatar
2 votes
0 answers
49 views

I'm trying to run the aTRAM tool for assembly, but I'm stuck on this error:

Code: ...
Ro S's user avatar
  • 21
1 vote
1 answer
40 views

Hybrid assembly versus polishing for HiFi and Illumina reads

I will have to carry out an assembly project using HiFi reads, for which I already have Illumina reads, and I am wondering whether hybrid assembly or polishing would be the best option for this ...
R-addict's user avatar
5 votes
2 answers
95 views

Multiple genome assemblies of the same bacterial species

I have some RNA-seq data containing reads from the "host" as well as from several bacterial species. In this experimental context, I am interested in the host-associated reads and the ...
haci's user avatar
  • 4,192
2 votes
2 answers
68 views

Longstitch error make: command: Command not found *** No rule to make target

I installed Longstitch and ran the test script with no issues. The output files matched the expected output files. But when I try to run Longstitch on my own data, I get this error. <...
Karli's user avatar
  • 21
1 vote
1 answer
63 views

MMSeqs taxonomy running for over a day

I've been trying to run mmseqs2 on a few metagenomic assemblies and despite my best efforts in reading the wiki and playing with parameters, the process is taking over a day. In their paper they claim ...
Rainman's user avatar
  • 171
1 vote
0 answers
149 views

bwa mem hangs after a few thousand reads

I am trying to align a bunch of paired sample fastq files using bwa mem. My original command was: ...
padakpatek's user avatar
1 vote
0 answers
30 views

Determining fragment mean and fragment stdev for MaSuRCA config file

Similar to this unanswered question on Biostars, I am using MaSuRCA for the first time and want to know how other MaSuRCA users are determining fragment mean and fragment stdev. My understanding is ...
juliadouglasf's user avatar
0 votes
1 answer
116 views

Why do we delete scaffolds shorter than 500 - 1000 bp from the assembled genome?

After genome assembly, some protocols call for removal of scaffolds shorter than 500 or 1000 bp (papers differ on which threshold they use). Is this simply to remove the ...
Aurel's user avatar
  • 1
1 vote
1 answer
251 views

Interpreting GFA graph visualized in Bandage

I assembled Nanopore sequenced reads with Flye and visualized the GFA graph in Bandage but I don't really know how to interpret the result. For context, this is a yeast (DNA) genome. My goal is to ...
rimo's user avatar
  • 1,033
1 vote
1 answer
29 views

How important are the homozygous variants that get unnecessarily deleted using liftover?

I'm referring to the text described here: These tools [NCBI remap, CrossMap] operate only on the sites present in an input VCF, and return the representation of those sites in a new genome assembly. ...
BigMistake's user avatar
3 votes
1 answer
69 views

Compare FASTA sequences in pairs and collect metrics

I have 96 FASTA files (A1, A2, A3...) from one plasmid assembly pipeline, and I have another 96 FASTA files (B1, B2, B3 ...) from another plasmid assembly pipeline. I would like to compare pair ...
cautree's user avatar
  • 139
1 vote
0 answers
51 views

Why do X-ray (but not cryo-EM) structures of the ribosome in the PDB have 2 assemblies?

When I started programming against the PDB, I had a mixture of confusion and frustration with the fact that certain CIF files contain two actual structures, aka ...
rtviii's user avatar
  • 364
1 vote
0 answers
15 views

DNASTAR viral-host integration assembly keeps failing

I have two NGS files from an NGS company corresponding to the sequencing data from a tumor sample, as follows: TB_7710391_R1.FASTQ.gz and TB_7710391_R2.FASTQ.gz. I have downloaded the genome for MCPyV as ...
InterestingQuestions61's user avatar
1 vote
0 answers
97 views

RagTag patch error--"Tuple index out of range"

This question was also asked on GitHub. I'm trying to correct a long-read assembly with a short-read scaffold; I'm hoping to fill in the short gaps in the scaffold with the matching long-read sections. ...
schmiggle's user avatar
2 votes
1 answer
78 views

Velvet Optimizer automatically changes to hash-length 31

I'm trying to use Velvet Optimizer for a de novo assembly; I set my hash-lengths to be between 55 and 69 ...
pvp's user avatar
  • 67
2 votes
1 answer
40 views

How to find specific types of assemblies for specific species using Entrez tools?

How to find specific types of assemblies for specific species using Entrez tools? Task: Trying to specifically find transcriptomes and associated cDNA data for a list of species. I can use this ...
Sudoh's user avatar
  • 217
1 vote
1 answer
61 views

How to subset an SRA file for a single chromosome?

I used prefetch to get the PacBio reads of chicken from the SRA database. I want to align these reads against a reference genome, but not all the reads. I am only interested in a particular region on ...
venkatesh war's user avatar
2 votes
2 answers
233 views

Improving prokaryotic assembly with other contig/scaffold-level data?

I have what at first sight appears to be a high-quality MAG (~10 pieces, high completion%) that I built from a hybrid assembly (Illumina + Nanopore data) from a cyanobacterium. Workflow: Quality ...
Laura's user avatar
  • 1,007
2 votes
1 answer
78 views

Sanger sequencing annotation error

I am a student in a cancer lab. Working with Sanger sequencing is new to me. While analyzing a report we found an insertion that has not been reported in any databases so far; we were working on checking if the ...
user avatar
4 votes
1 answer
1k views

How to solve Nextflow error: "Trace file already exists"?

When trying to run epi2me-labs/wf-artic, I get the following error: ...
Cornelius Roemer's user avatar
2 votes
2 answers
69 views

Calling isoforms from long read data generated from partially degraded RNA

What is the best tool to call isoforms from long-read data generated from partially degraded RNA? By mistake we processed some samples with poor-quality RNA to generate long reads. Now we are ...
user3377241's user avatar
2 votes
1 answer
38 views

Using very closely related strains to increase coverage for short read, de novo assembly

Do you think it's possible to combine short read Illumina libraries (WGS) from multiple closely related eukaryotic microbial strains (e.g. libraries from a re-sequencing study, >99% ITS1 sequence) ...
bishopia's user avatar
1 vote
1 answer
40 views

How is an X chromosome encoded into a fasta string?

A human "X" chromosome has a centromere and two "identical" chromatids. If the chromatids are not identical, this fact is not assembled, correct? The FASTA string for a chromosome ...
gl00ten's user avatar
  • 249
4 votes
1 answer
66 views

How to promote assemblies into genomes in NCBI?

Note: I've never submitted an assembly/genome to NCBI, so excuse me if my perspective is flawed. I'm working with Drosophila subobscura (spring fruit fly). I see here https://www.ncbi.nlm.nih.gov/data-...
gl00ten's user avatar
  • 249
1 vote
3 answers
98 views

How can I assemble my genome from raw files?

I've had my whole genome sequenced (at 30x average coverage) by a lab, and they have provided the raw files to me (BAM, FASTQ, and VCF). How can I assemble it? And does assembly provide any further ...
Gimme the 411's user avatar
2 votes
1 answer
126 views

Need BAM file for Pilon

I just ran an assembly on yeast genomes using Flye and I want to polish those assemblies with Pilon but it requires a sorted BAM file. How do I make a BAM file of the resulting assembled.fasta?
rimo's user avatar
  • 1,033
2 votes
1 answer
82 views

What is the best way to process yeast genomes?

I have obtained several hundred raw, unassembled yeast genomes from NCBI and I am looking for advice on how to process the genomes for downstream analysis. I have a reference genome (S288C) to use for ...
rimo's user avatar
  • 1,033
4 votes
1 answer
115 views

How does one distinguish nuclear DNA from mitochondrial DNA when doing WGS?

I'm interested in doing de-novo sequencing but also phylogenetic analysis. In particular, after de-novo sequencing and annotating the genome, I need to align the CO1 gene and the nuclear 28S rRNA gene ...
Caterina's user avatar
  • 307
1 vote
1 answer
103 views

MetaQuast for assembling samples from complex communities

I'm working with whole genome metagenomic samples from human skin, and I'm using MEGAHIT for assembly and MetaQuast for evaluation. However, MetaQuast requires a list of reference genomes for the ...
Poccia's user avatar
  • 13
1 vote
0 answers
562 views

DNA genome string reconstruction from k-mer

I have the following quiz question, but the Pattern1 for both (ACC|ATA) and (CGA|ACT) are unique (just grep for ...
kevin's user avatar
  • 141
2 votes
1 answer
45 views

How good does the assembly of an NCBI prokaryotic genome have to be in order to argue gene loss?

NCBI has several labels for assembly completeness: Complete, Scaffold, Chromosome and Contig. Complete would be a circularized genome (or linear, rarely). For a Complete genome it's fairly ...
Laura's user avatar
  • 1,007
0 votes
2 answers
32 views

Assemble Anaeroplasma species genome from metagenomic PacBio data

I have a FASTA file containing reads generated by PacBio HiFi whole genome sequencing of a mouse feces sample. I would like to use this dataset to generate an assembled circularized genome for an ...
Amroon's user avatar
  • 1
2 votes
2 answers
79 views

How can I improve or otherwise investigate an unreliable genome tree?

Summary: My genome tree doesn't agree with my gene trees and I get the feeling that my genome tree might be wrong, possibly due to long branch attraction, but I don't know how to check/fix it. ...
Laura's user avatar
  • 1,007
0 votes
1 answer
218 views

Why must a maximal non-branching path be a contig?

The following is from Bioinformatics Algorithms: Fortunately, we can derive contigs from the de Bruijn graph. A path in a graph is called non-branching if in(v) = out(v) = 1 for each intermediate ...
Moo's user avatar
  • 127
6 votes
2 answers
54 views

How to display novel genome assemblies or uncommon genome assemblies using the UCSC Genome Browser?

I want to display the E. coli BW25113 strain (GenBank: CP009273.1) in the UCSC browser. This strain is not listed in the http://microbes.ucsc.edu/ browser. How can I display the E. coli BW25113 assembly in the browser?
Supertech's user avatar
  • 616
10 votes
2 answers
281 views

Extract sequence context of high-degree nodes in assembly graphs

I often use metaSPAdes to assemble short reads from human microbiomes. My simplified understanding of short-read de Bruijn graph assemblers is that they fail where ambiguous paths cannot be resolved. ...
acvill's user avatar
  • 613
0 votes
0 answers
222 views

How to manually curate a genome assembly for sequence variation or error?

I have a PacBio HiFi assembly of 1.1 Gb from a heterozygous species. I have aligned this assembly against a reference genome which is around 0.9 Gb. I can see that there are quite a few INDELs, ...
Anik Dutta's user avatar
0 votes
1 answer
74 views

How to improve a genome assembly using Dovetail and PacBio assembly?

I have more of a conceptual question. I have two genome assemblies from the same plant, one from Dovetail technology (~998 Gb) and another from PacBio HiFi (~1.1 Gb). The Dovetail assembly is ...
Anik Dutta's user avatar
0 votes
1 answer
80 views

Length of Contigs in Transcriptome and Whole Genome Assembly

Why are there shorter contigs from transcriptome assembly than from a whole genome assembly? I know the difference between transcriptome and genome, but don't really understand what contigs are in the ...
b14108's user avatar
  • 13
1 vote
1 answer
212 views

Comparing homozygosity of k-mer plots

Attached are two k-mer plots from two closely related species. Is it safe to say that the one on the left has higher homozygosity than the one on the right, due to a low to almost flat ...
Life_Searching_Steps's user avatar
0 votes
1 answer
470 views

How to understand this k-mer plot from Merqury?

The attached figure is generated based on Illumina reads from multiple individuals compared to a genome assembly. It looks like a lot of k-mers are reads-only (grey colored). Also, a blue peak (2x ...
Life_Searching_Steps's user avatar
0 votes
2 answers
107 views

What are some ways to check metagenomic bin quality?

I am new to metagenomic binning. I've used CheckM in order to estimate completeness and contamination values, and most of my bins of interest appear to have good values. My workflow was pretty ...
Laura's user avatar
  • 1,007
0 votes
1 answer
130 views

BUSCO results apparently inconsistent

I have two assemblies obtained from two different software tools for the same run data. In one of them, I get a BUSCO score of 69.4%. In the other one, I get 68.5%. The first assembly covers 99.7% ...
juanjo75es's user avatar
0 votes
1 answer
591 views

How to de novo hybrid assemble with PacBio CCS and Illumina PE reads

I would like to perform de novo genome assembly on a diploid microalgal strain. I have two datasets: PacBio CCS/HiFi reads (low coverage) and Illumina PE 2x150 (standard shotgun). Does anybody have any ...
bishopia's user avatar
1 vote
1 answer
70 views

Assembling all transcripts for an individual gene? (using single sequence to seed the assembly)

Let's say I have a candidate gene and I believe that in an individual sample, the genome sequence differs from the reference which then interferes with alignment. Is there a way for me to do a "...
story's user avatar
  • 1,603
0 votes
2 answers
202 views

How do you set the coverage in PacBio's Sequel II?

I am reading the Whole Genome Sequencing for de novo Assembly Best Practices: Use the Sequel II or IIe System and SMRT® Cell 8M to sequence to desired coverage depth for complexity of genome 10- to 15-...
ilam engl's user avatar
  • 280
5 votes
1 answer
89 views

Can someone help me estimate the runtime of the pipeline applied by the Vertebrate Genome Project?

The Vertebrate Genome Project (VGP) has a lot of interesting publications such as this one. The rough pipeline is outlined below: Here is the pipeline in more detail: While the paper describes all the ...
ilam engl's user avatar
  • 280
