Questions tagged [reads]
Reads are the sequences output by a sequencing machine after the raw signal (e.g. light, electricity) is converted into bases by a basecaller.
43 questions
6 votes, 1 answer, 84 views
How many false positive duplicates are marked using just the position of first unclipped base?
In the popular picard MarkDuplicates tool, a read is marked as a duplicate if it has the same position as another read starting from their first unclipped base in ...
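A minimal sketch of the position key described above. It assumes a SAM-style 1-based POS and a CIGAR string, and the helper names `parse_cigar` and `unclipped_five_prime` are hypothetical. It computes the reference coordinate of the read's 5' unclipped base. For forward reads this is POS minus any leading soft/hard clips. For reverse reads it is the alignment end plus any trailing clips. Picard also compares strand, mate position and library, so this covers only the position part of the comparison and does not reimplement MarkDuplicates.

```python
import re

# CIGAR operations as defined in the SAM specification
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar):
    """Return a list of (length, op) tuples from a CIGAR string."""
    return [(int(n), op) for n, op in CIGAR_RE.findall(cigar)]


def unclipped_five_prime(pos, cigar, is_reverse):
    """1-based reference coordinate of the read's first (5') unclipped base.

    Forward reads: POS minus leading soft (S) and hard (H) clips.
    Reverse reads: alignment end plus trailing soft/hard clips.
    """
    ops = parse_cigar(cigar)
    if not is_reverse:
        leading = 0
        for length, op in ops:
            if op in "SH":
                leading += length
            else:
                break
        return pos - leading
    # Operations that consume the reference determine the alignment end.
    ref_span = sum(length for length, op in ops if op in "MDN=X")
    end = pos + ref_span - 1
    trailing = 0
    for length, op in reversed(ops):
        if op in "SH":
            trailing += length
        else:
            break
    return end + trailing
```

For example, a forward read `5S95M` aligned at 100 and a forward read `100M` aligned at 95 both get 95. They would share a duplicate key even though their aligned starts differ.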
1 vote, 0 answers, 30 views
Determining fragment mean and fragment stdev for MaSuRCA config file
Similar to this unanswered question on Biostars, I am using MaSuRCA for the first time and want to know how other MaSuRCA users are determining fragment mean and fragment stdev. My understanding is ...
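One common way to estimate these two values is to align a subset of read pairs to a reference or draft assembly and summarise the template lengths (SAM TLEN, column 9). This is an assumption, not something MaSuRCA prescribes. The sketch below uses a hypothetical helper `fragment_stats` and an optional outlier cutoff `max_len`, and computes the mean and sample standard deviation from a list of TLEN values.

```python
import statistics


def fragment_stats(tlens, max_len=None):
    """Mean and sample standard deviation of fragment (insert) sizes.

    tlens: SAM TLEN values from properly paired reads. TLEN is positive
    for the leftmost mate and negative for the rightmost one, so keeping
    only positive values counts each pair about once.
    max_len: optional cutoff to drop outliers such as chimeric pairs.
    """
    sizes = [t for t in tlens if t > 0 and (max_len is None or t <= max_len)]
    if len(sizes) < 2:
        raise ValueError("need at least two fragments")
    return statistics.mean(sizes), statistics.stdev(sizes)
```

The cutoff matters in practice. A few very long "fragments" from mis-paired or chimeric reads can inflate the standard deviation considerably.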
2 votes, 1 answer, 763 views
If fastp output is not a good measure of FASTQ correctness, what is?
At the beginning of my pipeline, I just fed the paired reads (2 files) into fastp with the default options, and assumed it would do a good job preparing the reads for the next step: alignment.
But I ...
1 vote, 1 answer, 759 views
Comparison of reads in two FASTQ files
My goal is to compare reads from two different fastq files on a Linux machine.
The following are the comparisons to perform:
How many common reads are between the two fastq files?
How many reads are ...
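A minimal sketch of the first comparisons, assuming that "the same read" means the same read ID (the first whitespace-separated token of the header line) and that each record is exactly four lines. The helper names `read_ids` and `compare` are hypothetical. Comparing by sequence instead would only require keeping the second line of each record.

```python
from itertools import islice


def read_ids(lines):
    """Yield read IDs from FASTQ lines, assuming 4-line records.

    The ID is the first whitespace-separated token of the header line,
    without the leading '@'.
    """
    it = iter(lines)
    while True:
        record = list(islice(it, 4))
        if not record:
            return
        if len(record) < 4 or not record[0].startswith("@"):
            raise ValueError("malformed FASTQ record")
        yield record[0][1:].split()[0]


def compare(lines_a, lines_b):
    """Count read IDs shared by both files and unique to each."""
    a, b = set(read_ids(lines_a)), set(read_ids(lines_b))
    return {"common": len(a & b), "only_a": len(a - b), "only_b": len(b - a)}
```

With real files you would pass open file handles, e.g. `compare(open(path_a), open(path_b))`. The sets of IDs must fit in memory. For very large files, sorting the IDs on disk and merging is an alternative.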
2 votes, 1 answer, 253 views
Filtering paired-end reads with sambamba: avoid discarding reads on the minus strand
I have a BAM file (DNA, shallow whole genome sequencing at ~1X) where I want to filter reads (using sambamba) to keep only those which have a template length > 20 and mapping quality > 20, ...
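A likely cause, assuming the filter compares the signed template length: in SAM, TLEN (column 9) is positive for the leftmost mate and negative for the rightmost one. In a standard forward-reverse library the rightmost mate is usually the one on the minus strand, so a condition like `template length > 20` drops it. The sketch below uses a hypothetical helper `filter_sam_lines` over SAM text. It shows the intended condition: compare the absolute value of TLEN. It does not reproduce sambamba's filter syntax.

```python
def filter_sam_lines(lines, min_tlen=20, min_mapq=20):
    """Yield SAM header lines and alignments with |TLEN| > min_tlen and MAPQ > min_mapq."""
    for line in lines:
        if line.startswith("@"):
            yield line  # keep header lines unchanged
            continue
        fields = line.rstrip("\n").split("\t")
        mapq, tlen = int(fields[4]), int(fields[8])
        # abs() keeps the rightmost mate, whose TLEN is negative
        if abs(tlen) > min_tlen and mapq > min_mapq:
            yield line
```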
1 vote, 0 answers, 294 views
Extract read names and the associated nucleotides on specific positions from a BAM file (in R)
Let's assume I have a BAM file and several positions that I would like to examine more closely in this alignment. My goal is to find out whether these positions are on the same reads and which ...
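The core of this task is mapping a reference position onto the read sequence through the CIGAR string. The sketch below shows that logic in Python rather than R. It assumes a 1-based POS, a CIGAR string and the read sequence, and the helper name `base_at` is hypothetical. Calling it for each position on each read gives a table of read names against observed bases. From that table you can see which positions fall on the same reads.

```python
import re

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def base_at(ref_pos, pos, cigar, seq):
    """Return the read base aligned to 1-based reference position ref_pos.

    Returns None if the read does not cover ref_pos, or '-' if the read
    has a deletion (D) or skipped region (N) there.
    """
    r = pos  # current reference coordinate (1-based)
    q = 0    # current index into seq (0-based)
    for n, op in CIGAR_RE.findall(cigar):
        n = int(n)
        if op in "M=X":  # consumes read and reference
            if r <= ref_pos < r + n:
                return seq[q + ref_pos - r]
            r += n
            q += n
        elif op in "IS":  # consumes read only
            q += n
        elif op in "DN":  # consumes reference only
            if r <= ref_pos < r + n:
                return "-"
            r += n
        # H and P consume neither
    return None
```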
2 votes, 1 answer, 53 views
Connection between detected genes and read counts
I have been trying to understand Seurat for analysing scRNA-seq data. It comes to my mind that the main data is organised in the Seurat object with rows as genes and columns as the cells, and the ...
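Seurat stores two per-cell summaries of the genes x cells count matrix. `nCount_RNA` is the total count in each cell (the column sum). `nFeature_RNA` is the number of genes with a non-zero count in that cell. The sketch below computes both from a plain nested list. The function name `per_cell_qc` is hypothetical. For UMI-based data the counts are UMIs rather than raw reads.

```python
def per_cell_qc(counts):
    """counts: genes x cells matrix as a list of rows (one row per gene).

    Returns (detected, total): per cell, the number of genes with a
    non-zero count and the sum of all counts.
    """
    n_cells = len(counts[0])
    detected = [sum(1 for row in counts if row[j] > 0) for j in range(n_cells)]
    total = [sum(row[j] for row in counts) for j in range(n_cells)]
    return detected, total
```

The two are related but not proportional. A cell can have a high total count spread over few genes, or a modest total spread over many.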
0 votes, 1 answer, 123 views
Combining read counts from three separate GEO studies
I want to do differential expression analysis with DESeq2. I have three read count files downloaded from GEO (small RNA-seq) in which the number of miRNAs and their IDs are nearly the same. These studies ...
2 votes, 1 answer, 1k views
What is the right way of calculating a Phred score by hand?
I am trying to calculate mean Phred scores for my sequencing data, but I am not very comfortable with it. There are actually two ways of calculating it (I just use an existing sample):
giving: 3 reads ...
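The two approaches usually meant here can be sketched as follows. The helper names are hypothetical, and the example values below are illustrative rather than taken from the question. The first averages the Q values directly. The second converts each Q to an error probability P = 10^(-Q/10), averages the probabilities, and converts back with Q = -10·log10(P). A Phred+33 decoder is included because FASTQ stores Q values as ASCII characters.

```python
import math


def mean_q_arithmetic(quals):
    """Average the Phred scores directly."""
    return sum(quals) / len(quals)


def mean_q_via_error(quals):
    """Average in probability space, then convert back to a Phred score."""
    probs = [10 ** (-q / 10) for q in quals]
    return -10 * math.log10(sum(probs) / len(probs))


def decode(qual_string, offset=33):
    """Phred+33 ASCII quality string -> list of integer Q scores."""
    return [ord(c) - offset for c in qual_string]
```

For Q values 10, 20 and 30, the arithmetic mean is 20. The probability-space mean is about 14.3, because the worst base dominates the averaged error probability. Which one is "right" depends on whether you want a typical score or the expected error rate.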
0 votes, 1 answer, 55 views
Modeling number of reads mapped to a gene
I am looking for a probability distribution for the number of reads mapped to a particular gene in metagenomic sequencing (NGS, shotgun, likely Illumina).
Naively one could model it via a binomial (or ...
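As a starting point for the naive model, the sketch below treats each of n reads as landing on the gene independently with probability p, giving Binomial(n, p). It also compares this with the Poisson(n·p) approximation that applies when n is large and p is small. How p is derived from gene length and abundance is left as an assumption. The function names are hypothetical.

```python
import math


def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p): n reads, each hitting the gene with probability p."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam); approximates Binomial(n, p) with lam = n * p."""
    return math.exp(-lam) * lam**k / math.factorial(k)
```

For one million reads and a hypothetical p = 5e-6, both give about 0.175 for k = 5. Across real samples, counts are usually more variable than either model predicts (overdispersion), which is why a negative binomial is commonly used, for example in DESeq2 and edgeR.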
1 vote, 1 answer, 1k views
What is "unmapped read segments" in the output of samtools idxstats?
samtools idxstats produces a four column output (see here)
...
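Per the samtools documentation, each tab-delimited line of `idxstats` output gives the reference sequence name, sequence length, number of mapped read-segments and number of unmapped read-segments. As I understand the SAM convention, an unmapped segment counted under a named reference is an unmapped read placed there, typically because its mate mapped. The final `*` line counts unmapped reads with no placement. The sketch below parses this output. The function name and example text are hypothetical.

```python
def parse_idxstats(text):
    """Parse `samtools idxstats` output into {name: (length, mapped, unmapped)}."""
    stats = {}
    for line in text.strip().splitlines():
        name, length, mapped, unmapped = line.split("\t")
        stats[name] = (int(length), int(mapped), int(unmapped))
    return stats
```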
1 vote, 1 answer, 209 views
How to extract reads with INDELs > a given size?
I'm trying to modify the approach at https://www.biostars.org/p/253774/ to get reads with deletions > 20bp.
I think this gives reads with exactly 20bp dels:
...
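One way to avoid matching exact lengths is to parse each CIGAR operation and compare its length numerically. The sketch below uses a hypothetical helper `has_indel_longer_than`. It returns True when any deletion (`D`) or, optionally, insertion (`I`) is strictly longer than the threshold. Splice gaps (`N`) are deliberately not counted as deletions.

```python
import re

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def has_indel_longer_than(cigar, min_len=20, ops="D"):
    """True if the CIGAR has any operation in `ops` longer than min_len.

    Use ops="D" for deletions, "I" for insertions, or "DI" for either.
    An unavailable CIGAR ("*") yields no operations and returns False.
    """
    return any(op in ops and int(n) > min_len for n, op in CIGAR_RE.findall(cigar))
```

Applied to the CIGAR column (field 6) of each alignment, this keeps reads with a deletion of 21 bp or more.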
1 vote, 0 answers, 48 views
How to control/normalize for number of reads when calling SNPs using RNA-Seq?
I used the GATK pipeline to call SNPs on males and females using RNA-Seq data. But the males have a higher read count (~43-46M reads) than the females (~40-42M reads). This causes SNP counts to be ...
2 votes, 1 answer, 68 views
If a gene is expressed at a level of 1/1200 compared to the average gene, how is the probability 50:50 that we have a read mapped to it?
I am reading a book about RNAseq analysis and it says
"To calculate the probability that a read will map to a specific gene, we can assume an average gene size of 4000 nt (100 M nt divided by 25,...
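A sketch of the usual reasoning, assuming each of N reads independently maps to the gene with a small probability p (proportional to the gene's length times its relative expression). Under that assumption, the chance of seeing at least one read is 1 - (1 - p)^N. For small p this is about 1 - e^(-Np), which reaches 50% when the expected count Np is about ln 2 ≈ 0.69. The helper names are hypothetical; the book's own values for p and N would be plugged in.

```python
import math


def prob_at_least_one(p, n_reads):
    """P(at least one of n_reads reads maps to the gene), each independently with probability p."""
    return 1 - (1 - p) ** n_reads


def reads_for_even_odds(p):
    """Smallest number of reads giving at least a 50% chance of one or more reads on the gene."""
    return math.ceil(math.log(0.5) / math.log(1 - p))
```

For example, with a hypothetical p = 0.01, 69 reads are needed for even odds. That corresponds to an expected count of 0.69 reads, in line with the ln 2 rule.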
0 votes, 2 answers, 381 views
Does rRNA depletion protocol give higher number of mapped reads in Intronic regions?
Recently, I downloaded a publicly available dataset consisting of 350 tumor samples. I see the following information in the published paper.
They used Ribo Zero Gold and rRNA was depleted. Strand ...