Variable numbers of Reads. I'm lost
1
0
Entering edit mode
15 months ago
z.ziriat • 0

Hello,

I have a question regarding the read count. After the sequencing, i found that i have 513k reads that were passed. But when i align (minimap2) and check how much reads i have in my bam file using Samtools, i found myself with 1.7M reads. After that i was looking for the mapped reads only and i have 280k. Than when i use featureCount to quantify and count my reads, it tells me that i have 1.7M reads, but assign only 268k reads. Can you help me please ?

Another thing : I was also wondering how to quantify rRNAs in my sequencing using featureCount. I tried to retrieve all rRNAs feature in the count file generated by featureCount, but the number of reads aligning to the rRNAs is greater than the reads assigned by featureCount.

Thank you very much for your help!!

Nanopore featureCount Alignment Sequencing Reads • 798 views
1
Entering edit mode

You likely have secondary/supplementary alignments since you have long read data. So that is likely reason for the number discrepancy.

Can you clarify what are you mapping against? Whole genome?

0
Entering edit mode

Yes the whole genome

1
Entering edit mode

You are sure your genome has the rDNA repeat in it? This is one of the part of genome that is not completely placed since individuals have between 300-500 copies on different chromosomes.

I was also wondering how to quantify rRNAs in my sequencing

While it is not recommended to use a reduced reference, if you want to identify reads that map to human rDNA repeat you could use this sequence: how can i download human ribosomal reference ?

1
Entering edit mode

Any reason you are not using pipelines like this one from ONT?

1
Entering edit mode
15 months ago

First, note that the number of reads is not the same concept as the number of alignments. One read may produce several alignments. The BAM file contains alignments not reads.

Run the:

samtools flagstat


command. It will describe your data in a report like:

31571 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
1571 + 0 supplementary
0 + 0 duplicates
31345 + 0 mapped (99.28% : N/A)
30000 + 0 paired in sequencing
29616 + 0 properly paired (98.72% : N/A)
29762 + 0 with itself and mate mapped
12 + 0 singletons (0.04% : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)


it will show you how your alignments are distributed.

0
Entering edit mode

Can you help me understand what is the difference between secondary aligned and the rest ? Also if i have x number of reads in my fastq, how can i have more that that in the primary aligned ?

0
Entering edit mode

Unfortunately, there is no precise definition of what the terms primary/secondary/supplementary alignment ought to mean. When you consult the Sam Specification:

https://samtools.github.io/hts-specs/SAMv1.pdf

it states that:

Multiple mapping The correct placement of a read may be ambiguous, e.g., due to repeats. In this case, there may be multiple read alignments for the same read. One of these alignments is considered primary. All the other alignments have the secondary alignment flag set in the SAM records that represent them. All the SAM records have the same QNAME and the same values for 0x40 and 0x80 flags. Typically the alignment designated primary is the best alignment, but the decision may be arbitrary.

You are asking on how can you have more primary alignments than reads. Well, frankly, I suspect you are counting it quite right, but it is not really your fault. The spec is confusing and obtuse, and that makes it not so easy to count it correctly. Paste the samtools report in a reply to this comment and we'll take a look.