I aligned my ONT sequencing run with minimap2 and subsequently filtered the file using samtools view -b -F 256 aln_transcriptome_sorted_6.bam -o filtered_aln_transcriptome_6.bam to keep only primary alignments. When I run samtools flagstat on the filtered file, I get the following output:
3502608 + 0 in total (QC-passed reads + QC-failed reads)
3186175 + 0 primary
0 + 0 secondary
316433 + 0 supplementary
0 + 0 duplicates
0 + 0 primary duplicates
3502608 + 0 mapped (100.00% : N/A)
3186175 + 0 primary mapped (100.00% : N/A)
0 + 0 paired in sequencing
0 + 0 read1
0 + 0 read2
0 + 0 properly paired (N/A : N/A)
0 + 0 with itself and mate mapped
0 + 0 singletons (N/A : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)
What is the difference between the mapped and primary mapped statistics? And why do I still seem to have non-primary alignments in my file after filtering?
I know I'm late to the party, but this might still be useful for future readers :-)
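I don't believe this explanation is correct. As far as I understand, a supplementary alignment is not necessarily secondary, and filtering them out is not what you want. There are two types of alignments: linear alignments, which fit in a single SAM record, and chimeric alignments, which cannot be represented as one linear alignment and are therefore split across several records.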
This has nothing to do with primary or secondary. The primary alignment is just the "best" alignment in some sense (best score, MAPQ etc.), whereas secondary alignments are possible alternatives. You can totally have a chimeric alignment as your primary alignment, in the case of complex variants.
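To make the flag logic concrete, here is a minimal Python sketch (just the bit arithmetic, not samtools itself) of why -F 256 leaves supplementary records in the file. The bit values 0x100 and 0x800 come from the SAM specification; the flags in example_flags are made up for illustration, and the classify helper is only my assumption of roughly how flagstat buckets records.

```python
# SAM FLAG bits, as defined in the SAM specification.
SECONDARY = 0x100       # 256
SUPPLEMENTARY = 0x800   # 2048


def classify(flag):
    """Bucket a record roughly the way flagstat reports it (assumed order of checks)."""
    if flag & SECONDARY:
        return "secondary"
    if flag & SUPPLEMENTARY:
        return "supplementary"
    return "primary"


def passes_exclude_filter(flag, exclude_mask):
    """Mimic `samtools view -F MASK`: keep a record only if no mask bit is set."""
    return (flag & exclude_mask) == 0


# Hypothetical flags: forward primary, reverse primary, secondary,
# supplementary, reverse-strand supplementary.
example_flags = [0, 16, 256, 2048, 2064]

# Equivalent of -F 256: only the secondary bit is tested.
kept = [f for f in example_flags if passes_exclude_filter(f, SECONDARY)]
print([(f, classify(f)) for f in kept])
```

The supplementary records (2048 and 2064) survive because the filter only tests bit 0x100, which is a separate bit from 0x800.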
If you filter supplementary alignments, you will remove ALL BUT ONE of the lines for a chimeric alignment. Unless you are absolutely certain that your data cannot possibly contain complex variants, and that all chimeric mappings are mismappings, I would advise against removing supplementary alignments.
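A small sketch of what that looks like in practice, assuming a single chimeric read that was split into three records. The read name, references and positions are entirely hypothetical; only the flag bit 0x800 is taken from the SAM specification.

```python
SUPPLEMENTARY = 0x800  # 2048

# Hypothetical records for one chimeric read: (read name, flag, reference, position)
records = [
    ("read1", 0, "chrA", 100),      # representative record
    ("read1", 2048, "chrA", 5000),  # supplementary record
    ("read1", 2064, "chrB", 300),   # supplementary record, reverse strand
]


def drop_supplementary(recs):
    """Mimic `-F 2048`: discard every record with the supplementary bit set."""
    return [r for r in recs if not (r[1] & SUPPLEMENTARY)]


remaining = drop_supplementary(records)
print(len(records), "records before,", len(remaining), "after")
```

Only the representative record is left, so the information that the read spans several locations is gone.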
It is an interesting point that I have not considered. I looked up the SAM specification, which states:
For secondary alignments: "One of these alignments is considered primary. All the other alignments have the secondary alignment flag..."
For supplementary alignments: "Typically, one of the linear alignments in a chimeric alignment is considered the representative alignment, and the others are called supplementary and are distinguished by the supplementary alignment flag"
The question is whether the word representative is equivalent to primary. My explanation assumed so, and many people may treat it that way.
You are making the point that it is not, and I think that makes sense, but if so, we now have a new type of alignment that I never considered before: "representative" alignments.
Makes me wonder, are "primary alignments" also "representative alignments"?
Edit: After further consideration, I believe it is essential to distinguish between counting alignments and selecting for them.
The original question was framed from the point of view of the counts for primary and secondary alignments. Suppose you have the stats:
that produces:
To get that number, 30000, for primary alignments, one would need to run the flags -F 256 -F 2048:
prints
But I would agree that if someone wanted to select all the primary alignments, then they should not use -F 2048.
Goes to show the SAM/BAM format is more complex than people assume.
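As a sanity check on the counting-versus-selecting distinction, here is a rough Python sketch using the counts from the flagstat output in the question. It assumes that excluding both bits amounts to a single mask of 2304 (256 + 2048); the category_flag table and the records_kept helper are simplifications of mine, ignoring strand and all other flag bits.

```python
SECONDARY = 0x100       # 256
SUPPLEMENTARY = 0x800   # 2048

# Record counts taken from the flagstat output in the question.
flagstat_counts = {"primary": 3186175, "secondary": 0, "supplementary": 316433}

# One representative flag per category (all other bits ignored in this sketch).
category_flag = {"primary": 0, "secondary": SECONDARY, "supplementary": SUPPLEMENTARY}


def records_kept(exclude_mask):
    """Count records surviving an exclude mask, as `samtools view -c -F MASK` would."""
    return sum(
        n for category, n in flagstat_counts.items()
        if (category_flag[category] & exclude_mask) == 0
    )


select_mask = SECONDARY                  # -F 256: drops secondary records only
count_mask = SECONDARY | SUPPLEMENTARY   # 2304: also drops supplementary records

print(records_kept(select_mask))  # every record left in the filtered file
print(records_kept(count_mask))   # primary records only
```

With these inputs the first number equals the "mapped" line (3502608) and the second equals the "primary mapped" line (3186175); the difference is exactly the 316433 supplementary records.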