Samtools flagstat
3.8 years ago
rakszewska • 0

I aligned my ONT sequencing run with minimap2 and then filtered the file to end up with primary alignments only:

samtools view -b -F 256 aln_transcriptome_sorted_6.bam -o filtered_aln_transcriptome_6.bam

When I run samtools flagstat on the filtered file, I get the following output:

3502608 + 0 in total (QC-passed reads + QC-failed reads)
3186175 + 0 primary
0 + 0 secondary
316433 + 0 supplementary
0 + 0 duplicates
0 + 0 primary duplicates
3502608 + 0 mapped (100.00% : N/A)
3186175 + 0 primary mapped (100.00% : N/A)
0 + 0 paired in sequencing
0 + 0 read1
0 + 0 read2
0 + 0 properly paired (N/A : N/A)
0 + 0 with itself and mate mapped
0 + 0 singletons (N/A : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)

What is the difference between the mapped and primary mapped statistics? And why do I still seem to have non-primary alignments in my file after filtering?

3.8 years ago

The SAM specification is pretty confusing and awkward to use - making interpretation of the results needlessly complicated (as your example demonstrates).

There is no flag for directly selecting the primary alignments - instead you need to remove the SECONDARY, the SUPPLEMENTARY, and the UNMAPPED alignments to be left with the so-called primary alignments.

It is best written out explicitly like so:

samtools view -F 256 -F 2048 -F 4
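
For readers unfamiliar with the bit arithmetic: each -F value is one bit of the SAM FLAG field, and a record is dropped if any excluded bit is set. Below is a minimal Python sketch of that test. The bit values come from the SAM specification; the helper name `is_excluded` is made up for illustration, and the sketch assumes that repeated -F options are OR-ed together, which is how the command above is written.

```python
# FLAG bits as defined in the SAM specification
UNMAP = 0x4            # 4: read is unmapped
SECONDARY = 0x100      # 256: secondary alignment
SUPPLEMENTARY = 0x800  # 2048: supplementary alignment


def is_excluded(flag: int, exclude_mask: int) -> bool:
    """Mimic the -F test: drop the record if ANY excluded bit is set."""
    return (flag & exclude_mask) != 0


# The three filters combined into one mask
mask = UNMAP | SECONDARY | SUPPLEMENTARY
print(mask)  # 2308

# -F 256 alone does not touch a supplementary record (FLAG 2048)
print(is_excluded(2048, SECONDARY))  # False
print(is_excluded(2048, mask))       # True
```

Under that assumption, a single -F 2308 expresses the same mask as the three separate options.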

I know I'm late to the party, but this might still be useful for future readers :-)

I don't believe this explanation is correct. As far as I understand, a supplementary alignment is not necessarily secondary, and filtering them out is not what you want. There are two types of alignments:

  • Linear alignments: all parts of a read align (possibly with gaps) in the correct order, on the same chromosome, without changes of direction. Such an alignment can be represented by a single line, and is NOT flagged supplementary.
  • Chimeric alignments: Any alignment that is not linear cannot be represented by a SINGLE line in the SAM/BAM file, so the alignment is split over multiple lines. ONE of these lines is considered representative, ALL OTHERS are considered SUPPLEMENTARY to that one line, indicating that they do not stand for themselves, but are part of a chimeric alignment.
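
To make the two cases concrete, here is a small Python sketch that labels a single SAM line from its FLAG. The function name `line_category` is hypothetical, and the order of the checks (secondary first, then supplementary, otherwise primary) is my understanding of how samtools flagstat bins records, so treat it as an assumption rather than a statement about the tool.

```python
SECONDARY = 0x100      # 256
SUPPLEMENTARY = 0x800  # 2048
REVERSE = 0x10         # 16: read mapped to the reverse strand


def line_category(flag: int) -> str:
    """Label one SAM line by its secondary/supplementary bits."""
    if flag & SECONDARY:
        return "secondary"
    if flag & SUPPLEMENTARY:
        return "supplementary"
    return "primary"


# A linear alignment: one line, neither bit set
print(line_category(0))                        # primary
# A chimeric alignment split over two lines: the representative line
# and one supplementary line on the reverse strand
print(line_category(REVERSE))                  # primary
print(line_category(SUPPLEMENTARY | REVERSE))  # supplementary
# An alternative placement of the same read
print(line_category(SECONDARY))                # secondary
```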

This has nothing to do with primary or secondary. The primary alignment is just the "best" alignment in some sense (best score, MAPQ etc.), whereas secondary alignments are possible alternatives. You can totally have a chimeric alignment as your primary alignment, in the case of complex variants.

If you filter supplementary alignments, you will remove ALL BUT ONE of the lines for a chimeric alignment. Unless you are absolutely certain that your data cannot possibly contain complex variants, and that all chimeric mappings are mismappings, I would advise against removing supplementary alignments.
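
A toy illustration of that loss, in Python. The read names, reference names and flags are invented for the example; the only point is that excluding bit 2048 leaves a single line for a chimeric read.

```python
from collections import Counter

SUPPLEMENTARY = 0x800  # 2048

# (read name, reference, FLAG) -- invented records, not real data
records = [
    ("read1", "chr1", 0),     # linear alignment, single line
    ("read2", "chr1", 0),     # chimeric: representative line
    ("read2", "chr5", 2048),  # chimeric: supplementary line
    ("read2", "chr9", 2064),  # chimeric: supplementary, reverse strand
]

# Equivalent of excluding the supplementary bit (-F 2048)
kept = [r for r in records if not (r[2] & SUPPLEMENTARY)]

before = Counter(name for name, _, _ in records)
after = Counter(name for name, _, _ in kept)
for name in before:
    print(name, before[name], "->", after[name])
# read1 1 -> 1
# read2 3 -> 1
```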


It is an interesting point that I have not considered. I looked up the SAM specification, which states:

For secondary alignments: "One of these alignments is considered primary. All the other alignments have the secondary alignment flag..."

For supplementary alignments: "Typically, one of the linear alignments in a chimeric alignment is considered the representative alignment, and the others are called supplementary and are distinguished by the supplementary alignment flag"

The question is whether the word representative is equivalent to primary. My explanation assumed so, and many people may treat it that way.

You are making the point that it is not, and I think that makes sense, but if so, now we have a new type of alignment that I never considered before: "representative" alignments.

Makes me wonder, are "primary alignments" also "representative alignments"?

Edit: After further consideration, I believe it is essential to distinguish between counting alignments and selecting for them.

The original question was framed from the point of view of the counts for primary and secondary alignments. Suppose you have the stats:

samtools flagstat demo.bam

that produces:

31571 + 0 in total (QC-passed reads + QC-failed reads)
30000 + 0 primary
0 + 0 secondary
1571 + 0 supplementary
...

To get that number 30000 for primary alignments, one would need to apply the filters -F 256 -F 2048:

samtools view -c -F 256 -F 2048 demo.bam

prints

30000

But I would agree that if someone wanted to select all the primary alignments, they should not use -F 2048.
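
The counting side can be reproduced with synthetic flags in Python. The list below is constructed to mirror the demo.bam numbers above (30000 lines with neither bit set, 1571 supplementary lines); `count_kept` is a hypothetical stand-in for what samtools view -c -F reports, not the tool itself.

```python
SECONDARY = 0x100      # 256
SUPPLEMENTARY = 0x800  # 2048

# Synthetic FLAG values built to mirror the flagstat output above
flags = [0] * 30000 + [SUPPLEMENTARY] * 1571


def count_kept(flags, exclude_mask):
    """Count records with none of the excluded bits set."""
    return sum(1 for f in flags if not (f & exclude_mask))


print(count_kept(flags, 0))                          # 31571 (in total)
print(count_kept(flags, SECONDARY))                  # 31571 (-F 256 only)
print(count_kept(flags, SECONDARY | SUPPLEMENTARY))  # 30000 (primary)
```

The same arithmetic explains the output in the original question: 3186175 primary + 316433 supplementary = 3502608 lines, all of which pass -F 256 because none of them carries the secondary bit.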

Goes to show the SAM/BAM format is more complex than people assume.
